Mid Page Query Refinements
Search engines try to return the most promising results for each query, but what they can return is limited by the query the searcher types.
Some search queries are too ambiguous, too general, or too specific to produce good results.
Examples:
- Homonyms: Words that sound the same, and may be spelled the same, but have different meanings.
  Example: The word "bear" can mean to carry or refer to the animal, and it sounds the same as "bare" (without clothing).
- Improper Contexts: Words whose meaning depends on the subject.
  Example: The word "jaguar" can refer to an animal, a Macintosh operating system, or a luxury car.
- Very General Terms: Queries that produce overly broad search results.
  Example: "food" would return results ranging from recipes and restaurants to food products, cooking tips, and more. It is so broad that it doesn't narrow down the type of food or the context you're looking for.
- Very Narrow Terms: Queries that produce unduly restrictive and unhelpful search results.
  Example query: "Toyota Corolla 2020 red model with leather seats in New York"
  Results: This query is very specific, and you might find only a few listings (if any) for that exact configuration. It is so narrow that it might not return helpful information, such as reviews or general price ranges for other models.
This query refinement patent application attempts to address these problems by suggesting refinements that better match the searcher's intent.
An example of how these Query Refinements work:
A searcher may look for information on Google by typing the word "jaguar" into the search box and pressing Enter.
The results (webpages containing information) might fall into several meaningful groups:
- Jaguar cars (vehicles from the Jaguar Corporation)
- Jaguar company websites (official Jaguar websites in different countries)
- Jaguar car owners association (a group for Jaguar car owners)
- Macintosh OS Jaguar (an old Apple operating system version called "Jaguar")
- Jaguar animals (information about the big cat species)
- Other random topics (topics that don't fit into any major group)
How Google Understands and Refines Searches
Step 1: Finding Initial Results
Google first finds many search results related to "jaguar." Then it picks the top 100 most relevant results (this number can vary).
Step 2: Grouping the Results
Google organizes these 100 results into clusters (groups of similar documents).
Example:
- One group for Jaguar cars
- One for Mac OS X Jaguar
- One for Jaguar the animal
- And so on...
Each document in these groups is matched with stored information from past searches to find related search terms.
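Here is a minimal Python sketch of the grouping step. The walkthrough above does not say which clustering algorithm Google uses. This example assumes a simple single-pass ("leader") clustering over bag-of-words vectors with cosine similarity. The 0.3 threshold and the five toy result snippets are illustrative assumptions, not values from the patent.

```python
import math
from collections import Counter


def term_counts(text: str) -> Counter:
    """Bag-of-words vector: lowercase word -> count."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def leader_cluster(docs: list[str], threshold: float = 0.3) -> list[list[int]]:
    """Put each document into the first cluster whose leader (first member) is
    similar enough; otherwise start a new cluster. Returns document indices."""
    vectors = [term_counts(d) for d in docs]
    clusters: list[list[int]] = []
    for i, vec in enumerate(vectors):
        for cluster in clusters:
            if cosine(vec, vectors[cluster[0]]) >= threshold:
                cluster.append(i)
                break
        else:  # no existing cluster was close enough
            clusters.append([i])
    return clusters


top_results = [
    "jaguar car dealer luxury automobile",
    "jaguar big cat species wild",
    "jaguar car review automobile engine",
    "jaguar cat habitat wild rainforest",
    "mac os x jaguar apple update",
]
print(leader_cluster(top_results))  # [[0, 2], [1, 3], [4]]
```

The car pages and the animal pages land in separate groups, and the Mac OS page forms its own cluster. A real system would use richer term weights and a more robust algorithm. This sketch only shows the idea.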
What Does "Stored Information from Past Searches" Mean?
Google remembers past searches and stores information about how people search and what they click on. This helps Google improve future searches.
Let's say millions of people have searched for "jaguar" before. Google has already collected data on:
- Which search results they clicked on
- Which words they added later (e.g., "jaguar car" or "jaguar animal")
- Which results were most useful
Example:
- A user searches "jaguar"
- They click on Jaguar car websites
- This tells Google that "jaguar" is often related to cars.
Now, when you search "jaguar," Google uses this stored knowledge to help suggest better searches.
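The sketch below is one rough picture of what this "stored knowledge" could look like. It builds a small association table from a hypothetical click log and maps each clicked document to the queries that led to it. The log format, the URLs, and the function name are assumptions for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical click log: (query the user typed, document they clicked).
click_log = [
    ("jaguar", "jaguar.com"),
    ("jaguar car", "jaguar.com"),
    ("jaguar car", "jaguar.com"),
    ("jaguar animal", "bigcats.org/jaguar"),
    ("jaguar", "bigcats.org/jaguar"),
]


def build_association_db(log):
    """Map each clicked document to a Counter of the queries that led to it."""
    db = defaultdict(Counter)
    for query, doc in log:
        db[doc][query] += 1
    return db


db = build_association_db(click_log)
print(db["jaguar.com"].most_common())  # [('jaguar car', 2), ('jaguar', 1)]
```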
How Does Google Match Documents with Past Search Data?
When Google finds the top 100 results for "jaguar," it compares them with information from past searches stored in an association database.
- Step 1: Google looks at the most relevant documents (webpages) and what each one is about.
  - Is it about cars?
  - Is it about the animal?
  - Is it about the Mac OS?
- Step 2: Google checks past user searches to see how similar documents were searched for before.
  - If many past users searched "jaguar car" and clicked car-related results, Google knows that a Jaguar car cluster exists.
  - If other users searched "jaguar animal" and clicked animal-related results, Google knows that a Jaguar animal cluster exists.
- Step 3: Google finds common keywords in those groups.
  - In the car cluster, the words "car," "automobile," and "vehicle" show up a lot.
  - In the animal cluster, the words "wild," "cat," and "species" appear frequently.
Real-Life Example:
Imagine Netflix. If you watch a lot of action movies, Netflix learns from your past choices and suggests more action movies.
Similarly, Google learns from past searches and suggests better search terms based on what others have looked for before.
Step 3: Finding Important Words
For each group, Google finds the important words that appear most often in its documents. Together, these words and their weights form the group's term vector.
Example for Jaguar cars:
- jaguar
- automobile
- car
- USA
- UK
Example for Mac OS X Jaguar:
- jaguar
- X
- Mac
- OS
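Here is a minimal sketch of how a cluster's important words could be derived. It assumes raw word counts stand in for the term-vector weights, although real systems would more likely use a weighting such as TF-IDF. The stopword list and the three document snippets are hypothetical.

```python
from collections import Counter

STOPWORDS = {"the", "a", "of", "and", "in", "for"}  # assumption: tiny illustrative list


def cluster_term_vector(docs: list[str], k: int = 4) -> list[tuple[str, int]]:
    """Count non-stopword words across all documents in a cluster; keep the k most frequent."""
    counts = Counter(
        word for doc in docs for word in doc.lower().split() if word not in STOPWORDS
    )
    return counts.most_common(k)


car_cluster = [
    "Jaguar car sales in the USA",
    "Jaguar automobile news from the UK",
    "The Jaguar car and automobile lineup",
]
print(cluster_term_vector(car_cluster))
```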
Step 4: Suggesting Better Search Queries
Next, Google suggests better searches (called query refinements) to help users find what they really want.
Examples:
Instead of just "jaguar", Google may suggest:
- "jaguar car" (if you're looking for the car)
- "mac os x jaguar" (if you're looking for the Mac system)
- "jaguar racing" (if you're looking for Jaguar race cars)
- "jaguar cat" (if you're looking for the animal)
Step 5: Ranking the Suggestions
Google ranks these suggested queries based on:
- How relevant they are to the search results
- How many documents match each suggestion
So a popular topic backed by many documents, such as "jaguar car," might appear higher in the suggestions than "jaguar cat."
Bonus: Excluding Irrelevant Results
Google can also suggest excluding certain meanings using a minus sign (-).
Example: If you only want results about the jaguar animal, you can search:
"jaguar -car -mac-os-x -racing"
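An exclusion query is the original query followed by one minus-prefixed term for each unwanted meaning. The helper below is hypothetical and shows how such a query string could be assembled.

```python
def build_exclusion_query(query: str, unwanted_terms: list[str]) -> str:
    """Append a minus-prefixed term for each meaning the searcher wants to drop."""
    return " ".join([query] + [f"-{term}" for term in unwanted_terms])


print(build_exclusion_query("jaguar", ["car", "mac-os-x", "racing"]))
# jaguar -car -mac-os-x -racing
```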
Step 6: Learning from Past Searches
Google also remembers past searches and prepares suggestions in advance based on what people have searched for before.
Example: If many people searched for "jaguar car" right after searching "jaguar," Google learns that many people looking for "jaguar" actually mean the car.
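The sketch below shows one way to precompute suggestions. It assumes a "refinement" is a later query in the same session that extends the earlier one by at least one word. The session data and the function name are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical search sessions: consecutive queries typed by the same user.
sessions = [
    ["jaguar", "jaguar car"],
    ["jaguar", "jaguar car", "jaguar car price"],
    ["jaguar", "jaguar animal"],
]


def precompute_refinements(sessions, top_k=2):
    """For each query, count follow-up queries that extend it by at least one
    word, and keep the top_k most common follow-ups as stored suggestions."""
    follow_ups = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            if nxt.startswith(current + " "):
                follow_ups[current][nxt] += 1
    return {query: [r for r, _ in counts.most_common(top_k)]
            for query, counts in follow_ups.items()}


suggestions = precompute_refinements(sessions)
print(suggestions["jaguar"])  # ['jaguar car', 'jaguar animal']
```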
What Is a Cluster Centroid?
A cluster centroid is like the "center" or "average" of a group (cluster) of documents that are all related to the same topic. It is the most typical point, the one that best represents all the documents in that group.
Imagine you have a group of people and you want the average height of everyone in the group. You would measure each person's height, add the heights together, and divide by the number of people. That average height is the centroid of the group.
In Google's case, a cluster of documents (like a group about "Jaguar cars") has a centroid that captures the most common, central ideas of all the documents in that group.
How Does the Cluster Centroid Work in Google's Search Process?
When Google groups documents, it represents each document by its important words (a term vector). The centroid of a cluster is the average of its documents' term vectors. Its highest-weighted words are the most relevant, central words for that cluster.
Let's break it down using an example:
Example: Cluster Centroid for "Jaguar Cars"
Imagine Google has a cluster of documents about Jaguar cars. These documents might include words like:
- Jaguar
- Car
- Automobile
- Engine
- Luxury
- USA
The cluster centroid would focus on the central or most important words in this group. It might be something like this:
- Jaguar
- Car
- Automobile
- Engine
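The centroid is the component-wise average of the cluster's term vectors. It works just like the average height above: the sum of all heights divided by the number of people. The sketch below assumes toy binary term vectors (1 = word present) for three hypothetical documents.

```python
from collections import Counter


def centroid(vectors: list[dict[str, float]]) -> dict[str, float]:
    """Component-wise mean of term vectors; a term missing from a document counts as 0."""
    total: Counter = Counter()
    for vec in vectors:
        total.update(vec)  # adds each term's weight to the running total
    return {term: weight / len(vectors) for term, weight in total.items()}


# Toy term vectors (1 = word present) for three hypothetical Jaguar-car documents.
jaguar_car_docs = [
    {"jaguar": 1, "car": 1, "automobile": 1, "engine": 1},
    {"jaguar": 1, "car": 1, "engine": 1, "luxury": 1},
    {"jaguar": 1, "automobile": 1, "usa": 1},
]
c = centroid(jaguar_car_docs)
top_terms = sorted(c, key=c.get, reverse=True)[:4]
print(top_terms)  # ['jaguar', 'car', 'automobile', 'engine']
```

Words that appear in most of the cluster's documents get the highest centroid weights. Words like "luxury" and "usa" appear in only one document, so they rank lower.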
What Are Associations?
Stored Query:
This is a search term or phrase that has been saved in the system.
Example: "Best pizza recipes".
Stored Document:
This is a document, webpage, or piece of content that is saved and might be relevant to a stored query.
Example: An article titled "10 Amazing Pizza Recipes".
Association:
An association is the link between a stored query and a stored document. It indicates that the document is relevant to the query.
Example: The system recognizes that the article "10 Amazing Pizza Recipes" is related to the query "Best pizza recipes".
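An association can be modeled as a small record. The schema below is hypothetical and is not the patent's storage format. The second document title and the weights are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One link between a stored query and a stored document (hypothetical schema)."""
    query: str
    document: str
    weight: float = 1.0


associations = [
    Association("best pizza recipes", "10 Amazing Pizza Recipes", 10.0),
    Association("best pizza recipes", "Pizza Dough Basics", 4.0),
]


def documents_for(query: str, links: list[Association]) -> list[str]:
    """Return the stored documents linked to a stored query, strongest link first."""
    matches = [a for a in links if a.query == query]
    return [a.document for a in sorted(matches, key=lambda a: a.weight, reverse=True)]


print(documents_for("best pizza recipes", associations))
```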
What the Precomputation System Does
- Builds a Database: Stores queries, documents, associations, and weights in an association database.
- Creates Associations: Matches stored queries with stored documents and assigns a weight (importance score) to each pair.
- Uses Cached Data and Logs: References query logs, cached queries, and cached documents to improve accuracy.
- Four Key Modules:
  - Associator: connects queries to documents and assigns weights
  - Selector: picks documents based on search results
  - Regenerator: reuses past queries to refine searches
  - Inverter: inverts cached document-query pairings into query-document pairings
1. Associator: Weighting Query-Document Associations
What It Does:
- Assigns a Weight:
  For every pairing (association) between a stored query and a stored document, the associator assigns a weight.
  Example: For the stored query "Best pizza recipes" and a stored article about pizza recipes, the associator might assign a weight of, say, 10.
- Estimates Relevance:
  The weight estimates how relevant the stored document is to the query. A higher weight means stronger relevance.
  Example: An article that includes recipes, reviews, and tips might receive a higher weight than a simple list of ingredients.
- Sums Multiple Associations:
  If multiple associations exist for the same query-document pair, their weights are added together.
  Example: If "Best pizza recipes" is linked to the same document through user clicks (weight 10) and manual tagging (weight 5), the total becomes 15.
- Considers Query Frequency (If Available):
  If query frequency data exists (such as a cached search count), the weight is multiplied by this frequency.
  Example: If "Best pizza recipes" was searched 20 times, the weight of 15 might be multiplied by 20, giving 300 and emphasizing the query's popularity.
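The arithmetic in the last two examples can be sketched as follows. The function name and the (query, document, weight) input format are assumptions. Queries without a cached frequency keep their summed weight.

```python
from collections import defaultdict


def associate(evidence, query_frequency=None):
    """evidence: (stored_query, stored_document, weight) triples from any source.
    Returns one combined weight per (query, document) pair."""
    totals = defaultdict(float)
    for query, document, weight in evidence:
        totals[(query, document)] += weight  # sum multiple associations for the same pair
    frequency = query_frequency or {}
    # Multiply by the cached search count when one exists; otherwise keep the weight.
    return {pair: weight * frequency.get(pair[0], 1) for pair, weight in totals.items()}


evidence = [
    ("best pizza recipes", "pizza-article", 10),  # e.g. from user clicks
    ("best pizza recipes", "pizza-article", 5),   # e.g. from manual tagging
]
print(associate(evidence))                              # {('best pizza recipes', 'pizza-article'): 15.0}
print(associate(evidence, {"best pizza recipes": 20}))  # {('best pizza recipes', 'pizza-article'): 300.0}
```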
2. Selector: Choosing Stored Documents Based on an Issued Search
What It Does:
- Selects Stored Documents:
  When a search is issued, the selector picks one or more stored documents for the associated stored query.
  Example: After you search for "Easy vegan desserts," the selector might pick a few stored documents, such as blog posts, recipes, or articles, that match this query.
- Uses Two Methods:
  1. Search Document Chosen Post-Search: The document you click on after the search can determine which documents are associated.
     Example: If you choose an article on vegan brownies, that article is selected as relevant.
  2. Set of Search Results: The selector can also consider the entire set of search results returned.
     Example: If you get a list of five different dessert recipes, the selector might choose the ones with the highest relevance based on their associations with the query.
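Here is a sketch of the two selection methods. It assumes a clicked document takes priority and that, without a click, the top-k results by relevance are used. The priority rule, k, and the example documents and scores are assumptions.

```python
def select_documents(results, clicked=None, top_k=3):
    """results: list of (document, relevance) pairs for the issued search.
    Method 1: if the searcher clicked a result, select that document.
    Method 2: otherwise select the top_k documents of the result set."""
    if clicked is not None:
        return [clicked]
    ranked = sorted(results, key=lambda r: r[1], reverse=True)
    return [doc for doc, _ in ranked[:top_k]]


results = [
    ("vegan-brownies", 0.9),
    ("vegan-cheesecake", 0.7),
    ("fruit-sorbet", 0.4),
    ("vegan-cookies", 0.8),
]
print(select_documents(results, clicked="vegan-brownies"))  # ['vegan-brownies']
print(select_documents(results, top_k=2))                   # ['vegan-brownies', 'vegan-cookies']
```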
3. Regenerator: Utilizing the Query Log
What It Does:
- Selects Based on the Query Log:
  The regenerator analyzes the query log to find stored documents associated with past searches.
  Example: If many users previously searched for "gluten-free bread," the regenerator may retrieve documents that were relevant to that query.
- Regenerates Search Results:
  It can recreate search results using data from previously tracked queries.
  Example: Even though "summer salads" is not a new query, the regenerator can use its entry in the query log to pull up relevant results again, helping ensure the best matches.
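Here is a minimal sketch of regeneration. `fake_search` is a hypothetical stand-in for the live search engine. Processing frequent queries first and keeping only the top two results per logged query are also assumptions.

```python
from collections import Counter


def regenerate(query_log, search, top_k=2):
    """Re-run each distinct logged query through `search` (most frequent first)
    and pair the query with the top_k documents it returns."""
    associations = []
    for query, _count in Counter(query_log).most_common():
        for document in search(query)[:top_k]:
            associations.append((query, document))
    return associations


def fake_search(query):
    """Hypothetical stand-in for the live search engine."""
    index = {
        "gluten-free bread": ["gf-bread-recipe", "gf-flour-guide", "bakery-list"],
        "summer salads": ["watermelon-salad", "greek-salad"],
    }
    return index.get(query, [])


log = ["gluten-free bread", "summer salads", "gluten-free bread"]
print(regenerate(log, fake_search))
```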
4. Inverter: Working with Cached Data
What It Does:
- Selects Based on Cached Data:
  The inverter examines cached documents and cached queries to create new associations.
  Example: If the system has cached results for "DIY home decor," the inverter can use those cached documents to link them to the query.
- Inverts Cached Pairings:
  It flips existing cached document-to-query pairings into query-to-document pairings.
  Example: If a cached pairing says "Document A is linked to Query X," the inverter reverses it to say "For Query X, here is Document A."
- Selects Inverted Cached Documents:
  After inversion, the inverter selects cached documents to associate with the corresponding cached query.
  Example: If "DIY home decor" was cached along with multiple documents, the inverter can use this data to show those documents when you search for similar topics.
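Inverting cached pairings amounts to flipping an index. The sketch below assumes the cache maps each document to a list of queries. The cache contents are illustrative.

```python
from collections import defaultdict


def invert(cached_doc_to_queries):
    """Flip cached document -> queries pairings into query -> documents pairings."""
    query_to_docs = defaultdict(list)
    for document, queries in cached_doc_to_queries.items():
        for query in queries:
            query_to_docs[query].append(document)
    return dict(query_to_docs)


cache = {
    "Document A": ["Query X", "DIY home decor"],
    "Document B": ["DIY home decor"],
}
print(invert(cache))
# {'Query X': ['Document A'], 'DIY home decor': ['Document A', 'Document B']}
```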
Step 1: Pre-Processing (Before You Even Search)
Before you even type a query (search term), the system does some prep work using a precomputation engine (a system that prepares data in advance). This engine has four key parts:
- Associator: Links stored queries (previous searches) to stored documents (web pages). It assigns weights (importance scores) based on factors like how often a query is searched.
  Example: If many people search for "best pizza in New York," the system marks this query as highly relevant to pages about pizza in New York.
- Selector: Chooses which stored documents and queries should be used when someone searches for something.
  Example: If you search for "top smartphones," the system may decide to pull results from previously stored searches like "best phones 2024."
- Regenerator: Looks at past search logs (records of searches) to decide which documents (web pages) should be retrieved.
  Example: If many users refine their search from "cheap laptops" to "best budget laptops," the system learns that the second query is a useful alternative.
- Inverter: Uses cached data (previously saved information) to find related documents and queries.
  Example: If an older but still relevant webpage about "classic novels" is in the cache, it may still show up when you search for "best old books."
Step 2: Query Refinement (Improving Search Suggestions)
Once a user enters a search query, the query refinement system works to improve it. This system also has four parts:
1. Matcher: Matches stored documents (previously saved search data) with the new search query. It assigns a relevance score (a number that shows how useful the match is).
   Example: If you search for "healthy meal ideas," the system checks past searches, finds related stored queries like "quick healthy recipes," and scores them by how relevant they are.
2. Clusterer: Groups similar search queries into clusters (groups with common words or themes). These groups are ranked by relevance.
   Example: If you search "running shoes," the system might form clusters like:
   - For beginners: "best running shoes for beginners"
   - For performance: "fastest running shoes for marathons"
   - For budget: "best cheap running shoes"
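Here is a sketch of how a matcher might score stored queries for a new search. It assumes each new result carries a relevance score and that the association database maps documents to stored queries with weights. The product-and-sum scoring rule and the example data are assumptions.

```python
from collections import defaultdict


def match(results, association_db):
    """results: (document, relevance) pairs for the new query.
    association_db: document -> {stored_query: association weight}.
    A stored query's score is the sum of relevance * weight over matched documents."""
    scores = defaultdict(float)
    for document, relevance in results:
        for stored_query, weight in association_db.get(document, {}).items():
            scores[stored_query] += relevance * weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


association_db = {
    "salad-bowl-guide": {"quick healthy recipes": 3, "meal prep ideas": 1},
    "oatmeal-recipes": {"quick healthy recipes": 2},
}
results = [("salad-bowl-guide", 0.9), ("oatmeal-recipes", 0.5)]
ranked = match(results, association_db)
print(ranked)
```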
3. Scorer: Analyzes each cluster and calculates a centroid (the "center" of a cluster, representing its most important words).
Search queries are then scored based on:
- How many documents are linked to them
- How close they are to the centroid (the cluster's central topic)
Example: If "best affordable hiking boots" is linked to many web pages and closely matches its cluster's centroid, it gets a higher score than a less relevant query.
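Below is a hypothetical scorer that combines both criteria. It multiplies the number of linked documents by the cosine similarity between the query's words and the cluster centroid. The description above does not say how the two factors are combined, so the product is an assumption, as are the centroid weights and document counts.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def score_query(query: str, doc_count: int, centroid: dict[str, float]) -> float:
    """More linked documents and a closer match to the centroid both raise the score."""
    query_vec = {word: 1.0 for word in query.lower().split()}
    return doc_count * cosine(query_vec, centroid)


centroid = {"hiking": 0.9, "boots": 0.8, "affordable": 0.5, "waterproof": 0.3}
candidates = {"best affordable hiking boots": 40, "hiking boots history": 5}
ranked = sorted(candidates, key=lambda q: score_query(q, candidates[q], centroid), reverse=True)
print(ranked)  # ['best affordable hiking boots', 'hiking boots history']
```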
4. Presenter: Takes the highest-scoring search queries and shows them to you as suggestions for refining your original search. The system keeps track of how each refinement was created but shows only the final suggestions.
Example: If you search "coffee brewing," the system might suggest:
- "best coffee brewing methods"
- "how to make espresso at home"
- "coffee brewing tips for beginners"
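The presenter shows only the top suggestions while keeping track of where each one came from. The sketch below illustrates this; the `Refinement` record, its fields, and the scores are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Refinement:
    text: str
    score: float
    source_cluster: int  # how the refinement was produced; tracked but not displayed


def present(refinements: list[Refinement], limit: int = 3) -> list[str]:
    """Return only the display text of the `limit` highest-scoring refinements."""
    best = sorted(refinements, key=lambda r: r.score, reverse=True)[:limit]
    return [r.text for r in best]


candidates = [
    Refinement("coffee brewing tips for beginners", 0.61, source_cluster=2),
    Refinement("best coffee brewing methods", 0.92, source_cluster=0),
    Refinement("coffee brewing history", 0.15, source_cluster=3),
    Refinement("how to make espresso at home", 0.78, source_cluster=1),
]
print(present(candidates))
```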
2017 Updates to the Patent:
- The scoring method was changed.
- Representative queries are chosen based on cluster centroids.
- A representative query is chosen for every cluster.
- Clusters are ordered according to cluster size and relevance scores.
- Sub-queries are used as the refinement queries.
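The 2017 changes can be illustrated with a short sketch. Clusters are ordered by size and then by relevance, and each cluster contributes the candidate query closest to its centroid. Ordering by the (size, relevance) tuple, the cosine measure, and the example clusters are assumptions. Sub-queries are not modeled here.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two sparse term vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def representative_queries(clusters: list[dict]) -> list[str]:
    """Order clusters by (size, relevance), largest first, then pick from each
    cluster the candidate query whose word vector lies closest to its centroid."""
    ordered = sorted(clusters, key=lambda c: (c["size"], c["relevance"]), reverse=True)
    picks = []
    for cluster in ordered:
        best = max(
            cluster["queries"],
            key=lambda q: cosine({w: 1.0 for w in q.split()}, cluster["centroid"]),
        )
        picks.append(best)
    return picks


clusters = [
    {"size": 25, "relevance": 0.7, "centroid": {"jaguar": 1.0, "cat": 0.7, "wild": 0.5},
     "queries": ["jaguar car", "jaguar cat"]},
    {"size": 40, "relevance": 0.8, "centroid": {"jaguar": 1.0, "car": 0.8, "automobile": 0.6},
     "queries": ["jaguar xk engine", "jaguar car"]},
]
print(representative_queries(clusters))  # ['jaguar car', 'jaguar cat']
```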