Mid-page Query Refinement

🔎 Search engines try to return the most promising results for each query, but what they can return is limited by the query itself.

⚠️ Some search queries can be too ambiguous, too general, or too specific to provide good results.

📝 Examples:

📌 Homonyms – Words that have the same sound and possibly the same spelling but different meanings.
➡️ Example: The word “bear” 🐻 can mean to carry or refer to an animal, and it sounds the same as “bare,” which means an absence of clothing.
📌 Improper Contexts – Words that have multiple meanings depending on the subject.
➡️ Example: The word “jaguar” 🐆 can refer to an animal, a Macintosh operating system 💻, or a luxury car 🚗.
📌 Very General Terms – Provide overly broad search results 🌍.
➡️ Example: "food" 🍕 would return results covering recipes, restaurants, food products, cooking tips, and more. It’s so broad that it doesn’t narrow down the type of food or the context you’re looking for.
📌 Very Narrow Terms – Provide unduly restrictive and non-responsive search results 🎯.
➡️ Example: The query "Toyota Corolla 2020 red model with leather seats in New York" is very specific, so you might find only a few listings (if any) for that exact configuration. It’s so narrow that it might not return helpful information, like reviews or general price ranges, for other models.

💡 This query refinement patent application is an attempt to provide suggestions to address these problems and better match the searcher's intent. ✅

An example of how these Query Refinements work:

🔎 A searcher may try to find information on Google by entering the word “jaguar” into the search box and hitting enter.

📄 The results (webpages containing information) might fit into several meaningful groups:

🚗 Jaguar cars (Vehicles from the Jaguar Corporation)
🌍 Jaguar company websites (Official Jaguar websites in different countries)
🔧 Jaguar car owners association (A group for Jaguar car owners)
💻 Macintosh OS Jaguar (An old Apple operating system version called "Jaguar")
🐆 Jaguar animals (Information about the big cat species)
📌 Other random topics (Topics that don’t fit into any major group)

🔍 How Google Understands and Refines Searches

🏗️ Step 1: Finding Initial Results

Google first finds many search results related to "jaguar." Then, it picks the top 100 most relevant results (this number can change).

🔄 Step 2: Grouping the Results

Google organizes these 100 results into clusters (groups of similar documents).

Example:

One group for Jaguar cars 🚗
One for Mac OS X Jaguar 💻
One for Jaguar the animal 🐆
And so on...

Each document in these groups is matched with stored information from past searches to find related search terms.
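
💻 The grouping step can be sketched in a few lines of Python. This is only an illustration, not Google's actual method: the result titles, the `group_results` function, the word-overlap (Jaccard) measure, and the 0.3 threshold are all assumptions made for the example.

```python
def tokens(text):
    """Lowercased set of words in a result title."""
    return set(text.lower().split())

def jaccard(a, b):
    """Word overlap between two token sets, from 0.0 to 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def group_results(titles, threshold=0.3):
    """Greedy single pass: join the first cluster whose seed title overlaps enough."""
    clusters = []  # each entry: {"seed": tokens of the first title, "titles": [...]}
    for title in titles:
        words = tokens(title)
        for cluster in clusters:
            if jaccard(words, cluster["seed"]) >= threshold:
                cluster["titles"].append(title)
                break
        else:  # no existing cluster was similar enough, so start a new one
            clusters.append({"seed": words, "titles": [title]})
    return [cluster["titles"] for cluster in clusters]

# Hypothetical result titles for the query "jaguar"
results = [
    "jaguar car dealer prices",
    "jaguar car reviews",
    "mac os x jaguar release",
    "jaguar mac os x update",
    "jaguar animal habitat",
    "jaguar animal diet",
]
groups = group_results(results)
for group in groups:
    print(group)
```

With these made-up titles the sketch produces three groups: cars, Mac OS X, and the animal.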

📂 What Does "Stored Information from Past Searches" Mean?

Google remembers past searches 📚 and stores information about how people search and what they click on. This helps Google improve future searches!

Let’s say millions of people have searched for "Jaguar" before. Google has already collected data on:

✔ What search results they clicked on
✔ What other words they added later (e.g., "Jaguar car" or "Jaguar animal")
✔ What results were most useful

💡 Example:

A user searches "jaguar" 🧐
They click on Jaguar car websites 🚗
This tells Google: "Jaguar" is often related to cars!

Now, when you search "jaguar," Google uses this stored knowledge to help suggest better searches!

🔄 How Does Google Match Documents with Past Search Data?

When Google finds the top 100 results for "jaguar," it compares them to information from past searches, which is stored in an association database.

Step 1: Google finds which documents (webpages) are most relevant 📄
Is it about cars? 🚗
Is it about the animal? 🐆
Is it about the Mac OS? 💻
Step 2: Google checks past user searches to see how similar documents were searched before.
If many past users searched "Jaguar car" and clicked car-related results 🚗, then Google knows that a Jaguar car cluster exists.
If other users searched "Jaguar animal" and clicked animal-related results 🐆, then Google knows a Jaguar animal cluster exists.
Step 3: Google finds common keywords in those groups 🏷️
In the car cluster, the words "car," "automobile," and "vehicle" show up a lot. 🚗
In the animal cluster, the words "wild," "cat," and "species" appear frequently. 🐆
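
💻 The three matching steps above can be sketched as a lookup in a small association database. Everything here is hypothetical: the `association_db` dictionary, the example URLs, and the helper names are assumptions for illustration, not the patent's actual data structures.

```python
from collections import Counter

# Hypothetical association database: stored document -> stored queries whose
# searchers clicked that document (all URLs and queries are made up).
association_db = {
    "cars.example/jaguar-xf": ["jaguar car", "jaguar automobile"],
    "dealer.example/jaguar": ["jaguar car", "jaguar vehicle price"],
    "zoo.example/jaguar": ["jaguar animal", "jaguar wild cat"],
    "wildlife.example/big-cats": ["jaguar animal", "jaguar cat species"],
}

def related_queries(cluster_docs):
    """Count the stored queries associated with the documents of one cluster."""
    counts = Counter()
    for doc in cluster_docs:
        counts.update(association_db.get(doc, []))
    return counts

def common_terms(query_counts, original="jaguar"):
    """Count the extra words searchers added to the original query."""
    words = Counter()
    for query, n in query_counts.items():
        for word in query.split():
            if word != original:
                words[word] += n
    return words

car_cluster = ["cars.example/jaguar-xf", "dealer.example/jaguar"]
car_queries = related_queries(car_cluster)
car_terms = common_terms(car_queries)
print(car_queries.most_common(1))  # most frequent stored query for the cluster
print(car_terms.most_common(1))    # most frequent added word
```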

💡 Real-Life Example:
Imagine Netflix 🍿. If you watch a lot of action movies, Netflix learns from your past choices and suggests more action movies 🎬.
Similarly, Google learns from past searches and suggests better search terms based on what others have looked for before!

📊 Step 3: Finding Important Words

For each group, Google builds a term vector 📌: the important words that appear most often in that group's documents.

Example for Jaguar cars 🚗:
jaguar
automobile
car
USA
UK

Example for Mac OS X Jaguar 💻:
jaguar
X
Mac
OS
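
💻 A term vector like the ones above can be approximated by counting words across a cluster's documents. This is a rough sketch under simple assumptions: the `term_vector` helper and the sample titles are hypothetical, and a real system would likely weight terms more carefully than raw counts.

```python
from collections import Counter

def term_vector(documents, top_n=5):
    """Return the top_n most frequent words across a cluster's documents."""
    counts = Counter()
    for text in documents:
        counts.update(text.lower().split())
    return counts.most_common(top_n)

# Hypothetical documents in the Mac OS X Jaguar cluster
mac_cluster = [
    "Mac OS X Jaguar review",
    "Jaguar Mac OS X update notes",
    "Installing Mac OS X Jaguar",
]
mac_vector = term_vector(mac_cluster, top_n=4)
print(mac_vector)
```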

🎯 Step 4: Suggesting Better Search Queries

Now, Google suggests better searches (called query refinements 🛠️) to help users find what they really want.

Examples:

✅ Instead of just "jaguar", Google may suggest:
"jaguar car" 🚗 (if you're looking for the car)
"mac os x jaguar" 💻 (if you're looking for the Mac system)
"jaguar racing" 🏎️ (if you're looking for Jaguar race cars)
"jaguar cat" 🐆 (if you're looking for the animal)

📊 Step 5: Ranking the Suggestions

Google ranks these suggested queries based on:

✔ How relevant they are to the search results 📈
✔ How many documents match that suggestion 📚

So, a popular topic like "jaguar car" might appear higher in the suggestions than "jaguar cat" 🏆.
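
💻 The ranking can be sketched as a sort over candidate suggestions. The source does not say how the two factors are combined, so sorting by relevance first and document count second is an assumption, as are the numbers and the `rank_suggestions` name.

```python
def rank_suggestions(candidates):
    """Sort by relevance first, then by number of matching documents (both descending)."""
    return sorted(
        candidates,
        key=lambda c: (c["relevance"], c["doc_count"]),
        reverse=True,
    )

# Hypothetical scores for suggestions derived from the "jaguar" clusters
candidates = [
    {"query": "jaguar cat", "relevance": 0.6, "doc_count": 12},
    {"query": "jaguar car", "relevance": 0.9, "doc_count": 40},
    {"query": "mac os x jaguar", "relevance": 0.7, "doc_count": 18},
    {"query": "jaguar racing", "relevance": 0.6, "doc_count": 9},
]
ranked = [c["query"] for c in rank_suggestions(candidates)]
print(ranked)
```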

➖ Bonus: Excluding Irrelevant Results
Google can also suggest excluding certain meanings using a minus sign (-).
Example: If you only want results about the animal Jaguar 🐆, you can search:
🔍 "jaguar -car -mac-os-x -racing"
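
💻 Building such an exclusion query is simple string work. In this small sketch the `exclusion_query` helper is hypothetical, and joining multi-word terms with hyphens just mirrors the example above.

```python
def exclusion_query(base, unwanted):
    """Append each unwanted meaning to the base query with a leading minus sign."""
    excluded = ["-" + "-".join(term.lower().split()) for term in unwanted]
    return " ".join([base] + excluded)

query = exclusion_query("jaguar", ["car", "Mac OS X", "racing"])
print(query)
```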

🔮 Step 6: Learning from Past Searches

Google also remembers past searches and prepares suggestions in advance based on what people have searched before.

Example: If many people searched for "Jaguar car" after searching "Jaguar," Google learns that most people looking for "Jaguar" actually mean the car. 🚗

🌍 What is a Cluster Centroid?

A cluster centroid is like the "center" or "average" of a group (cluster) of documents that are all related to the same topic. It’s the ideal or most typical point that represents all the documents in that group.

🔍 Imagine you have a group of people, and you want to find the average height of everyone in the group. You would measure the height of each person and find the "middle" height that represents the whole group. That middle height is the centroid of the group.

In Google’s case, a cluster of documents (like a group about "Jaguar cars" 🚗) will have a centroid—the most common, central ideas of all the documents in that group.

🧠 How Does the Cluster Centroid Work in Google’s Search Process?

When Google groups documents, it uses important words (term vectors 📌) to define the center of each cluster. Then, it calculates the centroid to find the most relevant and central words for that cluster.

Let’s break it down using an example:

🚗 Example: Cluster Centroid for "Jaguar Cars"
Imagine Google has a cluster of documents about Jaguar cars 🚗. These documents might include words like:

Jaguar
Car
Automobile
Engine
Luxury
USA

The cluster centroid would focus on the central or most important words in this group. It might be something like this:
Jaguar
Car
Automobile
Engine
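
💻 One common way to compute a centroid is to average the word counts of every document in the cluster, then keep the highest-weighted words. The sketch below assumes that approach; the sample documents and the `centroid` and `central_terms` helpers are made up for illustration.

```python
from collections import Counter

def centroid(documents):
    """Average frequency of each word across the cluster's documents."""
    totals = Counter()
    for text in documents:
        totals.update(text.lower().split())
    return {term: count / len(documents) for term, count in totals.items()}

def central_terms(center, top_n=4):
    """Highest-weighted words; ties are broken alphabetically."""
    ranked = sorted(center.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:top_n]]

# Hypothetical documents in the Jaguar cars cluster
car_cluster = [
    "jaguar car automobile engine luxury",
    "jaguar car engine usa",
    "jaguar automobile car engine",
]
center = centroid(car_cluster)
top_terms = central_terms(center)
print(top_terms)
```

Words that appear in only one document, like "luxury" and "usa", get a low average weight and drop out of the central terms.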

What Are Associations? 🤝

Stored Query (🔎):
This is a search term or phrase that has been saved in the system.
Example: "Best pizza recipes".

Stored Document (📄):
This is a document, webpage, or piece of content that is saved and might be relevant to a stored query.
Example: An article titled "10 Amazing Pizza Recipes".

Association (🔗):
An association is the link between the stored query and the stored document. It indicates that the document is relevant to the query.
Example: The system recognizes that the article "10 Amazing Pizza Recipes" is related to the query "Best pizza recipes".

🚀 What the Precomputation System Does

Builds a Database 🗂️
Stores queries, documents, associations, and weights in an association database 📦
Creates Associations 🔗
Matches stored queries with stored documents
Assigns a weight (importance score) to each pair
Uses Cached Data & Logs 📝
References query logs, cached queries, and cached documents to improve accuracy
Four Key Modules 🏗️
Associator 🤝 → Connects queries to documents & assigns weights
Selector 🎯 → Picks documents based on search results
Regenerator 🔄 → Reuses past queries to refine searches
Inverter 🔃 → Swaps stored document-query pairings for efficiency

1. Associator: Assigning Weights to Associations 🤝

What It Does:

📊 Assigns a Weight:
For every pairing (or association) between a stored query and a stored document, the associator assigns a weight.
🍕 Example: If you have a stored query "Best pizza recipes" and a stored document (an article about pizza recipes), the associator might assign a weight, say, 10.
📏 Estimates Relevance:
The weight helps determine how relevant the stored document is to the query. A higher weight means stronger relevance.
⭐ Example: If the article includes recipes, reviews, and tips, it might receive a higher weight than a simple list of ingredients.
➕ Sums Multiple Associations:
If multiple associations exist for the same query-document pair, their weights are added together.
🔄 Example: If "Best pizza recipes" is linked to the same document through user clicks and manual tagging, with weights 10 and 5, the total becomes 15.
📈 Considers Query Frequency (If Available):
If query frequency data exists (like a cached search count), the weight is multiplied by this frequency.
🔢 Example: If "Best pizza recipes" was searched 20 times, a weight of 15 might be multiplied by 20, emphasizing its popularity and relevance.
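
💻 The weight arithmetic above (10 + 5 = 15, then 15 × 20 = 300) can be sketched as follows. The `combine_weights` function and the tuple layout are assumptions for illustration, not the patent's data model.

```python
from collections import defaultdict

def combine_weights(associations, query_frequency=None):
    """Sum weights per (query, document) pair, then scale by query frequency if known."""
    totals = defaultdict(float)
    for query, doc, weight in associations:
        totals[(query, doc)] += weight
    frequency = query_frequency or {}
    # A query with no recorded frequency keeps its summed weight unchanged.
    return {
        (query, doc): total * frequency.get(query, 1)
        for (query, doc), total in totals.items()
    }

associations = [
    ("best pizza recipes", "10 Amazing Pizza Recipes", 10),  # from user clicks
    ("best pizza recipes", "10 Amazing Pizza Recipes", 5),   # from manual tagging
]
weights = combine_weights(associations, {"best pizza recipes": 20})
print(weights)
```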

2. Selector: Choosing Stored Documents Based on Issued Search 🔍

What It Does: ✅

Selects Stored Documents: 📂
When a search is issued, the selector picks one or more stored documents for the associated stored query.
🔍 Example: After you search for "Easy vegan desserts," the selector might pick a few stored documents like blog posts, recipes, or articles that match this query.

Uses Two Methods: 🛠️

1️⃣ Search Document Chosen Post-Search: 📄➡️🔎
The document you click on after the search can determine which documents to associate.
🍫 Example: If you choose an article on vegan brownies, that article gets selected as relevant.
2️⃣ Set of Search Results: 📊
It can also consider the entire set of search results received.
🍰 Example: If you get a list of five different dessert recipes, the selector might choose the ones with the highest relevance based on their associations with the query.
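
💻 The two selection methods can be sketched in one small function. The `select_documents` name, the `weight` field, and the sample results are hypothetical, and preferring the clicked document over the result set is an assumption of this sketch.

```python
def select_documents(results, clicked=None, top_n=2):
    """Method 1: use the clicked document. Method 2: take the top-weighted results."""
    if clicked is not None:
        return [clicked]
    ranked = sorted(results, key=lambda r: r["weight"], reverse=True)
    return [r["doc"] for r in ranked[:top_n]]

# Hypothetical results for the query "Easy vegan desserts"
results = [
    {"doc": "Vegan brownies", "weight": 8},
    {"doc": "Chia pudding", "weight": 5},
    {"doc": "Banana ice cream", "weight": 9},
]
by_click = select_documents(results, clicked="Vegan brownies")
by_results = select_documents(results)
print(by_click, by_results)
```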

3. Regenerator: Utilizing the Query Log 🔄

🔄 What It Does

📜 Selects Based on Query Log:
The regenerator analyzes the query log to find stored documents associated with past searches.
🔍 Example: If many users previously searched for "gluten-free bread," the regenerator may retrieve documents that were relevant to that query from the log.
🔁 Regenerates Search Results:
It can recreate search results using data from previously tracked queries.
🥗 Example: If a user searches for "summer salads," even if it’s not a new query, the regenerator can pull up past relevant results stored in the log, ensuring the best matches.

4. Inverter: Working with Cached Data 🔄💾

🔄 What It Does

🗂 Selects Based on Cached Data:
The inverter examines cached documents and cached queries to create new associations.
🏠 Example: If the system has cached results for "DIY home decor," the inverter can use these cached documents to link them to the query.
🔄 Inverts Cached Pairings:
It flips existing cached document–cached query pairings into query–document pairings.
🔁 Example: If a cached pairing originally says “Document A is linked to Query X,” the inverter reverses it to say “For Query X, here is Document A.”
📌 Selects Inverted Cached Documents:
After inversion, the inverter selects cached documents to associate them with the corresponding cached query.
🖼️ Example: If "DIY home decor" was cached along with multiple documents, the inverter will use this data to display those documents when you search for similar topics.
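
💻 Inversion amounts to flipping a document-to-query mapping into a query-to-document mapping. In this sketch the `invert` function and the cached pairings are hypothetical.

```python
from collections import defaultdict

def invert(doc_to_queries):
    """Flip document -> queries pairings into query -> documents pairings."""
    query_to_docs = defaultdict(list)
    for doc, queries in doc_to_queries.items():
        for query in queries:
            query_to_docs[query].append(doc)
    return dict(query_to_docs)

# Hypothetical cached pairings: each cached document lists its cached queries
cached = {
    "Document A": ["DIY home decor", "cheap wall art"],
    "Document B": ["DIY home decor"],
}
inverted = invert(cached)
print(inverted["DIY home decor"])
```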

Step 1: Pre-Processing (Before You Even Search) 🔍⚙️

Before you even type a query (search term), the system does some prep work using a precomputation engine (a system that prepares data in advance). This engine has four key parts:

Associator 🤝 – Links stored queries (previous searches) to stored documents (web pages). It assigns weights (importance scores) based on factors like how often a query is searched.
Example: If many people search for “best pizza in New York,” the system marks this query as highly relevant to pages about pizza in New York.
Selector 🎯 – Chooses which stored documents and queries should be used when someone searches for something.
Example: If you search for “top smartphones,” the system may decide to pull results from previously stored searches like “best phones 2024.”
Regenerator 🔄 – Looks at past search logs (records of searches) to decide which documents (web pages) should be retrieved.
Example: If many users refine their search from “cheap laptops” to “best budget laptops,” the system learns that the second query is a useful alternative.
Inverter 🔍📦 – Uses cached data (previously saved information) to find related documents and queries.
Example: If an old but still relevant webpage about “classic novels” is in the cache, it may still show up when you search for “best old books.”

Step 2: Query Refinement (Improving Search Suggestions) 🛠️

Once a user enters a search query, the query refinement system works to improve it. This system also has four parts:

Matcher 🏷️ – Matches stored documents (previously saved search data) with the new search query. It assigns a relevance score (a number that shows how useful the match is).
Example: If you search for “healthy meal ideas,” the system checks past searches and finds related stored queries like “quick healthy recipes” and assigns them scores based on how relevant they are.
Clusterer 🧩 – Groups similar search queries into clusters (groups with common words or themes). These groups are ranked based on relevance.
Example: If you search “running shoes,” the system might form clusters like:
- 🏃 For beginners – “best running shoes for beginners”
- 🏆 For performance – “fastest running shoes for marathons”
- 💰 For budget – “best cheap running shoes”
The highest-ranking clusters will be considered for suggestions.

Scorer 📊 – Analyzes each cluster and calculates a centroid (the "center" of a cluster, representing the most important words).
🔹 Search queries are then scored based on:

How many documents are linked to them
How close they are to the centroid (most relevant topic)

✅ Example: If the “best affordable hiking boots” cluster has a lot of web pages and closely matches other user searches, it gets a higher score than a less relevant query.
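
💻 The two scoring factors can be sketched with cosine similarity as the measure of closeness to the centroid. Multiplying the similarity by the number of linked documents is an assumption, since the source does not say how the factors combine; the centroid weights and document counts are made up.

```python
import math

def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_query(query, linked_docs, centroid):
    """Assumed scoring: documents linked to the query times closeness to the centroid."""
    query_vector = {}
    for term in query.lower().split():
        query_vector[term] = query_vector.get(term, 0.0) + 1.0
    return linked_docs * cosine(query_vector, centroid)

# Hypothetical centroid for a hiking boots cluster
centroid = {"hiking": 1.0, "boots": 1.0, "affordable": 0.5}
strong = score_query("best affordable hiking boots", linked_docs=12, centroid=centroid)
weak = score_query("hiking trails", linked_docs=3, centroid=centroid)
print(strong > weak)
```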

Presenter 🎤 – Takes the highest-scoring search queries and shows them to you as suggestions for refining your original search. The system keeps track of how the refinements were created but only shows the final suggestions.

Example: If you search “coffee brewing,” the system might suggest:

“best coffee brewing methods”
“how to make espresso at home”
“coffee brewing tips for beginners”

2017 Updates to the Patent:

The scoring method has changed.
For every cluster, a representative query is chosen based on the cluster's centroid.
Clusters are ordered according to cluster size and relevance scores.
Sub-queries are used as the refinement queries.
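
💻 The updated approach can be sketched roughly as follows. The source gives few details, so this is a loose illustration: the `closeness` measure (average centroid weight of a query's words), the cluster fields, and all numbers are assumptions.

```python
def closeness(query, centroid):
    """Assumed measure: average centroid weight of the words in the query."""
    words = query.lower().split()
    return sum(centroid.get(word, 0.0) for word in words) / len(words)

def representative_queries(clusters):
    """Order clusters by size and relevance, then pick one query per cluster."""
    ordered = sorted(clusters, key=lambda c: (c["size"], c["relevance"]), reverse=True)
    return [
        max(c["queries"], key=lambda q: closeness(q, c["centroid"]))
        for c in ordered
    ]

# Hypothetical clusters for the query "jaguar"
clusters = [
    {"size": 15, "relevance": 0.7,
     "centroid": {"jaguar": 1.0, "cat": 0.8, "animal": 0.7},
     "queries": ["jaguar zoo tickets", "jaguar cat"]},
    {"size": 40, "relevance": 0.9,
     "centroid": {"jaguar": 1.0, "car": 0.9, "automobile": 0.6},
     "queries": ["jaguar car", "jaguar dealer usa"]},
]
refinements = representative_queries(clusters)
print(refinements)
```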