Here's the problem: you're building a job alert system.
10 million users have saved searches like "Java + Remote" or "Python + New York".
100,000 new jobs come in daily.
The naive approach: for each new job, check all 10 million saved searches to see if it matches. That's 100,000 jobs times 10 million searches—one trillion operations per day. About 11.5 million operations per second.
Not happening.
Elasticsearch solves this with the Percolator. The key insight is to flip the search.
Traditional search is what you already know: you have many documents (products, articles, jobs), a user runs a query, and you return matching documents.
The Percolator flips it: you have many queries (saved searches, alerts), a new document comes in, and you return matching queries.
It's search in reverse.
You index the queries, not just the documents. When a new document arrives, you percolate it against the query index. Elasticsearch returns: "This job matches User A, B, and C."
You go from O(n × m) to O(n × log m). The difference between "impossible" and "easy."
When a user saves a job alert, you index it as a query:
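Concretely, the `alerts` index carries a field mapped as type `percolator` (the index and field names here are illustrative), created once up front, and each saved search is then stored as a document whose body is a query:

```json
PUT /alerts
{
  "mappings": {
    "properties": {
      "query":    { "type": "percolator" },
      "title":    { "type": "text" },
      "location": { "type": "text" }
    }
  }
}

PUT /alerts/_doc/user_123
{
  "query": {
    "bool": {
      "must": [
        { "match": { "title": "java" }},
        { "match": { "location": "remote" }}
      ]
    }
  }
}
```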
Do this for all 10 million alerts—10 million indexed queries.
When a new job comes in, percolate it:
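The new job is sent as the `document` inside a `percolate` query against the same index:

```json
GET /alerts/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": {
        "title": "Senior Java Developer",
        "location": "Remote",
        "salary": 150000
      }
    }
  }
}
```

The hits that come back are the saved searches (`user_123` and friends) whose queries match this one document.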
Elasticsearch tells you which queries match. Send notifications to those users. Done in milliseconds, not days.
Job sites like LinkedIn and Indeed: The document is a new job posting. The queries are 50 million saved job alerts. The action: "Notify users whose alerts match this job." Without percolator, you'd check 50M alerts per job—impossible. With percolator, you check one document against the query index—fast.
E-commerce like Amazon and eBay: The document is a product whose price just dropped to $450. The queries are "notify me when price under $500." The action: "These 1,247 users want this alert."
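The price-drop case is the same pattern with a `range` clause instead of text matching. A sketch, assuming a hypothetical `price_alerts` index with a percolator-mapped `query` field:

```json
PUT /price_alerts/_doc/user_456
{
  "query": {
    "bool": {
      "must": [
        { "match": { "product": "headphones" }},
        { "range": { "price": { "lt": 500 }}}
      ]
    }
  }
}
```

When the $450 product percolates through, this query matches and user_456 gets the alert.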
News aggregators: The document is a new article about cryptocurrency regulation. The queries are subscriber filters. The action: "Send to these 50,000 subscribers."
Monitoring tools like Datadog and PagerDuty: The document is a log entry with error_rate at 7%. The queries are "error rate above 5% and service equals payments." The action: "Page the on-call engineer."
Percolator queries aren't free. Every saved search is one more indexed query, and percolating against 10 million queries in a single index gets slow. Three ways to keep it fast:
Shard by category. Engineering jobs only check the Engineering alerts index. Marketing jobs only check Marketing. Reduces search space by 10x.
Shard by location. NYC jobs only check NYC alerts. SF jobs only check SF alerts. Reduces search space by 50x.
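Sharding by location can be as simple as one percolator index per city, so a new job only percolates against the relevant slice of alerts. A sketch with hypothetical index names (`alerts-nyc`, etc.):

```json
# New NYC job? Percolate against the NYC alerts only.
GET /alerts-nyc/_search
{
  "query": {
    "percolate": {
      "field": "query",
      "document": { "title": "Java Developer", "location": "New York" }
    }
  }
}
```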
Limit queries per user. Free tier gets 5 saved searches, Pro tier gets 50. Controls index growth.
Traditional search works when you have few queries against many documents. The user initiates the search, it's real-time and on-demand. Examples: Google search, e-commerce browsing, log exploration, dashboard queries.
Percolator (reverse search) works when you have many queries against few documents. The system initiates matching, processing in batch or streaming. Examples: job alerts, price drop notifications, real-time alerting rules, regulatory monitoring.
When you have more queries than documents, flip the search.
— blanho