Every cache is a promise: "this data is probably still correct."
Stack enough "probably"s on top of each other and you get Netflix Tudum's problem: editors save an article, wait minutes, and the preview still shows the old version.
Netflix's editorial platform had a CQRS architecture. Kafka bridged the write and read paths. Cassandra served as the read store. A near-cache sat in front.
When an editor saves, the update flows through six steps: the save itself, the CMS webhook, the ingestion pipeline, Kafka, the Cassandra write, and finally the near-cache timer. Each step adds latency. Each step is another chance for the data to lie.
The near-cache refreshed on a schedule, not on demand. A page with 50 fragments, each with its own timer. Some update quickly. Some don't.
The result? Half-new, half-stale content. Not wrong. Not right. Just lying.
Here's what was happening under the hood. At zero seconds, the editor updates the article title. One second later, the CMS fires a webhook. Five seconds in, the ingestion pipeline processes. Ten seconds, Kafka receives the event. Fifteen seconds, Cassandra writes to disk. Finally at thirty seconds, the near-cache timer expires and refreshes.
Thirty seconds of lies. And that's the optimistic case.
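The timeline above adds up like a back-of-the-envelope script. A minimal sketch, with stage names and per-hop delays taken from the numbers in the text (real latencies vary per deployment):

```python
# Per-hop delays are illustrative, taken from the timeline in the text.
STAGES = [
    ("CMS webhook fires", 1),
    ("ingestion pipeline processes", 4),
    ("Kafka receives the event", 5),
    ("Cassandra write lands", 5),
    ("near-cache timer expires", 15),
]

def staleness_window(stages):
    """Seconds between the editor's save and the refreshed read path."""
    elapsed = 0
    for name, delta in stages:
        elapsed += delta
        print(f"t={elapsed:>2}s  {name}")
    return elapsed

total = staleness_window(STAGES)
print(f"stale for {total} seconds, best case")
```

Every hop you add to the write path shows up here as another term in the sum.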
Now multiply that by 50 page fragments, each with different refresh intervals. The header refreshes every 10 seconds—it shows the new title. The content body refreshes every 60 seconds—it shows the old text. The related links refresh every 120 seconds—still stale. The author bio refreshes every 30 seconds—shows the new avatar.
Frankenstein page. Some pieces new, some two minutes stale. Nobody can predict which version the user sees.
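A minimal sketch of that effect, assuming the worst case where each fragment's timer fired just before the edit (intervals mirror the text; the fragment names are illustrative):

```python
# Each fragment refreshes on its own timer, so at any instant some
# fragments have picked up an edit and some have not.
FRAGMENT_INTERVALS = {
    "header": 10,
    "author_bio": 30,
    "content_body": 60,
    "related_links": 120,
}

def fresh_fragments(seconds_since_edit, intervals):
    """Fragments guaranteed to show the edit.

    Worst case: a timer fired just before the edit, so a fragment is
    only guaranteed fresh once a full interval has elapsed."""
    return {name for name, interval in intervals.items()
            if seconds_since_edit >= interval}

# 45 seconds after an edit, only the header and author bio are
# guaranteed fresh; the body and related links may still be stale.
print(sorted(fresh_fragments(45, FRAGMENT_INTERVALS)))
```

Until the slowest timer fires, the page is a mix of versions, and which mix depends on when each timer last ran.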
They didn't add a smarter cache. They didn't tune TTLs. They removed the caches entirely.
Before: CMS fires a webhook, goes through a pipeline, hits Kafka, lands in Cassandra, gets cached, then finally serves the client. Five places where data can go stale.
After: CMS writes to RAW Hollow (an in-memory object store embedded in every service instance), which serves the client directly. Single source of truth. No invalidation needed. No staleness possible.
What's RAW Hollow? It's an in-memory data structure that holds three years of content compressed into about 130MB per instance. Every lookup is O(1), in-process. No network hop. No cache invalidation. No stale data. Just memory reads.
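To make the access pattern concrete, here is a toy in-process store. This is not the RAW Hollow API, just a sketch of the shape it enables: reads are plain memory lookups, and a write is visible the moment it lands, because there is no cache between the data and the reader.

```python
# Toy illustration of the "no cache" shape: the full dataset lives in
# process memory on every instance. NOT the RAW Hollow API.
class InProcessStore:
    def __init__(self):
        self._data = {}  # full dataset in RAM

    def apply_update(self, key, value):
        # RAW Hollow replicates updates to every instance;
        # this sketch just writes locally.
        self._data[key] = value

    def get(self, key):
        # O(1), in-process: no network hop, nothing to invalidate.
        return self._data.get(key)

store = InProcessStore()
store.apply_update("article:42", {"title": "New Title"})
print(store.get("article:42")["title"])  # the write is visible immediately
```

The design choice being illustrated: once the dataset fits in each instance's memory, "invalidation" stops being a concept, because there is no second copy to fall out of sync.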
The results were dramatic. Home page construction dropped from 1.4 seconds to 0.4 seconds. Cache staleness went from anywhere between 0 and 120 seconds down to literally zero. The number of components involved dropped from six-plus to one. Debugging complexity went from high to basically non-existent.
The trade-off? About 80MB more RAM per instance. That's it. 80MB more RAM for zero staleness bugs.
Ask yourself: Does your dataset fit in memory? Netflix Tudum's editorial content is about 130MB. Most editorial content is small.
Is cache invalidation causing bugs? If the answer is yes, maybe the cache is the bug.
Are you stacking caches? CDN in front of Redis in front of app cache in front of database cache. Each layer lies independently.
Is "eventual consistency" causing user confusion? Users expect to see what they just saved.
Is debugging cache issues taking more time than the caching saves?
The solution isn't always "remove all caches." But when your caching architecture has more components than your business logic, something's wrong.
Simple (usually fine): App to Redis to database. One cache layer with clear invalidation.
Dangerous (audit carefully): CDN to edge cache to app cache to Redis to database cache. Five layers, five chances to serve stale data. Invalidation becomes a distributed systems problem.
Netflix Tudum's solution: App to in-memory data. Zero cache layers. Zero staleness.
The fastest cache is no cache. If your dataset fits in memory, skip the ceremony.
Every cache is a trade-off: speed vs accuracy. Know which one you're choosing. And if you're choosing "speed" but getting "bugs," you've chosen wrong.
The best cache strategy is the one where you stop needing a cache at all.
— blanho