Every architecture diagram makes migration look clean. Old box, arrow, new box. Done.
Then you actually do it and discover the old box has eight years of hidden behavior.
Reddit's backend: one Python service, four core models — Comments, Accounts, Posts, Subreddits — different teams stepping on each other.
They picked Comments first. Biggest dataset. Highest write throughput. Bold choice.
Route traffic to the new Go service. It generates a response, calls the old Python endpoint too, compares both, logs differences. Return Python's response to users.
How Tap Compare works: Request hits the Go service, which generates its response. Go also calls Python in shadow mode, compares the results, and logs any differences. But it always returns Python's response to the user — totally safe.
If Go has bugs, users never see them. Zero risk.
Comments write to three places simultaneously: Postgres, Memcached, Redis (CDC events). Can't use tap compare — comment IDs are unique.
The Write Problem: When a user creates comment "Hello", it writes to Postgres (primary), Memcached (cache), and Redis (CDC events). Go can't shadow this — IDs would conflict!
Reddit's solution: sister datastores. Complete mirrors that only Go writes to.
If Go has bugs, they only corrupt the isolated copies. Production stays safe.
Serialization mismatches. Python and Go serialize differently.
Hidden ORM magic. Python's ORM had optimizations nobody documented. Go hit the database 3x harder with the same logic.
Race conditions. User edits comment to "hello." Go writes it. Another user edits to "hello again." Comparison fails. Bug or timing? Hours to figure out.
P99 Latency:
Python: occasionally 15 seconds (!)
Go: consistently < 100ms
Migration time: 18 months
Lines of comparison code: 10,000+
The short-term pain paid off.
A migration isn't done when the new system works. It's done when you can turn off the old one.
— blanho
You don't have Netflix's problems. You have 3 developers and a Postgres database.
Grab rewrote Go to Rust. Latency? Same. CPU usage? 5x better. Know the difference.
Netflix ripped out Kafka, Cassandra, and three cache layers. Because every cache is a lie.
# Sister datastore pattern
def create_comment(content):
# 1. Traffic hits Go
# 2. Go calls Python for the REAL write
real_id = python_service.create_comment(content)
# 3. Go writes to ISOLATED sister stores
sister_postgres.insert(content)
sister_memcached.set(content)
sister_redis.publish(content)
# 4. Compare production vs sister data
compare_and_log(real_id, sister_id)
return real_id # Always return Python's result# Python: {"created_at": "2024-01-15T10:30:00Z"}
# Go: {"created_at": "2024-01-15T10:30:00+00:00"}
# Same timestamp. Different string. Comparison fails.