Our team scaled from 2 app servers to 10. The load balancer handled the traffic fine. CPU usage on the app servers dropped to 20%. Everything looked great.
Except response times got worse.
With 2 servers, the database averaged 50ms per response. After scaling to 10, it jumped to 200ms. Same database, same queries — just five times more connections fighting for the same rows and locks.
Turns out 10 servers hammering one database is worse than 2 servers hammering one database. We didn't scale. We just moved the bottleneck.
Here's the order things actually break:
Single server — app and database on one box. Works fine until maybe 100–500 concurrent users, then disk I/O on the database side starts falling behind.
Separate database server — you move the database to its own machine. Buys you breathing room up to about 1,000 concurrent users. Then the connection count becomes the problem.
Multiple app servers behind a load balancer — this is where the trap is. You add app servers thinking you're scaling, but every new server opens more connections to the same database. The database becomes the ceiling, and you're pressing harder against it.
This is where most teams get stuck.
Vertical scaling means a bigger database server. More RAM, faster SSD, more cores. You go from db.r5.large at $200/month to db.r5.24xlarge at $6,400/month. It works until you hit AWS's biggest instance — or your budget.
Horizontal scaling means multiple database servers. This is where it gets complicated.
Read replicas are the easy first step:
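```python
# Split traffic: writes to primary, reads to replicas
# (`primary` and `read_replica` are connections, or pools, to the write node
#  and a read replica)
def get_user(id):
    return read_replica.query(
        "SELECT * FROM users WHERE id = %s", id
    )

def update_user(id, data):
    return primary.query(
        "UPDATE users SET ... WHERE id = %s", id, data
    )
```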
Most apps are roughly 80% reads, so replicas buy you a lot of headroom.
Write-heavy workloads still pound the primary. That's when you need sharding — splitting data across multiple database clusters by some key, like user ID ranges.
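The routing logic itself is tiny:

```python
# Range-based sharding: roughly one shard per million users
def get_shard(user_id):
    return f"shard_{user_id // 1_000_000}"
```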
Sharding is painful. Cross-shard queries get expensive, and transactions that span shards need distributed coordination. Think hard before you go there.
Before sharding, try these:
A single missing index can tank your whole application. I've seen it happen more than once.
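EXPLAIN ANALYZE tells you whether a query is scanning the whole table:

```sql
-- Query optimization: the free 10x win
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- "Seq Scan on orders" = missing index = disaster

CREATE INDEX idx_orders_user_id ON orders(user_id);
```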
Connection pooling is another free win. With PgBouncer sitting between your app servers and the database, 500 application connections can share 20 actual database connections. The database stops drowning in connection overhead.
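A minimal PgBouncer sketch (the host, auth file path, and exact pool sizes below are placeholders, not a recommendation):

```ini
; pgbouncer.ini (illustrative values)
[databases]
; route "appdb" through PgBouncer to the real Postgres host
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
; app servers connect to 6432 instead of talking to Postgres directly
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; hand each connection back to the pool at the end of a transaction
pool_mode = transaction
; accept up to 500 client connections from the app servers...
max_client_conn = 500
; ...and funnel them into 20 real connections to Postgres
default_pool_size = 20
```

Point the app servers at port 6432 and PgBouncer multiplexes their 500 connections onto those 20.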
Caching is the third lever. If you're hitting the database 1,000 times per second and 90% of those are the same hot keys, put Redis in front of it. Now you're down to 100 QPS on the database.
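Here's a cache-aside sketch with redis-py. The key format, the 60-second TTL, and the get_user_cached helper are illustrative choices, and it assumes the read_replica handle from the earlier example returns JSON-serializable rows:

```python
import json

import redis

cache = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL = 60  # seconds each hot key is served from Redis instead of Postgres

def get_user_cached(user_id):
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)  # cache hit: the database never sees this request
    # cache miss: fall through to the database, then populate the cache
    user = read_replica.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.setex(key, CACHE_TTL, json.dumps(user))
    return user
```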
I've seen teams jump to sharding when a single index would have fixed the problem. Work in order. Only add complexity when simpler solutions fail.
Your database is almost always the bottleneck. Treat it accordingly.
Before you add servers, check your queries.
— blanho