Our team scaled from 2 app servers to 10. The load balancer handled the traffic fine. CPU usage on the app servers dropped to 20%. Everything looked great.
Except response times got worse.
With 2 servers, the database averaged 50ms per response. After scaling to 10, it jumped to 200ms. Same database, same queries — just five times more connections fighting for the same rows and locks.
Turns out 10 servers hammering one database is worse than 2 servers hammering one database. We didn't scale. We just moved the bottleneck.
Here's the order things actually break:
Single server — app and database on one box. Works fine until maybe 100–500 concurrent users, then disk I/O on the database side starts falling behind.
Separate database server — you move the database to its own machine. Buys you breathing room up to about 1,000 concurrent users. Then the connection count becomes the problem.
Multiple app servers behind a load balancer — this is where the trap is. You add app servers thinking you're scaling, but every new server opens more connections to the same database. The database becomes the ceiling, and you're pressing harder against it.
This is where most teams get stuck.
Vertical scaling means a bigger database server. More RAM, faster SSD, more cores. You go from db.r5.large at $200/month to db.r5.24xlarge at $6,400/month. It works until you hit AWS's biggest instance — or your budget.
Horizontal scaling means multiple database servers. This is where it gets complicated.
Read replicas are the easy first step:
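```python
# Split traffic: writes to primary, reads to replicas
# (`primary` and `read_replica` are connections, or pools, to the write node
#  and a read replica)
def get_user(id):
    return read_replica.query(
        "SELECT * FROM users WHERE id = %s", id
    )

def update_user(id, data):
    return primary.query(
        "UPDATE users SET ... WHERE id = %s", id, data
    )
```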
Most apps are roughly 80% reads, so replicas buy you a lot of headroom.
Write-heavy workloads still pound the primary. That's when you need sharding — splitting data across multiple database clusters by some key, like user ID ranges.
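The routing logic itself is tiny:

```python
# Range-based sharding: roughly one shard per million users
def get_shard(user_id):
    return f"shard_{user_id // 1_000_000}"
```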
Sharding is painful. Cross-shard queries get expensive, and transactions that span shards need distributed coordination. Think hard before you go there.
Before sharding, try these:
A single missing index can tank your whole application. I've seen it happen more than once.
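EXPLAIN ANALYZE tells you whether a query is scanning the whole table:

```sql
-- Query optimization: the free 10x win
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- "Seq Scan on orders" = missing index = disaster

CREATE INDEX idx_orders_user_id ON orders(user_id);
```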
Connection pooling is another free win. With PgBouncer sitting between your app servers and the database, 500 application connections can share 20 actual database connections. The database stops drowning in connection overhead.
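A minimal PgBouncer sketch (the host, auth file path, and exact pool sizes below are placeholders, not a recommendation):

```ini
; pgbouncer.ini (illustrative values)
[databases]
; route "appdb" through PgBouncer to the real Postgres host
appdb = host=10.0.0.5 port=5432 dbname=appdb

[pgbouncer]
listen_addr = 0.0.0.0
; app servers connect to 6432 instead of talking to Postgres directly
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; hand each connection back to the pool at the end of a transaction
pool_mode = transaction
; accept up to 500 client connections from the app servers...
max_client_conn = 500
; ...and funnel them into 20 real connections to Postgres
default_pool_size = 20
```

Point the app servers at port 6432 and PgBouncer multiplexes their 500 connections onto those 20.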
Caching is the third lever. If you're hitting the database 1,000 times per second and 90% of those are the same hot keys, put Redis in front of it. Now you're down to 100 QPS on the database.
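Here's a cache-aside sketch with redis-py. The key format, the 60-second TTL, and the get_user_cached helper are illustrative choices, and it assumes the read_replica handle from the earlier example returns JSON-serializable rows:

```python
import json

import redis

cache = redis.Redis(host="localhost", port=6379, db=0)
CACHE_TTL = 60  # seconds each hot key is served from Redis instead of Postgres

def get_user_cached(user_id):
    key = f"user:{user_id}"
    hit = cache.get(key)
    if hit is not None:
        return json.loads(hit)  # cache hit: the database never sees this request
    # cache miss: fall through to the database, then populate the cache
    user = read_replica.query("SELECT * FROM users WHERE id = %s", user_id)
    cache.setex(key, CACHE_TTL, json.dumps(user))
    return user
```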
I've seen teams jump to sharding when a single index would have fixed the problem. Work in order. Only add complexity when simpler solutions fail.
Your database is almost always the bottleneck. Treat it accordingly.
Before you add servers, check your queries.
— blanho