We optimized our API to handle 10x more requests per second. Throughput graphs looked amazing. Then users started complaining the app felt sluggish.
How do you make a system faster and slower at the same time? By confusing throughput with latency.
Latency is how long one request takes. User clicks, user waits, user sees result. Measured in milliseconds.
Throughput is how many requests the system completes per second. System capacity, not user experience. Measured in RPS.
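If you want to feel the difference, measure both from the same run. A minimal sketch, assuming a hypothetical `handle_request()` standing in for a real API call:

```python
import time

def handle_request():
    # Hypothetical work; stands in for a real API call.
    time.sleep(0.01)

n = 100
latencies = []
start = time.perf_counter()
for _ in range(n):
    t0 = time.perf_counter()
    handle_request()
    latencies.append(time.perf_counter() - t0)
elapsed = time.perf_counter() - start

print(f"avg latency: {1000 * sum(latencies) / n:.1f} ms per request")
print(f"throughput:  {n / elapsed:.0f} requests per second")
```

Same run, two numbers, two different stories.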
The trap: they often move in opposite directions.
Imagine a restaurant with one chef. Each dish takes 10 minutes. Latency: 10 minutes. Throughput: 6 dishes per hour.
Now add a queue system. Customers order ahead, the chef batches similar dishes, cooks five steaks at once instead of one at a time. Throughput jumps to 20 dishes per hour.
But your steak? It sat in queue for 25 minutes before cooking started. Latency went from 10 minutes to 35 minutes.
Higher throughput. Worse latency. Same kitchen.
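The arithmetic behind that is the one formula this whole trade-off rests on: once there's a queue, latency is wait time plus service time, while throughput only counts what leaves the kitchen per hour. Same numbers as above:

```python
cook_min = 10

# One dish at a time: no queue.
solo_latency = cook_min            # 10 minutes from order to plate
solo_throughput = 60 / cook_min    # 6 dishes per hour

# Batched: more dishes leave the kitchen per hour,
# but your order waits for its batch before cooking starts.
queue_wait_min = 25
batched_latency = queue_wait_min + cook_min   # 35 minutes
batched_throughput = 20                       # dishes per hour, per the example
```

The software versions of that kitchen: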
Batch processing with Kafka or SQS: Higher throughput, but each message waits for the batch to fill.
Connection pooling: Handle more total requests, but individual requests queue for an available connection.
Regional load balancing: Better global distribution, but some requests route to distant servers.
Write coalescing: Fewer disk operations, but each write waits for the buffer to flush.
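All four are the same move: make individual requests wait so the system can do shared work. A minimal sketch of the last one, write coalescing, with a hypothetical flush callback standing in for the real disk write:

```python
import time

class CoalescingWriter:
    """Buffers records and writes them out in one batch.
    Fewer flushes (throughput win), but a record isn't durable
    until its whole batch goes out (latency cost)."""

    def __init__(self, flush, batch_size=64, max_wait_s=0.1):
        self.flush = flush              # e.g. one fsync'd write for many records
        self.batch_size = batch_size
        self.max_wait_s = max_wait_s
        self.buffer = []
        self.oldest = 0.0

    def write(self, record):
        if not self.buffer:
            self.oldest = time.monotonic()
        self.buffer.append(record)
        # Flush when the batch is full or the oldest record has waited long enough.
        if (len(self.buffer) >= self.batch_size
                or time.monotonic() - self.oldest >= self.max_wait_s):
            self.flush(self.buffer)
            self.buffer = []

writer = CoalescingWriter(flush=lambda batch: print(f"flushed {len(batch)} records"))
for i in range(200):
    writer.write(i)   # the first record in each batch waits the longest
```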
I've seen teams celebrate throughput improvements while users silently leave.
The dashboard looks great: average latency at 150ms, 10,000 requests per second, 0.1% error rate. Ship it.
But the p99 latency—the experience for the slowest 1% of users—is 8 seconds. Eight seconds. Why? Queuing under load. The median user is fine. The tail user waited in queue for 7.8 seconds.
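That gap is exactly what queuing produces. A toy single-worker simulation shows the shape (illustrative numbers, not the dashboard above):

```python
import random, statistics

random.seed(42)
service_ms = 100.0        # fixed work per request
arrival = 0.0
worker_free = 0.0
latencies = []

for _ in range(100_000):
    # Mostly steady traffic; 2% of requests arrive in a pile-up (gap of 0 ms).
    arrival += 105.0 if random.random() > 0.02 else 0.0
    start = max(arrival, worker_free)          # wait in queue if the worker is busy
    worker_free = start + service_ms
    latencies.append(worker_free - arrival)    # queue wait + service time

latencies.sort()
print(f"mean={statistics.mean(latencies):.0f}ms  "
      f"p50={latencies[len(latencies) // 2]:.0f}ms  "
      f"p99={latencies[int(len(latencies) * 0.99)]:.0f}ms")
```

The median stays close to the 100 ms of actual work. The p99 is dominated by time spent waiting in the queue.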
Most users don't feel averages. They feel the worst case. One bad experience and they're gone.
Redis is optimized for latency. Simple operations, in-memory, single-threaded so there are no locks to wait on. Sub-millisecond responses.
Kafka is optimized for throughput. It batches messages, does sequential disk writes, and happily trades latency for volume. 100ms batch waits are normal and expected.
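You can see that trade spelled out in producer settings. A sketch using the kafka-python client (broker address and topic are placeholders):

```python
from kafka import KafkaProducer

# Throughput-oriented settings: wait up to 100 ms for a batch to fill and
# allow large batches. Every message may sit in the buffer for that long,
# so per-message latency rises by design.
producer = KafkaProducer(
    bootstrap_servers="localhost:9092",
    linger_ms=100,           # how long to wait for more records before sending
    batch_size=64 * 1024,    # max bytes per partition batch
    acks="all",
)

producer.send("events", b"payload")
producer.flush()
```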
Payment systems are optimized for correctness. Neither throughput nor latency is the priority—not double-charging people is. Accept slowness for safety.
Before optimizing, ask: what does the user feel?
For a checkout page, the user is waiting with credit card in hand. Every second costs conversions. Optimize for latency. Target p99 under 500ms.
For background job processing, the user uploaded a file and went to lunch. They're not watching a spinner. Optimize for throughput. Target 10K files per hour.
For an analytics dashboard, someone is staring at the screen waiting for numbers. Optimize for latency. Target p99 under 2 seconds.
For log ingestion, no user is waiting at all. Optimize for throughput. Target 1M events per second.
The answer is never "optimize both equally." You pick one, and you accept the trade-off on the other.
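Written down, the decisions above might look like this (names and targets are just the examples from this post):

```python
# Each workload commits to one primary dimension and states its target;
# the other dimension is explicitly the accepted trade-off.
SLOS = {
    "checkout":        {"optimize": "latency",    "target": "p99 < 500 ms"},
    "background_jobs": {"optimize": "throughput", "target": ">= 10,000 files/hour"},
    "dashboard":       {"optimize": "latency",    "target": "p99 < 2 s"},
    "log_ingestion":   {"optimize": "throughput", "target": ">= 1,000,000 events/s"},
}
```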
Some metrics lie. "Average latency is 150ms" hides the p99 of 8 seconds. "We handle 10K RPS" doesn't say at what latency. "System is fast" raises the obvious question: fast for whom?
Metrics that matter look like: "p50 = 150ms, p99 = 800ms, p99.9 = 2s." Or "10K RPS at p99 under 500ms." Or "Checkout latency p99 under SLA."
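Percentiles are cheap to compute once you keep raw samples instead of a running average. A sketch with synthetic latencies (real ones come from your metrics pipeline):

```python
import random

def percentile(sorted_ms, q):
    """Nearest-rank percentile from an already-sorted list."""
    return sorted_ms[min(len(sorted_ms) - 1, int(len(sorted_ms) * q))]

# Synthetic, long-tailed latency samples in milliseconds.
samples = sorted(random.lognormvariate(5.0, 0.6) for _ in range(100_000))

print(f"p50 = {percentile(samples, 0.50):.0f}ms, "
      f"p99 = {percentile(samples, 0.99):.0f}ms, "
      f"p99.9 = {percentile(samples, 0.999):.0f}ms")
```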
Always measure both. Report both. Optimize for the one that matters to your users.
Fast for the machine and fast for the user are different problems.
— blanho