Last year we tried to scale our API by adding more servers behind a load balancer. Traffic was fine at 1,000 concurrent users. At 5,000, everything broke.
The load balancer was routing requests randomly. But our sessions lived in server memory. User A logs in on server 1, next request goes to server 2, suddenly they're logged out. Shopping carts disappeared mid-checkout.
The fix wasn't more servers. It was removing state from our servers entirely.
A stateless service doesn't remember anything between requests. Every request contains everything needed to process it. You could kill the server mid-request, restart it, send the same request again, and get the same result.
A stateful service holds data between requests—sessions in memory, files on local disk, locks in local variables. That data ties users to specific servers.
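Here's the difference in code. A minimal sketch; the class and names are mine, not from our codebase:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

class Handlers {
    // Stateful: this map lives in one server's memory. Route the next
    // request to a different server and the "session" is gone.
    private final Map<String, String> sessions = new ConcurrentHashMap<>();

    String statefulWhoAmI(String sessionId) {
        return sessions.get(sessionId); // null on every other server
    }

    // Stateless: the request carries everything needed. Any server,
    // any restart, same answer.
    String statelessWhoAmI(String verifiedUserIdFromRequest) {
        return verifiedUserIdFromRequest;
    }
}
```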
Think of it like a coffee shop chain. A stateless shop: you walk in, order, pay, get coffee. Any location works the same. A stateful shop: they keep your loyalty card on-site. Now you're stuck going back to that one location.
Most apps track users across requests. You log in once, browse around, add items to cart. HTTP is stateless, so we use session cookies to fake continuity.
The question is: where does session data live?
In server memory: Fast, but now users are pinned to servers. Can't scale horizontally, can't restart servers without logging everyone out.
In cookies: Simple and truly stateless. Works great for small data—user ID, a token, preferences. But cookies travel with every single request, and browsers cap each one at around 4KB. A full 4KB of session data means 4KB of overhead on every HTTP call. Mobile users feel this.
In external storage: Redis or a database holds the sessions; the cookie carries only a session ID. Any server can look up the session and handle any request. This is what most production systems do.
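In code, that's roughly this. A sketch assuming the Jedis client; the key prefix and 30-minute TTL are illustrative:

```java
import redis.clients.jedis.JedisPooled;
import java.util.UUID;

class SessionStore {
    private final JedisPooled redis = new JedisPooled("localhost", 6379);

    // On login: create the session server-side, hand back only the ID
    // (it goes into a Set-Cookie header).
    String create(String userId) {
        String sessionId = UUID.randomUUID().toString();
        redis.setex("session:" + sessionId, 1800, userId); // 30-minute TTL
        return sessionId;
    }

    // On any request, on any server: resolve the cookie's ID to a user.
    String lookup(String sessionId) {
        String userId = redis.get("session:" + sessionId);
        if (userId != null) {
            redis.expire("session:" + sessionId, 1800); // sliding expiration
        }
        return userId; // null means not logged in
    }
}
```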
Sessions are obvious. Files are sneaky.
User uploads a profile photo. It lands on server 2's disk. Next request goes to server 1. Profile photo not found.
Object storage (S3, GCS, MinIO): Store files externally. All servers read/write to the same bucket. This is the standard answer.
CDN for reads: Serve static content from edge nodes. Faster for users, less load on your servers.
Generated files—PDFs, reports, exports—same problem. Generate them, push to object storage, return a signed URL. Don't store them locally.
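A sketch of that flow with the AWS SDK for Java v2. The bucket name and key layout are assumptions; real code would reuse the clients and handle errors:

```java
import software.amazon.awssdk.core.sync.RequestBody;
import software.amazon.awssdk.services.s3.S3Client;
import software.amazon.awssdk.services.s3.model.GetObjectRequest;
import software.amazon.awssdk.services.s3.model.PutObjectRequest;
import software.amazon.awssdk.services.s3.presigner.S3Presigner;
import software.amazon.awssdk.services.s3.presigner.model.GetObjectPresignRequest;
import java.nio.file.Path;
import java.time.Duration;

class ExportStore {
    private final S3Client s3 = S3Client.create();
    private final S3Presigner presigner = S3Presigner.create();
    private static final String BUCKET = "app-exports"; // assumed name

    // Generate locally, push to the shared bucket, return a link.
    // The local file is disposable after this; no server owns it.
    String publish(Path localPdf, String key) {
        s3.putObject(
            PutObjectRequest.builder().bucket(BUCKET).key(key).build(),
            RequestBody.fromFile(localPdf));

        // Time-limited download URL: clients fetch straight from S3,
        // not from whichever server happened to generate the file.
        GetObjectPresignRequest presign = GetObjectPresignRequest.builder()
            .signatureDuration(Duration.ofMinutes(15))
            .getObjectRequest(GetObjectRequest.builder()
                .bucket(BUCKET).key(key).build())
            .build();
        return presigner.presignGetObject(presign).url().toString();
    }
}
```

The same pattern covers user uploads: put the photo in the bucket, store the key, serve it via CDN or signed URL.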
Caches: Local in-memory caches seem harmless. User updates their profile on server 1, cache on server 2 still has old data. Solution: centralized cache (Redis again) or cache invalidation strategies.
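A minimal cache-aside sketch against shared Redis (Jedis again; key names, TTL, and the database stand-ins are illustrative). Reads fill the cache; writes delete the cached copy so no server keeps serving stale data:

```java
import redis.clients.jedis.JedisPooled;

class ProfileCache {
    private final JedisPooled redis = new JedisPooled("localhost", 6379);

    // Cache-aside read: every server consults the same cache.
    String getProfile(String userId) {
        String cached = redis.get("profile:" + userId);
        if (cached != null) return cached;
        String fresh = loadFromDatabase(userId);
        redis.setex("profile:" + userId, 300, fresh); // 5-minute TTL
        return fresh;
    }

    // Update: write the database, then invalidate. The next read,
    // from any server, repopulates with fresh data.
    void updateProfile(String userId, String json) {
        saveToDatabase(userId, json);
        redis.del("profile:" + userId);
    }

    private String loadFromDatabase(String userId) { return "{}"; } // stand-in
    private void saveToDatabase(String userId, String json) {}      // stand-in
}
```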
Scheduled jobs: Cron jobs on specific servers create implicit state. That server becomes special. Move to distributed job queues—SQS, RabbitMQ, or database-backed queues.
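Here's what the database-backed option can look like. A sketch assuming a jobs table with id, payload, and status columns; FOR UPDATE SKIP LOCKED is PostgreSQL syntax (MySQL 8 has it too). Every worker is identical, so no server is special:

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;

class JobWorker {
    // Claim one pending job atomically. SKIP LOCKED means concurrent
    // workers polling the same table never grab the same row.
    void claimAndRunOne(Connection db) throws Exception {
        db.setAutoCommit(false); // claim + completion in one transaction
        try (Statement st = db.createStatement();
             ResultSet rs = st.executeQuery(
                 "SELECT id, payload FROM jobs WHERE status = 'pending' " +
                 "ORDER BY id LIMIT 1 FOR UPDATE SKIP LOCKED")) {
            if (rs.next()) {
                long id = rs.getLong("id");
                run(rs.getString("payload"));
                try (Statement done = st.getConnection().createStatement()) {
                    done.executeUpdate(
                        "UPDATE jobs SET status = 'done' WHERE id = " + id);
                }
            }
        }
        db.commit(); // crash before commit? The row unlocks and another
                     // worker picks the job up
    }

    private void run(String payload) { /* the actual work */ }
}
```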
WebSocket connections: Long-lived connections are inherently stateful. User connects to server 1, and you can't just route their messages through server 2. Solutions: sticky sessions (a compromise), or a pub/sub layer where any server can publish to any connection.
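A rough sketch of the pub/sub route with Redis: any server publishes to a per-user channel; whichever server actually holds the socket is subscribed and delivers. Channel naming is illustrative:

```java
import redis.clients.jedis.JedisPooled;
import redis.clients.jedis.JedisPubSub;
import java.util.function.Consumer;

class WsBridge {
    private final JedisPooled redis = new JedisPooled("localhost", 6379);

    // Any server can call this, no matter where the user is connected.
    void send(String userId, String message) {
        redis.publish("ws:" + userId, message);
    }

    // The server holding the WebSocket subscribes for its connected
    // users. subscribe() blocks, so run it on a dedicated thread.
    void listen(String userId, Consumer<String> deliverToSocket) {
        redis.subscribe(new JedisPubSub() {
            @Override
            public void onMessage(String channel, String message) {
                deliverToSocket.accept(message); // write to the local socket
            }
        }, "ws:" + userId);
    }
}
```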
Stateless isn't free. External session storage adds latency—network hop to Redis on every request. Object storage is slower than local disk. Distributed caches are more complex than Map<String, Object>.
But the alternative is being unable to scale. Being unable to deploy without kicking users off. Being unable to recover from server failures gracefully.
For most web applications, the tradeoff is worth it. Make your servers interchangeable clones. Keep state in purpose-built systems designed for it.
If your server can't die without consequences, you can't scale.
— blanho