Netflix's ad system started simple. Client calls server, server calls Microsoft's API, response comes back. Synchronous. Clean. Works. One endpoint. One dependency. Ship it.
Then came scale.
At some point, every ad impression needs to notify five different systems. Billing needs to record the impression (50ms). Analytics needs to update the dashboard (100ms). Frequency capping needs to update the count (30ms). Third-party vendors need their tracking pixels fired (200ms—and that's optimistic for external calls). Fraud detection needs to check patterns (80ms).
With synchronous calls, you call each one in sequence. Call billing, wait. Call analytics, wait. Call frequency capping, wait. Call vendors, wait. Call fraud detection, wait. Finally respond.
Add up the wait times: 460 milliseconds. Your user is waiting almost half a second because of an analytics dashboard they'll never see. And if any one of those services is slow or down? Everything is blocked.
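A minimal sketch of the blocking version, with `time.sleep` standing in for each downstream call (the service names and latencies are illustrative, not Netflix's actual code):

```python
import time

# Stand-ins for the five downstream calls; the sleeps mimic their latency.
def call(service, ms):
    time.sleep(ms / 1000)          # network round trip to `service`

def on_ad_play(ad_id, user_id):
    call("billing", 50)            # record the impression
    call("analytics", 100)         # update the dashboard
    call("frequency-capping", 30)  # bump the count
    call("vendors", 200)           # fire third-party pixels
    call("fraud-detection", 80)    # pattern check
    return {"status": "ok"}        # ~460ms later, if nothing timed out
```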
Synchronous calls become a web of dependencies where each new integration makes the whole system more fragile.
Put a queue in the middle.
When an ad plays, the producer publishes a single event to Kafka and immediately returns—5 milliseconds tops. The user gets their response. Done.
Meanwhile, five independent consumers pick up that event at their own pace. Billing processes it. Analytics processes it. Fraud detection does its thing. If billing is slow, analytics doesn't care. If the third-party vendor's endpoint is timing out, your user already got their response seconds ago.
Each consumer processes events independently. Failures are isolated—billing being down doesn't break analytics. You can replay events if something got missed. New consumers can subscribe without touching the producer code at all.
Here's what that looks like in practice:
```python
# Producer (simple, fast)
def on_ad_play(ad_id, user_id):
    event = {
        "type": "AD_IMPRESSION",
        "ad_id": ad_id,
        "user_id": user_id,
        "timestamp": now(),
        "device": get_device_info()
    }
    kafka.produce("ad-events", event)
    return {"status": "ok"}  # Return immediately, 5ms

# Consumer (Billing team owns this)
def billing_consumer():
    for event in kafka.consume("ad-events"):
        if event["type"] == "AD_IMPRESSION":
            record_impression(event["ad_id"])
            # Can take 50ms, 500ms, doesn't matter
            # User already got their response
```
Same event, five consumers, zero coupling. User gets response in 5ms instead of 460ms.
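The "zero coupling" part is just Kafka consumer groups: each team subscribes to the same topic under its own group ID and keeps its own offsets. A minimal sketch of the analytics side, assuming the kafka-python client and a local broker; the group name and `update_dashboard` helper are made up:

```python
import json
from kafka import KafkaConsumer   # kafka-python client, assumed here

# Analytics team's consumer. Its own group.id means its own offsets,
# so it sees every event regardless of how billing is doing.
analytics = KafkaConsumer(
    "ad-events",
    bootstrap_servers="localhost:9092",     # assumed broker address
    group_id="analytics",                   # illustrative group name
    value_deserializer=lambda v: json.loads(v),
)

for event in analytics:
    if event.value["type"] == "AD_IMPRESSION":
        update_dashboard(event.value)       # placeholder for the analytics work
```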
Don't build separate pipelines for each use case. I've seen teams create a billing pipeline that extracts timestamp, user_id, and ad_id—then create an analytics pipeline that extracts the exact same fields. Then a vendor pipeline doing it again.
Instead, publish one standardized event. Document the schema. Version it. Anyone who needs that data can subscribe and filter what they need. Common operations like encryption or enrichment happen once, not five times.
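One way to pin that down is a single documented, versioned event shape that every producer emits and every consumer filters from. A minimal sketch; the field names and version scheme are assumptions, not Netflix's actual schema:

```python
from dataclasses import dataclass, asdict
import time
import uuid

AD_EVENT_SCHEMA_VERSION = 2   # bump when fields change; consumers check this

@dataclass
class AdImpressionEvent:
    ad_id: str
    user_id: str
    device: str
    timestamp: float
    type: str = "AD_IMPRESSION"
    schema_version: int = AD_EVENT_SCHEMA_VERSION
    event_id: str = ""        # filled in below, useful for dedup and audit

    def to_message(self) -> dict:
        body = asdict(self)
        body["event_id"] = body["event_id"] or str(uuid.uuid4())
        return body

# Every producer publishes this one shape; each consumer keeps only what it needs.
event = AdImpressionEvent(ad_id="ad_123", user_id="u_456",
                          device="tv", timestamp=time.time()).to_message()
billing_view = {k: event[k] for k in ("ad_id", "user_id", "timestamp")}
```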
The benefits compound: new consumers join without changing producers, clean separation of concerns, and you get an audit trail for free.
There's a wrong time and a right time for events.
Too early: You have 2-3 services, maybe 100 requests per second. One team owns everything. Simple request/response works fine. No audit requirements. If something fails, acceptable downtime is… acceptable. Don't reach for Kafka. It's overkill.
Right time: You're at 5+ services. A thousand or more requests per second. Multiple teams need the same data. You need async processing. Audit trails matter. Failures in one system can't be allowed to cascade into others.
Event-driven isn't free. You get eventual consistency instead of immediate consistency. Kafka carries real operational burden; it's not infrastructure you can set up and forget. Debugging gets harder: you're tracing through queues instead of call stacks. Ordering is only guaranteed within a partition, and that can bite you. And there's a learning curve.
But you lose a lot of pain too. Tight coupling between services goes away. Cascading failures become isolated failures. Blocking calls disappear. Single points of failure are no longer single points. One slow consumer doesn't block the rest.
Netflix started with Microsoft's API. Direct synchronous call. They only built their own event-driven ad platform when they felt the pain and understood the problem.
That's the right path. Sync calls first. Background jobs when you need async (Redis queue, keep it simple). Event streaming when you need replay, audit trails, and serious scale.
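That middle step can be as small as a Redis list used as a work queue. A minimal sketch with redis-py; the queue name and the `record_impression` handler are made up:

```python
import json
import redis

r = redis.Redis(host="localhost", port=6379)    # assumed local Redis

# Web handler: enqueue the job and return immediately.
def on_ad_play(ad_id, user_id):
    r.lpush("ad-jobs", json.dumps({"ad_id": ad_id, "user_id": user_id}))
    return {"status": "ok"}

# Background worker: process jobs at its own pace.
def worker():
    while True:
        _, raw = r.brpop("ad-jobs")             # blocks until a job arrives
        job = json.loads(raw)
        record_impression(job["ad_id"])         # placeholder for the real work
```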
The best architecture is the one that fits your current scale. Not the one you might need in three years.
Sync first. Event-driven when you feel the pain.
— blanho