Messaging & Event Streaming
Message queues decouple producers from consumers, absorb traffic spikes, and enable async workflows—but queue semantics differ radically from event streams. Choosing Kafka over RabbitMQ over SQS is not a popularity contest; it is a decision about throughput, ordering, replay, and operational burden. This chapter covers patterns, broker trade-offs, backpressure, ordering, and the delivery guarantees every senior engineer must articulate with precision.
Message queue vs event stream semantics
Both move bytes between services asynchronously, but they optimize for different mental models: queues delete work after consumption; streams retain an ordered log for replay.
A message queue (RabbitMQ, SQS, ActiveMQ) treats each message as a discrete unit of work. Once a consumer acknowledges processing, the message is removed from the queue. Multiple competing consumers pull from the same queue and each message goes to exactly one consumer—classic work distribution.
An event stream (Kafka, Kinesis, Pulsar) appends events to an immutable, partitioned log. Consumers read at their own offset; messages are not deleted on read (retention is time- or size-based). The same event can be replayed by new consumers, reprocessed after a bug fix, or fan-out to many independent consumer groups without duplicating writes.
flowchart LR
subgraph Queue["Message Queue"]
Q1[Msg A] --> C1[Consumer 1]
Q2[Msg B] --> C2[Consumer 2]
Q3[Msg C] --> C3[Consumer 3]
end
subgraph Stream["Event Stream (log)"]
L[Partition log\noffset 0..N]
L --> G1[Consumer Group A\noffset 42]
L --> G2[Consumer Group B\noffset 17]
L --> G3[New replay consumer\noffset 0]
end
| Dimension | Message queue | Event stream |
|---|---|---|
| Consumption model | Competing consumers; message deleted after ACK | Consumer groups with independent offsets; log retained |
| Replay | Not supported (message gone after ACK) | First-class: reset offset, new consumer group reads from beginning |
| Fan-out | Requires exchange/topic routing (RabbitMQ) or multiple queues (SQS) | Native: N consumer groups each read full stream independently |
| Ordering | Single queue ≈ FIFO; multiple consumers break global order | Per-partition strict order; global order requires single partition |
| Throughput ceiling | ~10K–50K msg/s per broker (RabbitMQ); SQS scales horizontally by design | ~1M+ msg/s per Kafka cluster (benchmark-dependent); partition parallelism |
| Typical use | Task dispatch, RPC decoupling, job queues, email sending | Event sourcing backbone, CDC, analytics pipelines, audit log |
Pub/sub vs point-to-point
Point-to-point: one producer, one logical queue, N competing consumers—each message processed once. SQS standard queues and RabbitMQ work queues follow this model.
Pub/sub: one producer publishes to a topic; every subscriber receives a copy. RabbitMQ fanout/topic exchanges, SNS→SQS fan-out, and Kafka consumer groups (each group gets all partitions) implement variants of pub/sub. The critical distinction: in Kafka, "subscriber" means a consumer group, not an individual consumer instance.
Kafka stores messages on disk in segment files (typically 1 GB each), not in broker RAM.
Sequential disk I/O + zero-copy transfer (sendfile) is why Kafka sustains millions of writes/sec
on modest hardware. RabbitMQ keeps messages in memory (with optional persistence to disk per message),
which caps throughput but reduces end-to-end latency for small messages (~1 ms vs Kafka's ~5–15 ms p99 produce).
Queues optimize for task completion—fire-and-forget jobs where replay is irrelevant. Streams optimize for data continuity—when downstream systems must catch up, reprocess history, or join multiple event sources. Using a queue as an audit log forces you to duplicate every message to storage; using a stream as a task queue wastes disk and complicates "delete after process" semantics.
When asked "queue or Kafka?", answer with the access pattern: "Do consumers need to replay history? Do multiple independent services need the same events? Is ordering per entity required?" Order confirmation emails → SQS queue. User activity analytics + fraud detection + search indexing → Kafka stream.
Queue patterns
Six patterns cover 90% of production messaging architectures. Know when each applies—and what breaks when you combine them naively.
Work queue (competing consumers)
Producers enqueue tasks; a pool of workers competes to pull the next available message. Horizontal scaling is trivial: add workers until queue depth stabilizes. SQS + Auto Scaling Group and Celery + RabbitMQ are canonical implementations.
flowchart LR P1[Producer] --> Q[Task Queue] P2[Producer] --> Q Q --> W1[Worker 1] Q --> W2[Worker 2] Q --> W3[Worker 3]
Scaling rule: if queue depth grows linearly with traffic, you are under-provisioned. Target p99 queue wait < 30 seconds for interactive-adjacent tasks; batch jobs can tolerate minutes. At 10K tasks/min and 500 ms average processing time, you need ~84 concurrent workers (10,000 × 0.5 / 60 ≈ 83).
Fan-out
One event triggers multiple downstream actions without the producer knowing subscribers. SNS → multiple SQS queues, RabbitMQ fanout exchange → bound queues, Kafka topic → multiple consumer groups.
SNS + SQS fan-out handles ~30K messages/sec per region with automatic scaling. Each subscriber queue processes independently—slow analytics does not block fast notification delivery.
Priority queue
VIP or time-sensitive messages jump ahead of bulk traffic. Implementations vary:
- Separate queues — high-priority queue with dedicated workers (simplest, most reliable)
- RabbitMQ priority —
x-max-priorityheader (0–255); head-of-line blocking risk - Weighted polling — workers poll high queue 3× before low queue once
- SQS — no native priority; use separate queues + weighted consumers
Delay queue
Defer processing until a future time—retry backoff, scheduled reminders, order cancellation windows. SQS supports native delay up to 15 minutes per message. For longer delays, store in DynamoDB/Redis with TTL + sweeper, or use Step Functions wait states. RabbitMQ requires TTL + dead-letter exchange trick or delayed-message plugin.
Dead letter queue (DLQ)
Messages that fail processing after N retries route to a DLQ for inspection, manual replay, or alerting. Without a DLQ, poison messages block the queue indefinitely or silently drop.
# SQS redrive policy — send to DLQ after 3 failures
{
"RedrivePolicy": {
"deadLetterTargetArn": "arn:aws:sqs:us-east-1:123456789:orders-dlq",
"maxReceiveCount": 3
}
}
# Consumer: always extend visibility timeout for long tasks
def process_message(msg):
try:
do_work(msg.body)
sqs.delete_message(QueueUrl=QUEUE, ReceiptHandle=msg.receipt_handle)
except TransientError:
# Message returns to queue after visibility timeout
raise
except PermanentError:
# Log and delete — or let maxReceiveCount send to DLQ
logger.error("poison message", extra={"id": msg.message_id})
sqs.delete_message(QueueUrl=QUEUE, ReceiptHandle=msg.receipt_handle)
DLQ without monitoring is a graveyard. Teams set up DLQs, never alert on depth, and lose revenue-critical events for weeks. Alert when DLQ depth > 0; run weekly replay tooling; include original message metadata and stack traces in DLQ payload.
Request-reply
Async RPC: client publishes request with a reply-to queue/correlation ID; worker responds on
the reply queue. RabbitMQ direct reply-to and temporary exclusive queues support this natively.
Kafka request-reply uses a dedicated reply topic + correlation ID in headers (higher latency, less common).
sequenceDiagram participant Client participant ReqQ as Request Queue participant Worker participant ReplyQ as Reply Queue Client->>ReqQ: publish(request, reply_to=ReplyQ, corr_id=abc) ReqQ->>Worker: deliver Worker->>Worker: process Worker->>ReplyQ: publish(response, corr_id=abc) ReplyQ->>Client: deliver matching corr_id
| Pattern | When to use | Watch out for |
|---|---|---|
| Work queue | Background jobs, image processing, email dispatch | Hot partitions if keyed poorly; visibility timeout too short |
| Fan-out | Order placed → inventory, email, analytics, search | Partial failure: one subscriber fails, others succeed—need idempotency |
| Priority | Payment retries before marketing emails | Starvation of low-priority if high never empties |
| Delay | Retry backoff, cart abandonment reminders | Clock skew; SQS 15-min max delay limit |
| DLQ | Any queue with non-idempotent or external side effects | Unmonitored DLQ; infinite retry burning CPU |
| Request-reply | Async RPC when sync HTTP timeout too short | Reply queue expiry; orphaned replies; prefer HTTP/gRPC for sync paths |
Stripe uses SQS extensively for webhook delivery and async payment processing— DLQs capture failed webhook attempts for manual replay. Instagram (circa 2012 architecture) used Celery + RabbitMQ work queues for photo processing pipelines. Amazon built SQS because every service needed reliable async decoupling at massive scale.
Default architecture for "something happened, do stuff": SNS fan-out → per-concern SQS queues → Auto Scaling workers + DLQ + CloudWatch alarm. Simple, battle-tested, no Kafka ops unless you need replay or stream processing joins.
Kafka vs RabbitMQ vs SQS
Three brokers dominate production conversations. Each excels in a different quadrant of throughput, latency, ops burden, and feature surface.
| Metric | Apache Kafka | RabbitMQ | Amazon SQS |
|---|---|---|---|
| Peak throughput | ~1–3M msg/s per cluster (3-node, 1 KB msgs, rf=3) | ~20–50K msg/s per broker (persistent, mirrored) | ~3K msg/s per queue (soft limit); unlimited queues |
| End-to-end latency (p99) | 5–15 ms produce; 10–50 ms consume batch | 1–5 ms (in-memory, small msgs) | ~10–100 ms (eventually consistent) |
| Message size limit | Default 1 MB (configurable to ~10 MB) | 128 MB (practical: <1 MB) | 256 KB hard limit |
| Ordering | Per-partition strict FIFO | Single consumer per queue ≈ FIFO | Standard: best-effort; FIFO queues: 300 msg/s per group |
| Replay | Yes — reset consumer offset | No — message deleted on ACK | No |
| Ops model | Self-managed (or MSK/Confluent Cloud); ZooKeeper/KRaft | Self-managed or CloudAMQP; Erlang cluster | Fully managed; zero broker ops |
| Cost at 1B msgs/month | ~$500–2K (MSK) + eng time | ~$200–800 (managed) + eng time | ~$400 (@ $0.40/million requests) |
Choose Kafka when
- Throughput exceeds 100K events/sec sustained
- Multiple consumer groups need the same event stream (analytics + search + fraud)
- Event replay, CDC, or stream processing (Kafka Streams, Flink) is required
- You can invest in partition design, retention tuning, and broker operations
Choose RabbitMQ when
- Complex routing (topic/direct/fanout exchanges, headers routing)
- Low-latency task dispatch (<5 ms) with moderate throughput (<50K/s)
- Request-reply, priority queues, or per-message TTL needed out of the box
- Team already runs Erlang-friendly infra or uses CloudAMQP
Choose SQS when
- AWS-native stack; zero desire to operate message brokers
- Work queue pattern with Auto Scaling; throughput per queue <3K/s (shard queues if higher)
- Decoupling microservices with DLQ + SNS fan-out; replay not required
- Message size <256 KB; latency tolerance 50–200 ms
flowchart TD
Start[Need async messaging?] --> Replay{Need replay / stream processing?}
Replay -->|Yes| Kafka[Kafka / Kinesis / Pulsar]
Replay -->|No| AWS{All-in on AWS?}
AWS -->|Yes, simple tasks| SQS[SQS + SNS fan-out]
AWS -->|No or complex routing| RMQ{Routing complexity?}
RMQ -->|High| RabbitMQ[RabbitMQ]
RMQ -->|Low, high throughput| Kafka
RMQ -->|Low, moderate| SQS or RabbitMQ
10M events/day ≈ 115 events/sec average, ~500/sec peak (4× factor). Single SQS queue handles this easily. At 100M/day (~1.2K/sec peak), still one SQS queue. At 1B/day (~12K/sec peak), shard into 4+ SQS queues or switch to Kafka. Kafka 3-broker cluster comfortably handles 500K/sec with proper partitioning.
L6 candidates cite operational cost, not just throughput: "Kafka gives replay and 1M/s, but we need 2 FTEs for cluster ops, partition rebalancing, and consumer lag triage. For 5K order events/sec with no replay, SNS→SQS saves $200K/year in eng time."
Memorize three numbers: Kafka ~1M msg/s cluster, RabbitMQ ~50K msg/s broker, SQS ~3K msg/s per queue. Pair with use case: "Notification fan-out on AWS → SNS+SQS. Clickstream analytics → Kafka. Complex routing microservices → RabbitMQ."
Backpressure solutions
When producers outpace consumers, something must give—unbounded queues mask failure until they OOM. Backpressure is the discipline of propagating slowness upstream instead of hiding it.
Symptoms: growing consumer lag (Kafka), queue depth monotonically increasing (SQS), RabbitMQ memory alarms, OOM kills on consumers, or cascading timeouts as downstream retries amplify load.
Detection metrics
| Broker | Primary metric | Alert threshold (starting point) |
|---|---|---|
| Kafka | Consumer lag (records behind log end) | Lag > 10K records or growing >5 min |
| SQS | ApproximateNumberOfMessagesVisible | Depth > 1000 or age of oldest > 5 min |
| RabbitMQ | Queue depth + memory/disk alarms | Depth > 10K or memory > 40% watermark |
Solutions (in order of preference)
- Scale consumers — add worker instances until lag stabilizes. Kafka: add consumers up to partition count (max parallelism = partitions). SQS: Auto Scaling on queue depth metric.
- Rate-limit producers — token bucket at API gateway when downstream saturated. Better to reject requests with 429 than accept and lose them.
- Increase partition/queue count — horizontal shard for Kafka partitions or multiple SQS queues.
- Drop or sample — for non-critical telemetry, sample 10% under pressure. Never drop payment or auth events without explicit product approval.
-
Bounded buffers + blocking — reactive streams, Go channels with capacity, Java
BlockingQueue. Producer blocks when buffer full—backpressure propagates synchronously. -
Circuit breaker on publish — stop producing when broker health degrades;
spill to local disk buffer (Kafka producer
buffer.memory) with timeout.
// Kafka consumer: reduce max.poll.records under backpressure
props.put("max.poll.records", 100); // default 500
props.put("max.poll.interval.ms", 300_000); // allow long processing
// SQS: reduce batch size + increase visibility timeout
ReceiveMessageRequest req = ReceiveMessageRequest.builder()
.queueUrl(url)
.maxNumberOfMessages(1) // process one at a time when struggling
.waitTimeSeconds(20) // long polling — fewer empty receives
.visibilityTimeout(300) // 5 min for heavy jobs
.build();
flowchart TD P[Producer] -->|publish rate| B[Broker / Queue] B -->|deliver| C[Consumer pool] C -->|slow processing| L[Consumer lag grows] L --> A1[Scale consumers] L --> A2[Rate-limit producers] L --> A3[Add partitions / queues] L --> A4[Sample non-critical traffic] A1 --> B A2 --> P
Infinite retry loops amplify backpressure. Consumer fails → message requeued → fails again →
exponential load on broken dependency. Cap retries (3–5), use exponential backoff + jitter, route to DLQ,
and alert. Kafka: configure delivery.timeout.ms and dead-letter topic in Kafka Connect / Streams.
Blocking producers (synchronous backpressure) protects the broker but increases API latency for users. Unbounded async buffers hide pressure until catastrophic failure. Production systems often combine: async publish with bounded buffer + 429 when buffer full + consumer auto-scale.
Kafka producers batch messages (linger.ms=5, batch.size=16KB) before sending—
trading latency for throughput. Under backpressure, buffer.memory fills; producer blocks or throws
BufferExhaustedException. Tune max.block.ms to fail fast instead of hanging request threads.
Ordering guarantees
Most systems need order per entity, not globally. Partition keys and single-threaded consumers are the standard tools—global FIFO is almost always the wrong goal.
Ordering levels
| Level | Guarantee | How to achieve | Cost |
|---|---|---|---|
| Global total order | All events worldwide in strict sequence | Single partition / single consumer | Throughput ceiling ~10–50K/s; no parallelism |
| Partition order | Ordered within partition/key | Kafka: same key → same partition; SQS FIFO: MessageGroupId | Parallelism = partition/group count |
| Causal order | Dependent events appear in cause-before-effect order | Version vectors, happen-before chains, single-writer per aggregate | Application-level coordination |
| Best-effort | No ordering promise | SQS standard, RabbitMQ with multiple consumers | Highest throughput; simplest scaling |
Kafka partition key strategy
Messages with the same key hash to the same partition and are consumed in produce order.
Use user_id, order_id, or account_id as key—not a constant,
which forces single-partition bottleneck.
# Preserve order per user across account updates
producer.send(
topic="account-events",
key=str(user_id).encode(), # same user → same partition
value=event_json.encode()
)
# Anti-pattern: null key or constant key → all messages to partition 0
producer.send(topic="events", key=b"all") # 1M/s becomes impossible
SQS FIFO constraints
FIFO queues guarantee order within a MessageGroupId and exactly-once processing (dedup window 5 min).
Throughput: 300 msg/s per message group, up to 3,000 msg/s per queue with batching.
Hot group (single group ID) becomes a serial bottleneck—same problem as single Kafka partition.
Order events for 1M users, 100 updates/sec peak per hot user, 10K users active:
partition by user_id → 10K independent order streams.
Kafka with 48 partitions handles 48 parallel hot users at full partition throughput (~50K/s each theoretical).
LinkedIn uses Kafka with member-id partition keys so profile updates for a given member
serialize correctly. Uber partitions ride events by ride_id—status transitions
(requested → matched → completed) must not arrive out of order for the same ride.
In interviews, say: "I need ordering per order_id, not globally.
Kafka partition key = order_id gives me serial processing per order with N-way parallelism across orders."
At-most-once, at-least-once, exactly-once
No distributed messaging system gives true exactly-once end-to-end without cooperation from producers, brokers, and consumers. Understand what each layer actually promises—and where idempotency saves you.
| Guarantee | Meaning | Failure behavior | Typical implementation |
|---|---|---|---|
| At-most-once | Message delivered 0 or 1 times | Fire-and-forget; crash after send, before ACK → lost | Kafka acks=0; metrics/telemetry; UDP-style logging |
| At-least-once | Message delivered ≥1 times | Retry on failure → duplicates possible | Kafka acks=all + consumer manual commit after process; SQS default |
| Exactly-once | Processed effectively once (no duplicate effect) | Requires idempotent consumer or transactional semantics | Kafka EOS (idempotent producer + transactions); SQS FIFO dedup; app-level idempotency keys |
The duplicate problem
At-least-once is the production default. Consumer processes message → crashes before ACK → message redelivered → duplicate charge, duplicate email, double inventory decrement. The fix is never "hope duplicates don't happen"— it is idempotent consumers.
Idempotent consumer pattern
- Every message carries a unique
idempotency_key(UUID, event ID, or business key) - Before side effects, check a dedup store (Redis SET, DB unique constraint, DynamoDB conditional write)
- If key seen → ACK and skip; if new → process + record key atomically with side effect
def handle_payment_event(event):
key = event["idempotency_key"] # e.g. payment_id
# Atomic check-and-set — Redis or DB unique index
if redis.setnx(f"processed:{key}", 1):
redis.expire(f"processed:{key}", 86400 * 7)
else:
return # duplicate — safe to ACK
charge_customer(event["amount"], event["customer_id"])
# If crash here, retry re-enters but setnx fails → no double charge
Kafka exactly-once semantics (EOS)
Kafka 0.11+ offers idempotent producers (enable.idempotence=true) preventing duplicate writes
on retry, and transactions for atomic read-process-write across topics.
EOS applies within Kafka—your DB write is still at-least-once unless you use
transactional outbox or two-phase commit.
sequenceDiagram participant App participant DB participant Outbox as Outbox Table participant Relay as Outbox Relay participant Kafka App->>DB: BEGIN TX App->>DB: UPDATE orders SET status='paid' App->>Outbox: INSERT event (same TX) App->>DB: COMMIT Relay->>Outbox: poll unpublished events Relay->>Kafka: publish Relay->>Outbox: mark published
Transactional outbox
Write business state + outbox row in one DB transaction. Separate relay process publishes to Kafka and marks sent. Guarantees: if DB commit succeeds, event eventually publishes; no orphan publishes without DB state. De facto standard for microservices needing reliable event emission.
"We target at-least-once delivery with idempotent consumers. Exactly-once Kafka EOS for stream processing internal state; external side effects use idempotency keys + outbox pattern. I don't claim end-to-end exactly-once—that requires distributed transactions we avoid."
Committing offset before processing gives at-most-once (lose messages on crash). Committing after processing gives at-least-once (duplicates on crash before commit). There is no free lunch—pick at-least-once + idempotency unless data loss is acceptable (metrics only).
Delivery guarantee answer template: "SQS gives at-least-once; I implement idempotent handlers with dedup keys in Redis/DB. For payment events, idempotency key = transaction_id with unique DB constraint. Outbox pattern ensures DB + event atomicity." Never say "Kafka gives exactly-once" without qualifying scope.