Problem · Read/write asymmetry
CQRS
Solution: Separate command (write) and query (read) models; writes go to OLTP, reads served from denormalized projections.
Trade-off: Eventual consistency on read side; dual-model maintenance and sync pipeline complexity.
Real-world: Microsoft's Azure Architecture Center documents CQRS paired with Event Sourcing, with read models rebuilt by replaying the event log.
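A minimal in-memory sketch of the split, assuming a single process. Commands mutate a normalized write model and emit events, and a projector keeps a denormalized, query-shaped read model up to date. The names OrderPlaced, CommandSide and CustomerSummaryProjection are hypothetical. Here the event hand-off is a synchronous function call. In a real system it is usually asynchronous, and that delay is where the read side's eventual consistency comes from.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class OrderPlaced:                      # hypothetical domain event
    order_id: str
    customer_id: str
    amount_cents: int

class CommandSide:
    """Write model: validates commands and stores normalized order rows."""
    def __init__(self, publish: Callable[[OrderPlaced], None]) -> None:
        self.orders: dict[str, dict] = {}
        self.publish = publish

    def place_order(self, order_id: str, customer_id: str, amount_cents: int) -> None:
        if order_id in self.orders:
            raise ValueError(f"duplicate order {order_id}")
        if amount_cents <= 0:
            raise ValueError("amount must be positive")
        self.orders[order_id] = {"customer_id": customer_id, "amount_cents": amount_cents}
        # Hand the change to the read side (synchronous here, async in practice).
        self.publish(OrderPlaced(order_id, customer_id, amount_cents))

class CustomerSummaryProjection:
    """Read model: one precomputed row per customer, shaped for the query."""
    def __init__(self) -> None:
        self.by_customer: dict[str, dict] = {}

    def apply(self, event: OrderPlaced) -> None:
        row = self.by_customer.setdefault(
            event.customer_id, {"order_count": 0, "total_cents": 0})
        row["order_count"] += 1
        row["total_cents"] += event.amount_cents

    def summary(self, customer_id: str) -> dict:
        return self.by_customer.get(customer_id, {"order_count": 0, "total_cents": 0})

projection = CustomerSummaryProjection()
commands = CommandSide(publish=projection.apply)
commands.place_order("o1", "alice", 1500)
commands.place_order("o2", "alice", 2500)
print(projection.summary("alice"))  # {'order_count': 2, 'total_cents': 4000}
```

Queries never touch commands.orders. Rebuilding the read model means creating a fresh projection and feeding it every stored event again.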
Problem · Audit trail + temporal queries
Event Sourcing
Solution: Persist state changes as immutable events; current state derived by replaying event stream.
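Trade-off: Storage grows without bound unless old events are archived; snapshots are usually needed to cap replay cost; cold aggregates pay replay latency on first load.
Real-world: The LMAX exchange runs its business logic in memory and recovers state by replaying its input-event journal (described in Martin Fowler's write-up of the LMAX architecture).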
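A minimal sketch, assuming a hypothetical bank-account aggregate (not LMAX's design). State is never stored directly. It is derived by folding a pure apply function over the append-only event list. A snapshot records the derived state plus how many events it covers, so replay can resume from there instead of from event zero.

```python
from dataclasses import dataclass
from typing import Union

@dataclass(frozen=True)
class Deposited:
    amount: int

@dataclass(frozen=True)
class Withdrawn:
    amount: int

Event = Union[Deposited, Withdrawn]

def apply(balance: int, event: Event) -> int:
    """Pure state transition: old state + one event -> new state."""
    if isinstance(event, Deposited):
        return balance + event.amount
    if isinstance(event, Withdrawn):
        return balance - event.amount
    raise TypeError(f"unknown event {event!r}")

def replay(events: list, start_balance: int = 0, start_index: int = 0) -> int:
    """Rebuild current state by folding events from start_index onward."""
    balance = start_balance
    for event in events[start_index:]:
        balance = apply(balance, event)
    return balance

log: list = []  # append-only event store; events are never updated or deleted

def withdraw(amount: int) -> None:
    # Validate the command against state derived from the log, then append.
    if replay(log) < amount:
        raise ValueError("insufficient funds")
    log.append(Withdrawn(amount))

log.append(Deposited(100))
log.append(Deposited(50))
withdraw(30)

# Snapshot: the derived state and the number of events it already covers.
snapshot = {"balance": replay(log), "upto": len(log)}
log.append(Deposited(10))

# A full replay and a snapshot-plus-tail replay must agree.
print(replay(log))                                         # 130
print(replay(log, snapshot["balance"], snapshot["upto"]))  # 130
```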
Problem · Dataset exceeds single-node capacity
Sharding / Partitioning
Solution: Horizontally split data by shard key (user_id, tenant_id); route queries to correct shard.
Trade-off: Cross-shard queries are expensive; hot shards; resharding is painful unless keys map to many logical shards or to a consistent-hashing ring.
Real-world: Instagram shards user data by user_id; Twitter shards tweets by tweet_id across thousands of nodes.
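A minimal routing sketch, assuming user_id is the shard key. Keys hash into a fixed number of logical shards, and those are mapped onto fewer physical hosts. The shard count and host names are hypothetical. Because the key-to-logical-shard mapping never changes, rebalancing only means editing the logical-to-host map and moving whole shards. The sketch also shows how a multi-user lookup turns into a scatter-gather plan with one query per host.

```python
import hashlib

NUM_LOGICAL_SHARDS = 64  # fixed for the life of the system (assumption)

# Logical shard -> physical host (hypothetical names); only this map changes on rebalance.
SHARD_TO_HOST = {s: f"db-{s % 4}.internal" for s in range(NUM_LOGICAL_SHARDS)}

def logical_shard(user_id: int) -> int:
    # Stable digest: Python's built-in hash() is salted per process for str/bytes.
    digest = hashlib.sha256(str(user_id).encode()).digest()
    return int.from_bytes(digest[:8], "big") % NUM_LOGICAL_SHARDS

def host_for(user_id: int) -> str:
    return SHARD_TO_HOST[logical_shard(user_id)]

def plan_scatter_gather(user_ids: list) -> dict:
    """Group a multi-user lookup by host: one query per host touched."""
    plan: dict = {}
    for uid in user_ids:
        plan.setdefault(host_for(uid), []).append(uid)
    return plan

print(host_for(42))
print(plan_scatter_gather([1, 2, 3, 42]))
```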
Problem · Keys reshuffle when nodes join or leave
Consistent Hashing
Solution: Map keys and nodes to hash ring; each key owned by next clockwise node; virtual nodes for balance.
Trade-off: Hot keys can still create hot spots; range queries unsupported; more complex than modulo hashing.
Real-world: Amazon's Dynamo (as described in the 2007 Dynamo paper) and Apache Cassandra use consistent hashing with virtual nodes for partition placement.
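A minimal hash-ring sketch, assuming MD5 as the ring hash and 100 virtual nodes per physical node. Both choices are illustrative, not Dynamo's or Cassandra's exact parameters. A key belongs to the first ring position clockwise from its hash, and lookups wrap around at the end of the ring. The usage lines check the property that matters: when a fourth node joins, only the keys it takes over move. With modulo hashing (h % 3 to h % 4), about 75% of keys would move.

```python
import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable 64-bit position on the ring.
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")

class HashRing:
    def __init__(self, nodes: list, vnodes: int = 100) -> None:
        self.vnodes = vnodes
        self._points: list = []   # sorted ring positions
        self._owner: dict = {}    # ring position -> physical node
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Each physical node gets many positions, which smooths out load.
        for i in range(self.vnodes):
            point = _hash(f"{node}#{i}")
            if point in self._owner:
                continue  # skip the astronomically rare position collision
            self._owner[point] = node
            bisect.insort(self._points, point)

    def remove(self, node: str) -> None:
        self._points = [p for p in self._points if self._owner[p] != node]
        self._owner = {p: n for p, n in self._owner.items() if n != node}

    def lookup(self, key: str) -> str:
        if not self._points:
            raise LookupError("empty ring")
        # First position clockwise from the key's hash, wrapping to index 0.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._points)
        return self._owner[self._points[idx]]

ring = HashRing(["node-a", "node-b", "node-c"])
keys = [f"user:{i}" for i in range(10_000)]
before = {k: ring.lookup(k) for k in keys}
ring.add("node-d")
moved = [k for k in keys if ring.lookup(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved")  # expected to be near 25%
```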
Problem · Expensive read path on every request
Materialized Views / Denormalization
Solution: Precompute join results into read-optimized tables or documents updated on write or via CDC.
Trade-off: Write amplification; stale reads unless sync is tight; schema duplication.
Real-world: Facebook's TAO caches social-graph objects and association lists (such as friend edges) in front of MySQL, so most graph reads never reach the database.
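A minimal update-on-write sketch, assuming in-memory dicts stand in for the normalized tables and the denormalized view. All names are hypothetical. The view copies the author's display name into each post row, so a read is one lookup with no join. The cost shows up in rename_user: every view row that copied the old name must be rewritten, which is the write amplification noted above.

```python
users = {1: {"name": "Ada"}, 2: {"name": "Linus"}}  # normalized table
posts: dict = {}                                     # normalized table
post_view: dict = {}                                 # denormalized read model

def create_post(post_id: int, author_id: int, body: str) -> None:
    posts[post_id] = {"author_id": author_id, "body": body}
    # Maintain the view in the same write path (a CDC consumer could do this instead).
    post_view[post_id] = {"body": body, "author_name": users[author_id]["name"]}

def rename_user(user_id: int, new_name: str) -> int:
    """Update the source row, then every copy of it; returns view rows touched."""
    users[user_id]["name"] = new_name
    touched = 0
    for post_id, post in posts.items():
        if post["author_id"] == user_id:
            post_view[post_id]["author_name"] = new_name
            touched += 1
    return touched

def read_post(post_id: int) -> dict:
    return post_view[post_id]  # single lookup, no join

create_post(10, 1, "hello")
create_post(11, 1, "again")
print(rename_user(1, "Ada L."))  # 2
print(read_post(11))             # {'body': 'again', 'author_name': 'Ada L.'}
```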
Problem · Home-timeline reads must be fast at high volume
Fan-out on Write
Solution: On tweet/post, push to all follower feeds at write time; reads are O(1) cache lookup.
Trade-off: Heavy write amplification for users with millions of followers (the so-called Katy Perry problem).
Real-world: Twitter hybrid: fan-out on write for normal users, fan-out on read for celebrities.
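A minimal in-memory sketch, assuming bounded deques stand in for per-user feed caches. The cap of 500 entries and all names are hypothetical. post returns the number of feed writes it performed, which makes the write amplification explicit: it equals the author's follower count. The read path is just a slice of a precomputed list.

```python
from collections import defaultdict, deque
from itertools import islice

FEED_LENGTH = 500  # hypothetical cap on each cached feed; oldest entries fall off

followers: dict = defaultdict(set)                          # author -> follower ids
feeds: dict = defaultdict(lambda: deque(maxlen=FEED_LENGTH))

def follow(follower: str, author: str) -> None:
    followers[author].add(follower)

def post(author: str, post_id: int) -> int:
    """Push the new post into every follower's feed; returns the number of writes."""
    for f in followers[author]:
        feeds[f].appendleft(post_id)  # newest first
    return len(followers[author])

def home_timeline(user: str, limit: int = 20) -> list:
    # Read path: no merging, just the first `limit` precomputed entries.
    return list(islice(feeds[user], limit))

follow("bob", "alice")
follow("carol", "alice")
print(post("alice", 1))       # 2
post("alice", 2)
print(home_timeline("bob"))   # [2, 1]
```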
Problem · Millions of followers, write amplification
Fan-out on Read
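Solution: Store posts only in the author's timeline; at read time, merge the timelines of the accounts the reader follows.
Trade-off: Read latency grows with the number of accounts the reader follows; pagination across merged timelines is complex.
Real-world: Twitter pulls tweets from very high-follower accounts at read time instead of fanning them out on write.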
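A minimal merge-on-read sketch, assuming each author's posts are kept as (timestamp, post_id) pairs in ascending time order. The data is hypothetical. heapq.merge lazily k-way merges the newest-first streams, so the work depends on how many accounts the reader follows, not on anyone's follower count. The optional before cursor is one simple way to paginate: pass the timestamp of the last item already shown.

```python
import heapq
from itertools import islice
from typing import Optional

# author -> [(timestamp, post_id), ...] in ascending time order (hypothetical data)
author_posts = {
    "alice": [(1, 101), (5, 105)],
    "bob": [(2, 202), (3, 203), (9, 209)],
}
following = {"carol": ["alice", "bob"]}

def home_timeline(user: str, limit: int = 20, before: Optional[int] = None) -> list:
    # One newest-first stream per followed account, filtered by the page cursor.
    streams = [
        (item for item in reversed(author_posts.get(a, []))
         if before is None or item[0] < before)
        for a in following.get(user, [])
    ]
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    return [post_id for _, post_id in islice(merged, limit)]

print(home_timeline("carol", limit=4))            # [209, 105, 203, 202]
print(home_timeline("carol", limit=2, before=5))  # [203, 202]
```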
Problem · Mixed follower distribution
Hybrid Fan-out
Solution: Fan-out on write below threshold; fan-out on read above; tiered by follower count.
Trade-off: Two code paths to maintain; threshold tuning per product.
Real-world: Twitter's hybrid home-timeline model, described publicly by Twitter engineers (for example in Raffi Krikorian's "Timelines at Scale" talk).
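A minimal sketch of the two paths, assuming a follower-count threshold of 3 so the demo stays small. Real thresholds are tuned per product, and all names here are hypothetical. The tier is decided once, when a post is published. Each post is therefore either pushed or pulled, never both, even if the author's follower count later crosses the threshold. The read path merges the pushed feed with pulled posts from any followed author who has them.

```python
import heapq
from collections import defaultdict
from itertools import islice

CELEBRITY_THRESHOLD = 3  # hypothetical; tiny for the demo

followers: dict = defaultdict(set)     # author -> follower ids
following: dict = defaultdict(set)     # user -> followed authors
feeds: dict = defaultdict(list)        # user -> pushed (ts, post_id), ascending
pulled_posts: dict = defaultdict(list) # author -> unpushed (ts, post_id), ascending

def follow(user: str, author: str) -> None:
    followers[author].add(user)
    following[user].add(author)

def publish(author: str, ts: int, post_id: int) -> None:
    if len(followers[author]) >= CELEBRITY_THRESHOLD:
        pulled_posts[author].append((ts, post_id))  # fan-out on read tier
    else:
        for f in followers[author]:                 # fan-out on write tier
            feeds[f].append((ts, post_id))

def home_timeline(user: str, limit: int = 20) -> list:
    streams = [reversed(feeds[user])]
    # Membership check first so the defaultdict does not grow on reads.
    streams += [reversed(pulled_posts[a]) for a in following[user] if a in pulled_posts]
    merged = heapq.merge(*streams, key=lambda item: item[0], reverse=True)
    return [pid for _, pid in islice(merged, limit)]

for fan in ("u1", "u2", "u3"):
    follow(fan, "star")        # star reaches the threshold: pull tier
follow("u1", "friend")         # friend stays in the push tier
publish("friend", 1, 11)
publish("star", 2, 22)
print(home_timeline("u1"))     # [22, 11]
print(feeds["u1"])             # [(1, 11)]: only friend's post was pushed
```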
Problem · Traffic exceeds single server
Horizontal Scaling
Solution: Add stateless app servers behind load balancer; partition stateful tiers separately.
Trade-off: Requires shared-nothing or externalized state; session stickiness pitfalls.
Real-world: Netflix scales API tiers horizontally; state lives in Cassandra and EVCache.
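A minimal sketch of why externalized state matters, assuming a shared in-process object stands in for an external session store such as a Redis or Memcached cluster. The class names are hypothetical. Each AppServer keeps no per-user data of its own. A plain round-robin balancer can therefore send consecutive requests from one user to different servers without sticky sessions.

```python
from itertools import cycle

class SessionStore:
    """Stand-in for an external store shared by every app server."""
    def __init__(self) -> None:
        self._data: dict = {}

    def get(self, session_id: str) -> dict:
        return self._data.setdefault(session_id, {})

class AppServer:
    """Stateless: everything per-user lives in the shared store."""
    def __init__(self, name: str, store: SessionStore) -> None:
        self.name = name
        self.store = store

    def handle_add_to_cart(self, session_id: str, item: str) -> tuple:
        cart = self.store.get(session_id).setdefault("cart", [])
        cart.append(item)
        return self.name, list(cart)

class RoundRobinBalancer:
    def __init__(self, servers: list) -> None:
        self._next = cycle(servers)

    def route(self) -> AppServer:
        return next(self._next)

store = SessionStore()
lb = RoundRobinBalancer([AppServer(f"app-{i}", store) for i in range(3)])
for item in ["book", "pen", "lamp"]:
    print(lb.route().handle_add_to_cart("sess-1", item))
# ('app-0', ['book'])
# ('app-1', ['book', 'pen'])
# ('app-2', ['book', 'pen', 'lamp'])
```

If the cart lived in an AppServer attribute instead, the second request would land on app-1 and find an empty cart. That failure is what sticky sessions try to paper over.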
Problem · Variable or bursty traffic
Auto-scaling
Solution: Scale instance count on CPU, QPS, or queue depth metrics with cooldown periods.
Trade-off: Cold-start latency; over-provisioning cost; overly aggressive scale-down causes flapping.
Real-world: AWS Auto Scaling Groups behind ALB for Black Friday e-commerce peaks.
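A minimal target-tracking sketch, assuming average CPU percent is the scaling metric. The target, bounds and cooldowns are made-up values. The fleet is resized so average utilization would return to the target, clamped to min/max size. Scale-in uses a longer cooldown than scale-out, one common way to avoid flapping. Managed autoscalers (such as AWS target tracking policies) follow a broadly similar idea, but this is not their actual algorithm.

```python
from dataclasses import dataclass

@dataclass
class Autoscaler:
    target_cpu_pct: int = 60          # desired average CPU per instance (assumption)
    min_size: int = 2
    max_size: int = 20
    scale_out_cooldown: float = 60.0  # seconds
    scale_in_cooldown: float = 300.0  # longer, so brief dips do not cause flapping
    last_action_at: float = float("-inf")

    def desired(self, current: int, avg_cpu_pct: int) -> int:
        # Smallest fleet whose average CPU would be at or below the target;
        # integer ceiling division avoids float rounding surprises.
        want = (current * avg_cpu_pct + self.target_cpu_pct - 1) // self.target_cpu_pct
        return max(self.min_size, min(self.max_size, want))

    def decide(self, now: float, current: int, avg_cpu_pct: int) -> int:
        want = self.desired(current, avg_cpu_pct)
        if want == current:
            return current
        cooldown = self.scale_out_cooldown if want > current else self.scale_in_cooldown
        if now - self.last_action_at < cooldown:
            return current            # still cooling down from the last change
        self.last_action_at = now
        return want

scaler = Autoscaler()
sizes = [4]
for now, cpu in [(0, 90), (30, 95), (120, 20), (400, 20)]:
    sizes.append(scaler.decide(now, sizes[-1], cpu))
print(sizes)  # [4, 6, 6, 6, 2]
```

At t=30 the fleet wants 10 instances but is still inside the 60 s scale-out cooldown. At t=120 it wants to shrink but must wait out the 300 s scale-in cooldown.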
Problem · Global users, latency
Geo-sharding / Multi-region
Solution: Deploy data and compute in regional cells; route users to nearest region.
Trade-off: Cross-region consistency hard; data residency compliance; operational complexity.
Real-world: Google Spanner multi-region with TrueTime; Uber geo-partitions ride data.
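A minimal routing sketch, assuming each regional cell is represented by an approximate datacenter coordinate and users go to the nearest one by great-circle distance. The region names and coordinates are illustrative. Production systems usually route with GeoDNS or anycast and measured latency rather than raw distance. Data-residency rules are modeled here as an optional allow-list applied before distance is considered.

```python
import math
from typing import Optional

# Illustrative regional cells with approximate (lat, lon) coordinates.
REGIONS = {
    "us-east": (39.0, -77.5),
    "eu-west": (53.3, -6.3),
    "ap-southeast": (1.35, 103.8),
}

def haversine_km(a: tuple, b: tuple) -> float:
    """Great-circle distance between two (lat, lon) points in kilometres."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def pick_region(user_loc: tuple, allowed: Optional[set] = None) -> str:
    # Residency rules narrow the candidates first; then pick the nearest cell.
    candidates = allowed if allowed else REGIONS.keys()
    return min(candidates, key=lambda r: haversine_km(user_loc, REGIONS[r]))

print(pick_region((48.9, 2.35)))                          # eu-west (Paris)
print(pick_region((35.7, 139.7)))                         # ap-southeast (Tokyo)
print(pick_region((1.3, 103.8), allowed={"eu-west"}))     # eu-west: EU-resident user abroad
```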
Problem · Read-heavy, write-rare
Read Replicas
Solution: Primary handles writes; N replicas serve reads and are updated by asynchronous replication, so they lag the primary slightly.
Trade-off: Replication lag causes stale reads; failover promotion complexity.
Real-world: PostgreSQL read replicas on Amazon RDS offload analytics queries from the primary.
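A minimal routing sketch, assuming a router that sends writes to the primary and spreads reads across replicas round-robin. Connection handles are plain strings here, and all names are hypothetical. To hide replication lag from the user who just wrote, the router pins that user's reads to the primary for a few seconds after each write. This read-your-writes window is one common application-level mitigation, not a feature of RDS or PostgreSQL.

```python
from itertools import cycle

class ReplicaRouter:
    def __init__(self, primary: str, replicas: list, pin_seconds: float = 5.0) -> None:
        self.primary = primary
        self._replicas = cycle(replicas)
        self.pin_seconds = pin_seconds
        self._last_write: dict = {}  # user -> time of that user's last write

    def for_write(self, user: str, now: float) -> str:
        self._last_write[user] = now
        return self.primary

    def for_read(self, user: str, now: float) -> str:
        # Users who wrote recently read from the primary so they see their own
        # write despite async replication lag; everyone else hits a replica.
        last = self._last_write.get(user)
        if last is not None and now - last < self.pin_seconds:
            return self.primary
        return next(self._replicas)

router = ReplicaRouter("pg-primary", ["pg-replica-1", "pg-replica-2"])
print(router.for_write("alice", now=100.0))  # pg-primary
print(router.for_read("alice", now=101.0))   # pg-primary (inside the 5 s window)
print(router.for_read("alice", now=110.0))   # pg-replica-1
print(router.for_read("bob", now=110.0))     # pg-replica-2
```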