System Design Cheat Sheets

Three dense quick references for interview day and daily architecture work. Each block is copy-to-clipboard—paste into your notes app, flash cards, or whiteboard prep doc. Toggle Interview mode to surface interview-specific callouts. Expand or collapse each panel; every code block has its own Copy button.


💡 Pro Tip

Review these sheets the morning of your interview. Copy the Interview Cheat Sheet block into a notes doc and rehearse saying assumptions aloud: "300M DAU, 10 reads/day, peak 3×: that's ~30K average and ~90K peak read QPS." Numbers beat adjectives every time.

Fundamentals Cheat Sheet

Jeff Dean's latency hierarchy, availability math, CAP/PACELC trade-offs, and back-of-envelope formulas. Memorize the ratios (L1→RAM ~100×, RAM→SSD ~1000×) rather than exact nanoseconds.

Fundamentals — copy blocks

Latency numbers every engineer must know

LATENCY NUMBERS (modern hardware, order-of-magnitude)
────────────────────────────────────────────────────
Operation                    Latency      Ratio vs L1
────────────────────────────────────────────────────
L1 cache reference           ~1 ns        1×
L2 cache reference           ~4 ns        4×
L3 cache reference           ~40 ns       40×
Main memory (RAM)            ~100 ns      100×
SSD random read              ~100 μs      100,000×
HDD seek + read              ~5 ms        5,000,000×
Same-datacenter RTT          ~0.5 ms      —
Cross-region RTT (US↔EU)     ~50 ms       —
Read 1 MB from RAM           ~250 μs
Read 1 MB from SSD           ~1 ms
Read 1 MB from HDD           ~20 ms
Send 1 MB over 1 Gbps net    ~10 ms
────────────────────────────────────────────────────
KEY RATIOS TO MEMORIZE
  L1 → RAM     ≈ 100×
  RAM → SSD    ≈ 1,000×
  SSD → cross-region RTT ≈ 500×

DESIGN RULES
  • Hot path touching disk/network more than once → latency problem
  • Fix: move data closer (cache), touch fewer things (batch/denorm), do less (async/precompute)
  • Each same-DC RPC adds ~0.5 ms; 10 serial calls = 5 ms minimum
  • Cross-region sync replication → 50+ ms writes; plan for async or regional partitions
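The design rules above amount to adding up serial hops. A minimal sketch, assuming the order-of-magnitude figures from the table; the `LATENCY_US` dictionary and `hot_path_latency_ms` helper are illustrative names, not part of any library:

```python
# Order-of-magnitude latencies from the table above, in microseconds.
LATENCY_US = {
    "ram": 0.1,
    "ssd_random_read": 100,
    "hdd_seek": 5_000,
    "same_dc_rtt": 500,
    "cross_region_rtt": 50_000,
}


def hot_path_latency_ms(steps):
    """Sum the latency of serial steps; each step is (operation, count)."""
    total_us = sum(LATENCY_US[op] * count for op, count in steps)
    return total_us / 1000


# Ten serial same-datacenter RPCs, as in the design rule above.
serial_rpcs_ms = hot_path_latency_ms([("same_dc_rtt", 10)])

# One synchronous cross-region hop plus one SSD read on the remote side.
cross_region_ms = hot_path_latency_ms(
    [("cross_region_rtt", 1), ("ssd_random_read", 1)]
)

print(serial_rpcs_ms, cross_region_ms)
```

This gives a floor, not a prediction: queueing, serialization, and tail latency come on top of it.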

Availability nines

AVAILABILITY NINES — DOWNTIME BUDGET
────────────────────────────────────────────────────────────────
Availability   Downtime/year   Downtime/month   Typical tier
────────────────────────────────────────────────────────────────
99%            3.65 days       7.3 hours        Dev / internal tools
99.9%          8.76 hours      43.8 minutes     B2B SaaS, non-critical APIs
99.99%         52.6 minutes    4.38 minutes     Payments, core product APIs
99.999%        5.26 minutes    26.3 seconds     Telco, hospital, trading
99.9999%       31.5 seconds    2.63 seconds     Active-active multi-region
────────────────────────────────────────────────────────────────

FORMULAS
  Availability = MTBF / (MTBF + MTTR)
  Error budget (30 days) = (1 − SLO) × 43,200 minutes
    99.9% SLO → 43.2 min allowed downtime per 30 days

RULE OF THUMB
  Each additional nine costs ~10× in engineering + infrastructure.
  Most consumer products target 99.9%–99.99%.

REDUNDANCY PATTERNS (quick)
  Active-passive  → standby idle; failover seconds–minutes; lower cost
  Active-active   → all nodes serve; instant reroute; conflict resolution needed
  N+1             → one failure absorbed; ~33% overhead at N=3
  Geo-redundant   → survives region loss; 2–3× infra + replication lag
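The formulas above are easy to check in a few lines. A minimal sketch; the function names are illustrative, and the year is taken as 365 days:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600


def downtime_seconds(availability, period_seconds=SECONDS_PER_YEAR):
    """Allowed downtime for a given availability over a period."""
    return (1 - availability) * period_seconds


def availability_from_mtbf(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def error_budget_minutes(slo, window_days=30):
    """Error budget in minutes over a rolling window."""
    return (1 - slo) * window_days * 24 * 60


four_nines_minutes_per_year = downtime_seconds(0.9999) / 60
three_nines_budget = error_budget_minutes(0.999)
one_hour_repair = availability_from_mtbf(mtbf_hours=999, mttr_hours=1)

print(round(four_nines_minutes_per_year, 1))
print(round(three_nines_budget, 1))
print(round(one_hour_repair, 4))
```

The MTBF form shows the other lever: halving MTTR recovers as much availability as doubling MTBF, and it is often cheaper.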

CAP & PACELC summary

CAP THEOREM (during network partition — P is not optional)
────────────────────────────────────────────────────────
  C = Consistency   — every read sees latest write or error
  A = Availability  — every request gets non-error response (may be stale)
  P = Partition tolerance — system continues despite network splits

  Real choice: CP vs AP while a partition lasts (not a permanent design-time label)

  CP (sacrifice A)     AP (sacrifice C)
  ─────────────────    ─────────────────
  etcd, ZooKeeper      Cassandra, DynamoDB
  HBase, sync RDBMS    CouchDB, DNS, Riak
  Reject minority      Both sides serve; may diverge
  Financial ledger     Shopping cart, likes, analytics

PACELC (Abadi 2012 — the normal-case trade-off)
────────────────────────────────────────────────────────
  If Partition → choose A or C
  Else (no partition) → choose Latency (L) or Consistency (C)

  System          Partition    Else        Class
  ────────────────────────────────────────────────────────
  Cassandra       PA           EL          PA/EL (default ONE)
  DynamoDB        PA           EL          eventual default; strong = 2× cost
  MongoDB         PC           EC          w:1 fast; w:majority consistent
  MySQL primary   PC           EC/EL       sync rep = EC; read replicas = EL
  Google Spanner  PC           EC          pays latency for global consistency

INTERVIEW ONE-LINER
  "CAP is per-operation during partition. Cart is AP; payment is CP.
   PACELC: even without partition we trade latency vs consistency on every read."

Estimation formulas

BACK-OF-ENVELOPE — 6-STEP FRAMEWORK
────────────────────────────────────────────────────────
1. Clarify assumptions: DAU, ops/user/day, read:write ratio, payload size, retention
2. QPS:  (DAU × ops/user) / 86,400  →  round 86,400 to 100K for mental math
3. Peak: avg QPS × peak factor (2–3× typical; 10× for viral)
4. Storage: DAU × writes/user/day × bytes/record × retention days (×2–3 for indexes)
5. Bandwidth: peak QPS × avg request/response size
6. Servers: peak QPS / per-server capacity + 30% headroom

USEFUL ROUNDING CONSTANTS
  1 day   ≈ 100K seconds
  1 month ≈ 2.5M seconds
  1 year  ≈ 30M seconds
  1 server (API) ≈ 1K–10K RPS (use 1K conservative)
  1 SSD   ≈ 100K IOPS; 1 HDD ≈ 100 IOPS
  1 Gbps  = 125 MB/s; 10 Gbps = 1.25 GB/s

LITTLE'S LAW
  L = λ × W
  concurrent requests = arrival rate × avg time in system
  Example: 2,000 RPS × 50 ms = 100 in-flight requests

PERCENTILE SLOs (not averages)
  p50  — median; 20–50 ms for read APIs
  p99  — 99% of requests are faster; 100–300 ms, product-dependent
  p999 — tail (p99.9); dominated by cache misses, GC pauses, cross-region failover

WORKED EXAMPLE (Twitter-scale reads and writes)
  300M DAU × 10 reads/day / 100K = 30K avg → 90K peak (3×)
  300M × 0.5 tweets/day / 100K = 1.5K write avg → 4.5K peak
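The QPS steps and Little's Law can be written out as a short sketch, assuming the rounded 100K seconds per day and a 3× peak factor. The helper names are illustrative:

```python
SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for mental math


def avg_qps(dau, ops_per_user_per_day, seconds_per_day=SECONDS_PER_DAY_ROUNDED):
    """Average queries per second from daily active users and per-user activity."""
    return dau * ops_per_user_per_day / seconds_per_day


def peak_qps(average, peak_factor=3):
    """Peak load as a multiple of the average."""
    return average * peak_factor


def in_flight_requests(arrival_rate_rps, avg_latency_seconds):
    """Little's Law: L = lambda * W."""
    return arrival_rate_rps * avg_latency_seconds


read_avg = avg_qps(300_000_000, 10)
write_avg = avg_qps(300_000_000, 0.5)
read_peak = peak_qps(read_avg)
write_peak = peak_qps(write_avg)
concurrent = in_flight_requests(2_000, 0.050)

print(read_avg, read_peak, write_avg, write_peak, concurrent)
```

Using the exact 86,400 seconds raises each figure by about 16%, which rarely changes the design conclusion.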

Powers of 2

POWERS OF TWO — MEMORIZE FOR STORAGE MATH
────────────────────────────────────────────────────────
Power    Exact value           Approx        Unit
────────────────────────────────────────────────────────
2^10     1,024                 ~1 Thousand   1 KB
2^20     1,048,576             ~1 Million    1 MB
2^30     1,073,741,824         ~1 Billion    1 GB
2^40     1,099,511,627,776     ~1 Trillion   1 TB
2^50     1,125,899,906,842,624 ~1 Quadrillion 1 PB
────────────────────────────────────────────────────────

QUICK CONVERSIONS
  1 KB = 10^3 bytes (decimal) vs 2^10 = 1,024 (binary — use 1,000 in interviews)
  1 million users × 1 KB/day = 1 GB/day
  1 billion requests/day ÷ 100K sec ≈ 10,000 RPS average

SANITY CHECKS
  100M users × 10 ops/day = 1B ops/day ≈ 10K RPS avg
  10K RPS × 1 KB response = 10 MB/s ≈ 80 Mbps average bandwidth (×3 for peak)
  1 PB / 5 years tweets → need sharding + cold storage tiering

🔬 Under the Hood

L5 Latency numbers explain why Redis exists: a 1 ms cache hit beats a 5–10 ms PostgreSQL round-trip. Availability math explains why 99.99% needs automated failover: a budget of about 52 minutes/year burns fast during deploys.

🎯 Interview Tip

L4 When asked "why cache?", cite: "RAM ~100 ns, SSD ~100 μs—that's 1000×. Redis at 1 ms still beats DB at 5–10 ms." Quantitative reasoning beats "caching makes things faster."

Database Selection Cheat Sheet

Start with access patterns, not brand names. Five questions: read/write ratio, query shape, consistency needs, scale trajectory, operational maturity. Then map to store family and replication/sharding strategy.

Database selection — copy blocks

Decision matrix by store family

DATABASE SELECTION — 5 QUESTIONS FIRST
────────────────────────────────────────────────────────
1. Read vs write ratio?     90% reads → replicas + cache
2. Query shape?             point / range / join / geo / full-text / traversal
3. Consistency needs?       ledger (strong) vs likes (eventual)
4. Scale trajectory?        single node 2 years vs 100M rows day one
5. Ops maturity?            managed vs self-hosted Cassandra cluster

DECISION MATRIX
────────────────────────────────────────────────────────
Family        Best for                    Weak at
────────────────────────────────────────────────────────
Relational    ACID, joins, ad-hoc SQL     Horizontal writes w/o sharding
Document      Nested JSON, flexible schema Multi-doc ACID, huge documents
Wide-column   High writes, geo-distribution Ad-hoc joins, bad partition keys
Key-value     Cache, sessions, rate limits Complex queries, durability
Time-series   Metrics, IoT, downsampling  General OLTP, mutable rows
Search        Full-text, facets, ranking    Source of truth, strong consistency
Graph         Path queries, fraud rings     Bulk analytics at extreme scale
────────────────────────────────────────────────────────

QUICK PICK FLOW
  Structured + joins?        → PostgreSQL / MySQL
  Document JSON?             → MongoDB / DynamoDB
  Massive write throughput?  → Cassandra / ScyllaDB
  Sub-ms cache/session?      → Redis / Memcached
  Time-ordered metrics?      → TimescaleDB / ClickHouse / InfluxDB
  Full-text search?          → Elasticsearch / OpenSearch
  Relationship traversals?   → Neo4j / Neptune

POLYGLOT PATTERN (L6 signal)
  PostgreSQL = system of record
  Redis = cache-aside hot reads
  Elasticsearch = search index via Kafka CDC
  Name primary store + justify satellites per access pattern

Consistency models quick reference

CONSISTENCY SPECTRUM (weakest → strongest)
────────────────────────────────────────────────────────
Model              Guarantee                      Latency   Use case
────────────────────────────────────────────────────────
Eventual           Replicas converge over time    Lowest    DNS, CDN, view counts
Monotonic reads    Never go backward in time      Low       Feed pagination
Read-your-writes   User sees own writes           Low-med   Profile edits, settings
Causal             Cause precedes effect          Medium    Comments, messaging
Sequential         Global order, reads may lag    Med-high  Collaborative editing
Strong/Linearizable Latest write globally visible High      Bank balance, inventory
────────────────────────────────────────────────────────

SCENARIO → MODEL
  Payment / transfer     → Strong (fail closed on uncertainty)
  User profile edit      → Read-your-writes (sticky session or primary reads)
  Twitter timeline       → Eventual + monotonic reads
  Flash sale inventory   → Strong or optimistic locking
  Page view counter      → Eventual (approximate OK)
  Collaborative doc      → Causal or CRDT

READ-YOUR-WRITES IMPLEMENTATIONS
  • Sticky sessions to replica that handled write
  • Session token with last-write timestamp
  • Write-through cache updated synchronously
  • User writes + reads both hit primary; others hit replicas

INTERVIEW QUESTION TO ASK
  "Does the user need their own write immediately, or do ALL users
   need the latest data globally?" → narrows model in 10 seconds
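One way to picture the session-token approach is a read router that compares the user's last write against each replica's progress. This is a minimal sketch under stated assumptions: it uses a monotonically increasing replication log position instead of a wall-clock timestamp (to sidestep clock skew), and `Replica`, `applied_position`, and `choose_read_target` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    applied_position: int  # highest replication log position applied so far


def choose_read_target(session_last_write, replicas, primary="primary"):
    """Return a replica that has applied the session's last write, else the primary."""
    for replica in replicas:
        if replica.applied_position >= session_last_write:
            return replica.name
    return primary


replicas = [Replica("replica-a", 120), Replica("replica-b", 95)]

caught_up = choose_read_target(session_last_write=100, replicas=replicas)
too_fresh = choose_read_target(session_last_write=130, replicas=replicas)

print(caught_up, too_fresh)
```

Only the writing user pays for the fallback to the primary; other users keep reading from replicas with eventual consistency.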

Replication topologies

REPLICATION — THREE TOPOLOGIES
────────────────────────────────────────────────────────

1. SINGLE-LEADER (most RDBMS, MongoDB replica set)
   All writes → leader; replicas apply WAL/binlog/oplog
   Sync replication:  wait for replica ACK → stronger, higher latency
   Async replication: leader ACKs immediately → faster, replication lag
   Failover:          promote replica; risk split-brain without quorum
   Read scaling:      route reads to replicas (eventual consistency)
   Lag typical:       10 ms – seconds under load

2. MULTI-LEADER (active-active across regions)
   Writes accepted at multiple leaders; async sync between them
   Pros:  low write latency per region; high availability
   Cons:  write conflicts need resolution (LWW, CRDT, app merge)
   Use:   collaborative editing, calendars; NOT financial ledger

3. LEADERLESS / QUORUM (Dynamo-style: Cassandra, Riak; DynamoDB the service elects a leader per partition)
   N = replication factor; W = write quorum; R = read quorum
   Quorum: R + W > N  →  overlap guarantees fresh read
   Examples:
     N=3, W=2, R=2  →  tolerates 1 node down, consistent quorum
     N=3, W=1, R=1  →  fast, eventual (hinted handoff + read repair)
   DynamoDB:        eventual default; strong read = leader replica only

FAILOVER CHECKLIST
  □ Health checks on replication lag (not just TCP alive)
  □ Automated promotion with fencing token / STONITH
  □ Test failover quarterly—broken automation surfaces in real incidents
  □ Clients retry with idempotency keys after failover
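The quorum rule for the leaderless topology can be checked mechanically. A minimal sketch with an illustrative helper name; it covers strict quorums only and ignores sloppy quorums and hinted handoff:

```python
def quorum_properties(n, w, r):
    """Summarize what an (N, W, R) configuration gives you."""
    return {
        # Read and write sets overlap in at least one replica when R + W > N.
        "read_sees_latest_write": r + w > n,
        # Replicas that may be down while writes still reach W acknowledgements.
        "write_tolerates_down": n - w,
        # Replicas that may be down while reads still reach R responses.
        "read_tolerates_down": n - r,
    }


balanced = quorum_properties(n=3, w=2, r=2)
fast = quorum_properties(n=3, w=1, r=1)

print(balanced)
print(fast)
```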

Sharding quick reference

SHARDING — WHEN & HOW
────────────────────────────────────────────────────────
BEFORE SHARDING (each step is roughly 10× cheaper than sharding)
  1. Archive cold data
  2. Denormalize hot queries
  3. Connection pooling (PgBouncer)
  4. Read replicas
  5. Cache-aside (Redis)

WHEN TO SHARD
  Single-node write QPS exhausted (~10K–50K writes/sec depending on row size)
  Data size exceeds single-node storage with acceptable query latency
  Hot row / hot partition cannot be isolated otherwise

SHARD KEY SELECTION (most important decision)
  ✓ High cardinality (user_id, tenant_id)
  ✓ Even distribution (avoid monotonic timestamps alone)
  ✓ Query locality (co-locate data accessed together)
  ✗ Low cardinality (country_code alone)
  ✗ Hot keys (celebrity user → one shard overload)

SHARDING STRATEGIES
────────────────────────────────────────────────────────
Strategy          Routing              Pros              Cons
────────────────────────────────────────────────────────
Hash              hash(key) % N        Even spread       Resharding remaps most keys
Range             key ranges on shards Range queries OK  Hotspots on latest range
Directory         lookup table         Flexible          Lookup SPOF; manual ops
Geographic        region = shard       Data residency    Cross-region queries hard
Consistent hash   ring + virtual nodes Minimal remapping More complex client/router
────────────────────────────────────────────────────────

CROSS-SHARD OPERATIONS (expensive — avoid in hot path)
  JOIN across shards     → denormalize or scatter-gather + app merge
  Global uniqueness      → Snowflake/UUID; not DB sequence
  Aggregations           → precompute, rollups, or OLAP store
  Resharding             → dual-write or Vitess VReplication; plan early

TOOLS: Vitess (MySQL), Citus (PostgreSQL), MongoDB sharded cluster,
       DynamoDB partition key + sort key, Cassandra partition key
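The resharding cost in the strategy table can be measured directly. The sketch below grows a cluster from 4 to 5 shards and counts how many keys change shard under `hash(key) % N` versus a consistent-hash ring. It is a toy model under stated assumptions: SHA-256 as a stable hash, 100 virtual nodes per shard, and illustrative names throughout.

```python
import bisect
import hashlib


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def modulo_shard(key, shard_count):
    return stable_hash(key) % shard_count


class HashRing:
    def __init__(self, shards, vnodes=100):
        # Each shard owns many points on the ring to even out the distribution.
        self._ring = sorted(
            (stable_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def shard_for(self, key):
        # First virtual node clockwise from the key's position, wrapping around.
        index = bisect.bisect(self._points, stable_hash(key)) % len(self._ring)
        return self._ring[index][1]


def moved_fraction(before, after, keys):
    """Fraction of keys whose shard assignment changes."""
    return sum(before(k) != after(k) for k in keys) / len(keys)


keys = [f"user-{i}" for i in range(10_000)]

modulo_moved = moved_fraction(
    lambda k: modulo_shard(k, 4), lambda k: modulo_shard(k, 5), keys
)

old_ring = HashRing(["s0", "s1", "s2", "s3"])
new_ring = HashRing(["s0", "s1", "s2", "s3", "s4"])
ring_moved = moved_fraction(old_ring.shard_for, new_ring.shard_for, keys)

print(f"modulo: {modulo_moved:.0%} moved, ring: {ring_moved:.0%} moved")
```

With modulo routing a key stays put only when its hash gives the same remainder for 4 and 5, so roughly four in five keys move. On the ring, only keys claimed by the new shard move, which is roughly one in five here.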

⚖️ Trade-off

L6 "We'll use PostgreSQL" is L5. L6: "PostgreSQL for orders with row-level locking on stock; Redis cache-aside for catalog (95% read); Elasticsearch via Kafka CDC for search—2s staleness SLA. Shard by tenant_id when write QPS exceeds 20K."

⚠️ Pitfall

Picking Cassandra because "it's web scale" without write-heavy access patterns is a red flag. Walk the five questions aloud; eliminate families before naming a product.

Interview Cheat Sheet

RADIO framework, time-boxed 45-minute clock, estimation script, architecture checklist, red/green flags, and eight canonical case study one-liners. Copy the whole block before your mock or live interview.

Interview — copy blocks

RADIO template

RADIO — UNIVERSAL SYSTEM DESIGN FRAMEWORK
────────────────────────────────────────────────────────
R  REQUIREMENTS
   Functional:     core features, user flows, MVP vs future
   Non-functional: scale (DAU/QPS), latency (p99 target), availability,
                   consistency, durability, security, cost
   Out of scope:   explicitly defer (e.g., ML ranking v2, multi-region v1)
   Ask:            "Who are the users? Read-heavy or write-heavy? Strong consistency needed?"

A  ARCHITECTURE
   Draw boxes:     Client → CDN → LB → API (stateless) → Cache → DB → Queue → Workers
   Label:          stateless vs stateful components; sync vs async paths
   Data flow:      write path vs read path (often different!)
   Say:            "API tier is stateless; session in Redis; DB is source of truth"

D  DATA MODEL
   Schema:         entities, relationships, indexes for access patterns
   Shard key:      if scale requires it—justify cardinality and locality
   Storage estimate: rows × bytes × retention (state assumptions)
   Say:            "Index on (user_id, created_at DESC) for timeline query"

I  INTERFACE (API)
   Key endpoints:  REST/gRPC; request/response shape
   Pagination:     cursor-based for feeds (not offset at scale)
   Idempotency:    Idempotency-Key header for writes/retries
   Rate limits:    per-user and global; 429 + Retry-After header

O  OPTIMIZATIONS & TRADE-OFFS
   Bottlenecks:    identify from estimation (DB, fan-out, hot keys)
   Caching:        what, where, TTL, invalidation, stampede prevention
   Async:          queue for slow path (email, transcode, index build)
   Scale:          sharding, read replicas, CDN, horizontal pods
   Close:          "We traded X for Y because product requires Z"
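The cursor-based pagination line under INTERFACE can be illustrated with a small in-memory sketch. It assumes rows keyed by `(created_at, id)` and an opaque base64 cursor; the function names and row shape are hypothetical, and a real service would push the comparison into the database query.

```python
import base64
import json


def encode_cursor(created_at, item_id):
    raw = json.dumps([created_at, item_id]).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor):
    created_at, item_id = json.loads(base64.urlsafe_b64decode(cursor.encode()))
    return created_at, item_id


def fetch_page(rows, limit, cursor=None):
    """Return (page, next_cursor), newest first.

    Equivalent SQL shape:
      WHERE (created_at, id) < (:ts, :id)
      ORDER BY created_at DESC, id DESC LIMIT :limit
    """
    ordered = sorted(rows, key=lambda r: (r["created_at"], r["id"]), reverse=True)
    if cursor is not None:
        boundary = decode_cursor(cursor)
        ordered = [r for r in ordered if (r["created_at"], r["id"]) < boundary]
    page = ordered[:limit]
    has_more = len(ordered) > limit
    next_cursor = (
        encode_cursor(page[-1]["created_at"], page[-1]["id"]) if has_more else None
    )
    return page, next_cursor


# Five rows; some share a timestamp, so id acts as the tie-breaker.
rows = [{"id": i, "created_at": 1_700_000_000 + i // 2} for i in range(1, 6)]

page1, cursor1 = fetch_page(rows, limit=2)
page2, cursor2 = fetch_page(rows, limit=2, cursor=cursor1)
page3, cursor3 = fetch_page(rows, limit=2, cursor=cursor2)

print([r["id"] for r in page1], [r["id"] for r in page2], [r["id"] for r in page3])
```

Unlike offset pagination, the cost of each page does not grow with depth, and rows inserted at the head do not shift later pages.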

45-minute interview clock

45-MINUTE INTERVIEW CLOCK — TIME-BOX RUTHLESSLY
────────────────────────────────────────────────────────
Phase              Time        What to deliver
────────────────────────────────────────────────────────
Requirements       0–5 min     Clarify scope, DAU/QPS, latency, consistency,
                               explicit out-of-scope. ASK QUESTIONS.

Estimation         5–10 min    QPS, storage, bandwidth, server count.
                               State assumptions aloud. Round to 1 significant figure.

High-level design  10–20 min   Boxes + arrows: client, LB, API, cache, DB, queue.
                               Explain read path AND write path separately.

Deep dive          20–30 min   Interviewer picks: schema, cache, fan-out, consistency,
                               API, failure modes. Go deep on ONE area.

Scale & wrap       30–45 min   10× traffic plan, SPOFs, monitoring, trade-offs summary.
                               "If I had another 15 min, I'd detail X."
────────────────────────────────────────────────────────

TIME PRESSURE TACTICS
  • At 10 min without a diagram → stop requirements, start drawing
  • At 25 min without deep dive → pick your strongest component voluntarily
  • At 40 min → summarize trade-offs even if incomplete
  • Never silent > 15 seconds — narrate thinking process

LEVEL EXPECTATIONS
  L4:  Happy path, basic components, one scaling knob
  L5:  Trade-offs, failure modes, data model justification
  L6:  Cross-cutting concerns, operability, measured numbers, "when NOT to"
  Principal: Platform strategy, org implications, multi-year evolution

Estimation template (say aloud)

ESTIMATION SCRIPT — FILL IN THE BLANKS
────────────────────────────────────────────────────────
"Let me state assumptions before I calculate:
 • Daily active users: _______
 • Operations per user per day: _______
 • Read : write ratio: _______
 • Average payload size: _______ KB
 • Retention period: _______
 • Peak traffic multiplier: _______× average (I'll use 3× unless you say otherwise)"

READ QPS
  = DAU × reads_per_user / 100,000 seconds
  Peak read QPS = avg × peak_factor

WRITE QPS
  = DAU × writes_per_user / 100,000 seconds
  Peak write QPS = avg × peak_factor

STORAGE
  = DAU × writes_per_user × bytes_per_record × retention_days
  With indexes: multiply by 2–3×

BANDWIDTH
  = peak_read_QPS × response_size_bytes (+ upload for write-heavy)

SERVERS
  = peak_QPS / 1,000 RPS per app server × 1.3 headroom

SANITY CHECK
  "At _______ peak QPS, this is roughly _______ scale—I'd expect _______ pattern
   (cache / sharding / fan-out on write / CDN)."

EXAMPLE (say it):
  "300M DAU, 10 timeline reads/day, peak 3× → 30K avg, ~90K peak read QPS.
   0.5 tweets/user/day → 1.5K write avg, ~4.5K peak. That drives fan-out strategy."
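The storage, bandwidth, and server lines of the script reduce to three small functions. A minimal sketch: the 300-byte record, 10 KB response, and one-year retention in the example are assumptions chosen for illustration, not figures from the script.

```python
import math


def storage_bytes(dau, writes_per_user, bytes_per_record, retention_days, index_factor=1.0):
    """Raw storage over the retention window; index_factor covers index overhead."""
    return dau * writes_per_user * bytes_per_record * retention_days * index_factor


def bandwidth_bytes_per_second(peak_read_qps, response_size_bytes):
    return peak_read_qps * response_size_bytes


def servers_needed(peak_qps, rps_per_server=1_000, headroom=1.3):
    """App servers at a conservative per-server capacity plus 30% headroom."""
    # round() guards against float noise before taking the ceiling.
    return math.ceil(round(peak_qps * headroom / rps_per_server, 6))


raw_storage = storage_bytes(300_000_000, 0.5, 300, 365)
indexed_storage = storage_bytes(300_000_000, 0.5, 300, 365, index_factor=2)
egress = bandwidth_bytes_per_second(90_000, 10_000)
servers = servers_needed(90_000)

print(f"{raw_storage / 1e12:.1f} TB raw, {indexed_storage / 1e12:.1f} TB with indexes")
print(f"{egress / 1e6:.0f} MB/s egress, {servers} app servers")
```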

Architecture checklist

ARCHITECTURE CHECKLIST — BEFORE YOU SAY "DONE"
────────────────────────────────────────────────────────
SCALE
  □ QPS estimated (read + write separately)
  □ Storage estimated with retention
  □ Horizontal scaling path identified
  □ Hot key / hot partition risk addressed

RELIABILITY
  □ Single points of failure named + mitigated
  □ Redundancy: active-passive or active-active justified
  □ Failover tested (not just "we have a replica")
  □ Circuit breakers / bulkheads for downstream failures
  □ Retry with exponential backoff + jitter (not blind retry)
  □ Idempotency keys on mutating APIs

PERFORMANCE
  □ Caching layer: what, TTL, invalidation, stampede prevention
  □ CDN for static/media at edge
  □ DB indexes match access patterns (no table scans on hot path)
  □ Async for slow path (notifications, transcoding, indexing)
  □ p99 latency target stated (not just "low latency")

DATA
  □ Consistency model per operation (not one-size-fits-all)
  □ Shard key chosen with cardinality + locality rationale
  □ Backup + disaster recovery mentioned for durable data

OBSERVABILITY
  □ Four golden signals: latency, traffic, errors, saturation
  □ SLO defined (e.g., 99.9% of requests < 200 ms over 30 days)
  □ Alerting on SLO burn rate, not just threshold

SECURITY (brief — don't over-index unless asked)
  □ AuthN/AuthZ at API gateway
  □ Rate limiting / abuse prevention
  □ TLS in transit; encryption at rest for PII
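The "exponential backoff + jitter" item under RELIABILITY is often implemented along these lines. This is a sketch, assuming full jitter (a uniform delay between zero and the capped exponential ceiling) and treating `ConnectionError` as the only retryable error; `call_with_retries` and `flaky_call` are hypothetical names.

```python
import random
import time


def backoff_delays(count, base_seconds=0.1, cap_seconds=5.0, rng=random):
    """Yield `count` sleep durations: capped exponential backoff with full jitter."""
    for attempt in range(count):
        ceiling = min(cap_seconds, base_seconds * 2 ** attempt)
        yield rng.uniform(0, ceiling)


def call_with_retries(operation, max_attempts=5, sleep=time.sleep, rng=random):
    """Call operation() until it succeeds; re-raise after the final attempt."""
    delays = list(backoff_delays(max_attempts - 1, rng=rng))
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:  # retry only errors known to be transient
            if attempt == max_attempts - 1:
                raise
            sleep(delays[attempt])


attempts = {"count": 0}


def flaky_call():
    """Hypothetical downstream call that fails twice, then succeeds."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


slept = []  # record delays instead of sleeping, so the demo runs instantly
result = call_with_retries(flaky_call, sleep=slept.append, rng=random.Random(42))

print(result, len(slept))
```

Jitter matters because clients that fail together otherwise retry together. Retrying is only safe when the operation is idempotent, which is why the checklist pairs it with idempotency keys.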

Red flags & green flags

INTERVIEW RED FLAGS (avoid)          GREEN FLAGS (demonstrate)
────────────────────────────────────────────────────────────────────────
Draw before clarifying requirements   Ask 3–5 scoping questions first
Silent for 30+ seconds while drawing  Think aloud continuously
"We'll use microservices" day one     Start monolith; split when measured pain
"We need low latency" (no number)     "p99 < 200 ms; timeline can do 500 ms"
Strong consistency everywhere         Match consistency to operation
One database for everything           Polyglot with justified primary store
Cache without invalidation plan       TTL + event-driven invalidation
Ignore failure modes                  Name SPOFs + circuit breakers
Skip estimation                       Back-of-envelope in first 10 minutes
Buzzwords without trade-offs          "We chose X over Y because Z"
No out-of-scope boundaries            "V1 excludes multi-region; here's why"
Resume-driven architecture            Solve the stated problem first
Offset pagination at billion rows     Cursor-based keyset pagination
Sticky sessions without fallback      Externalized session in Redis
2PC across microservices              Saga / outbox / idempotent compensations
"Redis as primary DB" no persistence  Redis with AOF/RDB + replication for durability
────────────────────────────────────────────────────────────────────────

PHRASES THAT SIGNAL SENIOR LEVEL
  "At 10× traffic, the bottleneck moves from _______ to _______."
  "We fail closed on payment; we fail open on analytics."
  "Error budget at 99.9% gives us 43 min/month—we spend it on launches."
  "I'd load-test with open-loop fixed RPS to find real breaking points."
  "Shard key is tenant_id—high cardinality, query-local, rebalances per tenant."

8 case study one-liners

8 CANONICAL CASE STUDIES — PROBLEM, SCALE, KEY DECISION
────────────────────────────────────────────────────────────────────────

1. URL SHORTENER (L4)
   Problem:  Map long URL → short code; redirect; optional click analytics
   Scale:    100M URLs, 1000:1 read:write, 100K redirect RPS peak
   Key decision: Base62 hash vs counter (Snowflake); Redis cache hot URLs;
                 301 redirect (302 if every click must be counted); DB sharded by short_code hash

2. RATE LIMITER (L4)
   Problem:  Throttle requests per user/IP/API key; sliding or token bucket
   Scale:    1M RPS at edge; sub-ms check; distributed across PoPs
   Key decision: Token bucket in Redis with Lua atomicity; local cache +
                 Redis sync for edge; 429 + Retry-After; fail-open vs closed

3. KEY-VALUE STORE (L4)
   Problem:  In-memory get/put/delete with optional persistence
   Scale:    1B keys, 100K ops/sec, <1 ms p99
   Key decision: Consistent hashing for sharding; replication factor 3;
                 hinted handoff; write-ahead log for durability

4. TWITTER FEED (L5)
   Problem:  Home timeline—tweets from followees, ranked by time
   Scale:    300M DAU, 90K read QPS peak, celebrity with 50M followers
   Key decision: Hybrid fan-out—write for normal users, read for celebrities;
                 Redis timeline cache; Snowflake tweet IDs; pull+push merge

5. YOUTUBE (L5)
   Problem:  Upload video, transcode, stream globally with adaptive bitrate
   Scale:    500 hours uploaded/min, 1B playback hours/day
   Key decision: Object storage (S3) for blobs; async transcoding queue;
                 CDN edge caching; DASH/HLS segments; metadata in SQL + cache

6. UBER (L5)
   Problem:  Real-time driver location, ride matching, surge pricing
   Scale:    1M concurrent drivers, 10K rides/sec peak, geo queries
   Key decision: Redis geospatial index for nearby drivers; WebSocket for
                 location push; matching service with supply/demand zones;
                 Cassandra for trip history; Kafka for event pipeline

7. WHATSAPP (L5)
   Problem:  1:1 and group messaging, delivery/read receipts, offline delivery
   Scale:    2B users, 100B messages/day, groups up to 256 members
   Key decision: WebSocket long-lived connections; message queue per device
                 for offline; Cassandra for message store; end-to-end encryption;
                 fan-out on write for groups with large-member optimization

8. GOOGLE MAPS (L6)
   Problem:  Routing, real-time traffic, map tile serving globally
   Scale:    1B users, petabyte road graph, sub-second route queries
   Key decision: Precomputed tile CDN; graph partitioned geographically;
                 A* on hierarchical road network; real-time traffic via
                 aggregate probe data (Kafka stream); edge caching of tiles
────────────────────────────────────────────────────────────────────────
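For the rate limiter case, the token bucket itself fits in a few lines. The sketch below is a single-process version with an injected clock, not the Redis-plus-Lua deployment named in the case study; the class and parameter names are illustrative.

```python
class TokenBucket:
    """Allows bursts up to `capacity`, refilling at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, now=0.0):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.updated_at = now

    def allow(self, now, cost=1):
        """Return True and spend tokens if the request fits, else False."""
        elapsed = max(0.0, now - self.updated_at)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = max(self.updated_at, now)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


bucket = TokenBucket(capacity=3, refill_per_second=1.0, now=0.0)

burst = [bucket.allow(now=0.0) for _ in range(4)]  # burst of four at t=0
after_one_second = bucket.allow(now=1.0)           # one token has refilled
immediately_again = bucket.allow(now=1.0)          # bucket is empty again

print(burst, after_one_second, immediately_again)
```

In a distributed setup the read-refill-spend sequence has to be atomic per key, which is the role the Lua script plays in the Redis variant.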

CASE STUDY PIVOT PHRASES
  "The interesting part here is _______—let me go deep on that."
  "At celebrity scale, fan-out on write breaks—here's the hybrid fix."
  "Read path and write path differ—let me draw them separately."

🎯 Interview Tip

L5 Switch to Interview mode before mocks. Keep the 45-minute clock and estimation template blocks open on a second monitor. Glance at the minute marks; interviewers notice time discipline as much as technical depth.

🏆 Senior Signal

Principal After the architecture checklist, add org context: "This design implies a platform team owning Kafka + schema registry; product teams publish events via SDK. Conway's law—we align service boundaries to team boundaries."

📦 Real World

Instagram engineers report back-of-envelope estimates during design reviews catch 80% of scaling issues before code ships. Google interviewers explicitly score structured thinking—the RADIO skeleton matters as much as the final diagram.