Real-World Case Studies

Eight canonical system design problems—each one appeared in FAANG interviews and powers production systems at billion-user scale. Every section walks through requirements, back-of-envelope numbers, architecture diagrams, deep dives into the hard parts, and the trade-offs elite teams actually make. These are not toy diagrams: they are the patterns behind bit.ly, Twitter, YouTube, Uber, WhatsApp, and Google Maps.

L4 L5 L6 interview Chapter 7 · 8 case studies

How to use these case studies

Each case study follows the same structure: functional requirements → non-functional requirements → scale estimation → high-level architecture → deep dives → trade-offs. Switch to Interview track for tips on what interviewers probe at each level.

System design interviews rarely ask you to reproduce production systems verbatim. They test whether you can clarify requirements, estimate scale, draw a coherent architecture, and reason about failure modes. These eight problems cover the majority of pattern combinations you'll encounter: read-heavy caching, write-heavy fan-out, geospatial indexing, stream processing, distributed consensus trade-offs, and CDN delivery.

Case study Primary patterns Typical level Hardest deep dive
URL Shortener ID generation, cache-aside, async analytics L4 301 vs 302, collision-free IDs at scale
Rate Limiter Token bucket, Redis Lua, distributed sync L4L5 Sliding window accuracy vs memory
Key-Value Store Hash table, WAL, replication, consistent hashing L5 Quorum reads/writes, anti-entropy repair
Twitter Feed Hybrid fan-out, sorted sets, celebrity problem L5L6 Fan-out on write vs read for 50M followers
YouTube Upload pipeline, transcoding, CDN, recommendations L5L6 Adaptive bitrate, cold-start recommendations
Uber Geospatial index, matching, surge, state machine L5L6 Real-time driver location at city scale
WhatsApp Cassandra messaging, E2E encryption, receipts L6 100B msgs/day partition strategy
Google Maps Graph routing, tile CDN, real-time traffic L6 Contraction hierarchies, sub-second routing
🎯 Interview Tip

Pick 3–4 case studies and practice the full 45-minute RADIO flow on each: Requirements (5 min), Architecture (10 min), Data model (5 min), Interface/API (5 min), Optimizations & deep dives (15 min), Bottlenecks (5 min). Interviewers reward structured thinking over perfect diagrams.

1. URL Shortener (bit.ly)

The canonical L4 interview question. Shorten long URLs into compact aliases, resolve them at read time with sub-10 ms latency, and optionally track click analytics without blocking the redirect path.

Functional requirements

  • Create short URL — given a long URL, return a unique short code (custom alias optional for premium users)
  • RedirectGET /{shortCode} resolves to original URL via HTTP redirect
  • Analytics (optional) — track clicks: timestamp, referrer, geo, user-agent
  • Expiration — TTL on links; soft-delete vs hard-delete policy
  • Custom aliases — user-chosen codes with uniqueness check

Non-functional requirements

Requirement Target Why it matters
Read latency p99 < 10 ms Redirects are user-facing; every ms hurts conversion
Availability 99.99% on read path Broken links damage brand trust instantly
Write throughput ~1K–10K new URLs/sec peak Lower than reads but must never block redirects
Read/write ratio 100:1 typical Optimize aggressively for read path
URL uniqueness Global, collision-free Duplicate codes = wrong redirects = security incident

Scale estimation

📐 Estimation

Assumptions: 500M DAU, 2 link creations + 20 redirects per user per day.
Writes: 500M × 2 / 86400 ≈ 12K writes/sec (peak ×3 → 36K/sec).
Reads: 500M × 20 / 86400 ≈ 116K reads/sec (peak ×3 → 350K RPS).
Storage (10 years): 500M × 2 × 365 × 10 = 3.65T URLs × ~500 bytes = ~1.8 PB (plan tiered archival).
Bandwidth: 350K RPS × 500 bytes redirect response ≈ 175 MB/s (negligible vs CDN).

High-level architecture

flowchart TB
  Client[Client / Browser]
  LB[Load Balancer]
  API[Redirect Service]
  Create[Create Service]
  Redis[(Redis Cache\nshortCode → longURL)]
  MySQL[(MySQL\nsource of truth)]
  Kafka[Kafka\nclick events]
  Analytics[Analytics Pipeline\nFlink → ClickHouse]

  Client -->|GET /abc123| LB --> API
  API --> Redis
  Redis -->|miss| MySQL
  API -->|async| Kafka
  Client -->|POST /shorten| LB --> Create
  Create --> MySQL
  Create --> Redis
  Kafka --> Analytics

ID generation strategies

Strategy Example Pros Cons
Auto-increment + Base62 123456789 → "8M0kX" Simple, dense, no collision check Single DB writer bottleneck; predictable/enumerable IDs
Snowflake / UUID 64-bit time-ordered ID Distributed, no DB coordination Longer codes; need encoding scheme
Hash of URL + counter MD5 prefix + retry on collision Same long URL → same short URL (dedup) Collision handling complexity at scale
Pre-generated ID pool Range allocator per server Zero contention at write time Pool exhaustion, range management ops

L5 Production choice for bit.ly-style systems: Snowflake IDs encoded in Base62 for distributed creation, with MySQL as source of truth and Redis cache-aside on read. Base62 uses [a-zA-Z0-9] — 7 chars = 627 ≈ 3.5 trillion unique codes.

# Base62 encode — 7 chars covers 3.5T IDs
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

# Snowflake: 41-bit timestamp | 10-bit machine | 12-bit sequence
# → ~4096 IDs/ms per machine, sortable, no DB round-trip

301 vs 302 redirect — deep dive

This is the most common L4 trap question. The choice affects SEO, analytics accuracy, and cache behavior.

Status Browser behavior Use when
301 Moved Permanently Browser caches redirect; may skip shortener on repeat visits Permanent links, SEO consolidation, minimal server load after first hit
302 Found (Temporary) Browser always hits shortener first Analytics on every click, A/B destination testing, expiring links
307 Temporary Preserves HTTP method (POST stays POST) API redirects, non-GET flows
⚖️ Trade-off

301 reduces server load (browser caches) but under-counts analytics after first visit. 302 guarantees every click is counted but doubles latency on repeat visits. bit.ly uses 302 for marketing/analytics products; internal/permanent links may use 301. Say both options in interviews and tie to product requirements.

MySQL + Redis read path

  1. GET shortCode from Redis (short:abc123 → longURL)
  2. On miss: query MySQL by primary key (short_code indexed), populate Redis with TTL (24h–7d)
  3. Return 302 redirect; fire-and-forget click event to Kafka
  4. On create: INSERT MySQL, SET Redis, return short URL

MySQL schema: short_code VARCHAR(10) PK, long_url TEXT, user_id, created_at, expires_at, click_count BIGINT (approximate via periodic flush from analytics). Shard MySQL by hash(short_code) when write QPS exceeds single-node capacity (~10K/sec sustained writes).

Analytics via Kafka

sequenceDiagram
  participant Redirect as Redirect Service
  participant Kafka as Kafka
  participant Flink as Flink / Spark
  participant CH as ClickHouse
  participant Dash as Analytics Dashboard

  Redirect->>Kafka: click event (async, non-blocking)
  Note over Redirect: never wait for analytics
  Kafka->>Flink: stream aggregate
  Flink->>CH: hourly rollups
  CH->>Dash: query p99 < 2s

Click events: { short_code, timestamp, ip, user_agent, referrer, geo }. Partition Kafka by short_code for ordered per-link aggregation. Never block redirect on analytics write — if Kafka is down, sample/drop events rather than fail redirects.

Trade-off Radar — URL Shortener

Scores 1–5 (higher = better/more emphasis). Chart.js reads data-* attributes on mount.

⚠️ Pitfall

Don't put analytics synchronously on the redirect path. At 350K RPS, even 1 ms extra Kafka latency adds 350 seconds of aggregate wait per second. Use async fire-and-forget with local buffer + backpressure drop.

📦 Real World

bit.ly processes billions of redirects monthly. They use multi-region Redis clusters for hot codes, MySQL/Aurora for persistence, and real-time analytics pipelines. TinyURL early architecture was single MySQL + mod-hash — worked until read QPS forced caching layer.

🏆 Senior Signal

"Read path: Redis cache-aside, 302 for analytics, async Kafka click stream. Write path: Snowflake ID + Base62, MySQL INSERT, cache warm. Scale reads with Redis Cluster; scale writes with DB sharding by short_code hash. 350K peak RPS needs ~4 Redis nodes at 100K ops each with 80% hit rate."

🎯 Interview Tip

L4 Must mention: Base62/Snowflake ID, Redis cache, 301 vs 302 trade-off, read-heavy optimization.
L5 Add: Kafka analytics decoupling, sharding strategy, cache stampede mutex on miss.
L6 Discuss: multi-region active-active, custom alias race conditions (DB unique constraint + Redis SETNX), abuse detection (malware URL scanning async on create).

2. Rate Limiter

Protect APIs and infrastructure from abuse, ensure fair usage across tenants, and enforce SLA tiers. Rate limiting appears in every gateway, every public API, and almost every L4–L5 interview.

Functional requirements

  • Limit requests per user, IP, API key, or endpoint
  • Return 429 Too Many Requests with Retry-After header when exceeded
  • Support multiple tiers: free (100/min), pro (10K/min), enterprise (custom)
  • Configurable limits per route (POST /upload stricter than GET /health)
  • Whitelist/blacklist bypass rules

Non-functional requirements & scale

Metric Target
Decision latency < 1 ms p99 (on critical path)
Throughput Match API gateway RPS (100K–1M+ RPS)
Accuracy Within 1% of configured limit (sliding window)
Availability Fail-open vs fail-closed policy explicit
Memory ~100 bytes per active key (user/IP combo)
📐 Estimation

1M RPS API, 10M unique users/day active → ~500K concurrent rate-limit keys in Redis. 500K × 100 bytes ≈ 50 MB — trivial for Redis Cluster. At 1M RPS, need ~10 Redis nodes (100K ops each) or local token bucket + Redis sync for global limits.

Algorithm comparison

Algorithm Behavior Burst Memory Accuracy
Fixed window Count requests per clock minute 2× burst at window boundary O(1) per key Poor at edges
Sliding window log Store timestamp of each request None O(limit) per key Exact
Sliding window counter Weighted blend of current + previous window Minimal O(1) per key ~99% accurate
Token bucket Tokens refill at fixed rate; burst up to bucket size Configurable burst O(1) per key Good
Leaky bucket Requests queue; processed at fixed rate Smooths traffic O(queue depth) Strict output rate
flowchart LR
  subgraph tb [Token Bucket]
    Refill[Refill 10/sec] --> Bucket[(Bucket\ncapacity 100)]
    Req1[Request] -->|take 1 token| Bucket
    Bucket -->|tokens > 0| Allow[Allow]
    Bucket -->|empty| Deny429[429 Deny]
  end
  subgraph lb [Leaky Bucket]
    In[Incoming burst] --> Queue[(Queue)]
    Queue -->|leak 10/sec| Out[Steady output]
  end

Token bucket — deep dive

Each client has a bucket with capacity B and refill rate R tokens/sec. On request: if tokens ≥ 1, decrement and allow; else reject with 429. Allows bursts up to B while maintaining average rate R over time. Used by AWS API Gateway, Stripe, and most SaaS APIs.

-- Redis Lua: atomic token bucket (EVAL script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last = tonumber(data[2]) or now

local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens >= requested then
  tokens = tokens - requested
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, math.ceil(capacity / rate) + 1)
  return 1  -- allowed
else
  return 0  -- denied
end

Sliding window — deep dive

Fixed window allows 2× burst at minute boundaries (59 requests at 00:59 + 60 at 01:00 = 119 in 2 seconds). Sliding window counter fixes this with O(1) memory:

count = prev_window_count × (1 - elapsed/window) + curr_window_count

Store two counters per key: current and previous window. Redis key: ratelimit:{user}:{window_id}. Accuracy ~99%; used by Cloudflare and Kong gateway at scale.

Distributed rate limiting

flowchart TB
  Client --> Edge[Edge / CDN\n coarse IP limits]
  Edge --> GW[API Gateway\n per-key limits]
  GW --> Local[Local token bucket\n 80% decisions]
  Local -->|sync every 100ms| Redis[(Redis Cluster\n global truth)]
  GW --> Service[Microservice]
  Service -->|optional 2nd check| Redis
  • Centralized (Redis) — accurate global limits; adds ~0.5–1 ms latency; single cluster is bottleneck
  • Local + sync — each node holds local bucket, syncs quota to Redis periodically; faster but briefly over-limit during sync
  • Edge limiting — Cloudflare/Kong at PoP; stops abuse before origin; coarse granularity (IP/CIDR)
  • Consistent hashing — route same user to same Redis shard; avoids cross-shard coordination
⚖️ Trade-off

Fail-open (allow on Redis down): prioritizes availability; risk of abuse during outage. Fail-closed (deny all): protects infrastructure; causes user-facing outage. Stripe fails closed on payment endpoints, fail-open on read-only catalog. State your policy explicitly.

⚠️ Pitfall

Rate limiting by IP alone fails behind NAT (corporate offices, mobile carriers). Combine IP + API key + user ID. For anonymous endpoints, use IP + fingerprint + progressive CAPTCHA escalation.

📦 Real World

Stripe publishes rate limits per API key with Retry-After headers. GitHub uses sliding window with response headers: X-RateLimit-Remaining, X-RateLimit-Reset. AWS API Gateway uses token bucket at account and stage level.

🎯 Interview Tip

Draw three boxes: client → gateway (local bucket) → Redis (global sync). Explain token bucket for burst tolerance. Mention Redis Lua for atomicity — without it, race conditions double-count under concurrency. L6: discuss race between limiter and billing meter, clock skew across nodes, and adaptive limits under DDoS.

3. Key-Value Store (Dynamo-style)

Design a distributed, highly available key-value store like DynamoDB, Riak, or the storage layer behind Redis Cluster. Tests understanding of partitioning, replication, consistency trade-offs, and failure recovery.

Functional requirements

  • put(key, value) — store arbitrary blob (≤ 1 MB per value typical)
  • get(key) — retrieve latest value or not-found
  • delete(key) — tombstone for eventual removal
  • Optional: conditional writes (put if not exists), TTL expiration
  • No complex queries — point lookups only (by design)

Non-functional requirements

Requirement Target
Availability 99.99% — always accept writes during partition
Latency GET p99 < 10 ms; PUT p99 < 20 ms
Throughput 1M+ ops/sec cluster-wide
Durability No acknowledged write loss on single-node failure
Scalability Linear scale-out by adding nodes
📐 Estimation

1B keys × 1 KB average = 1 TB raw data. Replication factor 3 → 3 TB cluster. 100K ops/sec per node × 30 nodes = 3M ops/sec cluster capacity. WAL write: 100K × 1 KB = 100 MB/s disk per node — NVMe handles easily.

High-level architecture

flowchart TB
  Client[Client SDK]
  Coord[Coordinator / Router]
  subgraph ring [Consistent Hash Ring]
    N1[Node A\nvnodes 0-85]
    N2[Node B\nvnodes 86-170]
    N3[Node C\nvnodes 171-255]
  end
  subgraph nodeA [Each Storage Node]
    HT[(In-memory\nHash Table)]
    WAL[(Write-Ahead Log)]
    SST[(SSTable / Disk)]
    HT --> WAL --> SST
  end
  Client --> Coord --> ring
  N1 --> nodeA

In-memory hash table — deep dive

Hot data lives in an in-memory hash map (open addressing or chained buckets). Each entry: { key, value, version, expiry }. Memory budget: if node has 64 GB RAM, reserve ~48 GB for hash table, rest for buffers/OS. Eviction: LRU for cache tier; for durable store, flush cold entries to SSTable on disk.

Write-Ahead Log (WAL) — deep dive

Every write appends to WAL before acknowledging the client. On crash recovery: replay WAL to rebuild in-memory state. WAL segment rotation: new file every 64–128 MB; old segments deleted after SSTable flush.

  1. Client sends PUT
  2. Append (key, value, seq, timestamp) to WAL (fsync or group commit batch)
  3. Update in-memory hash table
  4. Replicate to N-1 nodes (async or sync per consistency level)
  5. Return success to client
⚖️ Trade-off

fsync every write: durable but ~1K writes/sec on HDD, ~100K on NVMe. Group commit (batch fsync every 1 ms): 10× throughput, lose up to 1 ms of writes on crash. Dynamo-style systems use group commit + replication for durability without single-disk fsync bottleneck.

Replication & quorum

Each key maps to a preference list of N nodes (typically N=3) via consistent hashing. Coordinator sends write to all N replicas; read from R replicas.

Config W + R Behavior
Strong W=2, R=2 (N=3) Quorum overlap guarantees latest read; higher latency
Eventual W=1, R=1 Fastest; may read stale; repair via read-repair
Default Dynamo W=2, R=1 or W=1, R=2 Tunable per client request
sequenceDiagram
  participant C as Client
  participant Co as Coordinator
  participant R1 as Replica 1
  participant R2 as Replica 2
  participant R3 as Replica 3

  C->>Co: PUT key (W=2)
  Co->>R1: write v5
  Co->>R2: write v5
  Co->>R3: write v5 (async)
  R1-->>Co: ack
  R2-->>Co: ack
  Co-->>C: success (2 of 3)

  C->>Co: GET key (R=2)
  Co->>R1: read
  Co->>R2: read
  Co-->>C: return max version

Consistent hashing — deep dive

Place nodes and keys on a hash ring (0 to 232-1). Key maps to first node clockwise on ring. Virtual nodes: each physical node owns 100–256 vnodes for even load distribution. Adding a node: only adjacent key ranges migrate (~K/N keys).

Node A Node B Node C key→B

Failure handling

  • Hinted handoff — if preferred node down, write to next node on ring; return when original recovers
  • Read repair — on read, if replicas disagree, write latest version to stale replicas
  • Anti-entropy — background Merkle tree comparison between replicas; sync divergent keys
  • Gossip membership — nodes discover failures via SWIM protocol; update ring state
🏆 Senior Signal

"AP system by default: W=1 for low-latency writes, R=2 for read-your-writes when needed. Vector clocks resolve conflicts on concurrent writes to same key — client merges or LWW with timestamp. This is exactly the Dynamo paper trade-off space."

🎯 Interview Tip

L5+: draw the hash ring with vnodes. Explain W/R/N quorum math: W + R > N guarantees overlap. L6: compare to Raft (strong consistency, leader bottleneck) vs Dynamo (always writable, eventual consistency). Mention vector clocks vs last-write-wins — LWW loses updates silently.

4. Twitter Feed (Home Timeline)

Design the home timeline: when a user opens Twitter/X, show recent tweets from people they follow— sorted, paginated, and fresh within seconds. The celebrity problem makes this one of the hardest L5–L6 questions.

Functional requirements

  • Post tweet — user publishes ≤ 280 chars (+ media metadata)
  • Home timeline — paginated feed of tweets from followed users, newest first
  • Follow / unfollow — update social graph
  • Search & trends — trending topics (optional scope extension)
  • Media — images/video stored separately; timeline holds references

Scale numbers

📐 Estimation

300M DAU, 2 tweets + 50 timeline reads per user/day.
Write: 300M × 2 / 86400 ≈ 7K tweets/sec (peak ×3 → 21K/sec).
Read: 300M × 50 / 86400 ≈ 174K timeline reads/sec (peak → 520K/sec).
Storage: 7K/sec × 86400 × 365 × 500 bytes ≈ 110 TB/year tweets alone.
Fan-out: avg 200 followers × 7K tweets = 1.4M timeline writes/sec if pure fan-out-on-write.

Fan-out on write vs fan-out on read

Strategy Write cost Read cost Best for
Fan-out on write O(followers) per tweet O(1) — precomputed timeline Normal users (<10K followers)
Fan-out on read O(1) — store tweet once O(following) — merge at read Celebrities (>1M followers)
Hybrid O(min(followers, 10K)) O(1) + merge celebrity tweets Production Twitter approach
flowchart TB
  Post[User posts tweet]
  Post --> Check{Followers > 10K?}
  Check -->|No| FOW[Fan-out on write\npush to follower timelines]
  Check -->|Yes| Celeb[Store in celebrity bucket\nno fan-out]
  FOW --> Redis[(Redis Sorted Set\ntimeline:userId)]
  Celeb --> TweetStore[(Cassandra\ntweets by user)]
  Read[User reads timeline] --> Redis
  Read --> Merge[Merge celebrity tweets\nfrom Cassandra]
  Merge --> Response[Sorted merged feed]

Redis sorted sets — deep dive

Each user's home timeline is a Redis sorted set (ZSET): score = tweet timestamp (or snowflake ID), member = tweet_id. ZADD timeline:{userId} {timestamp} {tweetId} ZRANGE timeline:{userId} 0 49 REV → latest 50 tweets in O(log N + 50).

  • Cap timeline at ~800 entries; trim oldest on insert (sliding window cache)
  • 800 × 8 bytes tweet_id × 300M users = ~192 GB if every user had full cache — most inactive; use tiered storage
  • Hot users' timelines in Redis; cold users fan-out-on-read from Cassandra

Cassandra for tweet storage

Partition key design drives everything:

Table Partition key Clustering Query
tweets_by_user user_id tweet_id DESC User profile timeline
timeline_cache viewer_id tweet_id DESC Fallback precomputed timeline
social_graph follower_id followee_id Who does user follow?

The celebrity problem — deep dive

Katy Perry (~100M+ followers): one tweet = 100M Redis ZADD operations. At 1 ms each (absurdly optimistic), that's 27+ hours. Even at 100K ops/sec cluster-wide, one celebrity tweet consumes the entire cluster for 1000 seconds.

Solution — hybrid fan-out:

  1. Users with <10K followers: fan-out on write to Redis timelines
  2. Celebrities (>10K): tweet stored only in their Cassandra partition
  3. On timeline read: fetch precomputed timeline + fetch recent tweets from each followed celebrity + merge sort
  4. Cache merged celebrity slice separately (5–30 sec TTL)

Trends — HyperLogLog

Counting unique users per hashtag at Twitter scale is impossible with exact counters. HyperLogLog in Redis: ~12 KB memory per key, ~0.81% error, counts billions of uniques. Pipeline: tweet → Kafka → Flink extracts hashtags → PFADD trend:{hashtag} {userId} → periodic PFCOUNT for ranking. Top-K via Count-Min Sketch + heap.

Trade-off Radar — Twitter Feed

📦 Real World

Twitter's 2012 architecture blog described Redis timelines + MySQL/Cassandra + hybrid fan-out. They migrated to Manhattan (internal KV store) for timelines. The hybrid fan-out pattern remains the standard answer — Instagram uses similar approach for Stories feed.

🎯 Interview Tip

Always compute fan-out cost: 7K tweets × 200 avg followers = 1.4M writes/sec — say this number out loud. Then explain hybrid. L6: discuss ranking (ML score vs chronological), ads insertion, and read-your-own-writes consistency when you tweet and immediately refresh feed.

5. YouTube (Video Platform)

Design a video upload, processing, and streaming platform serving billions of hours watched monthly. Spans blob storage, async transcoding pipelines, CDN delivery, and recommendation systems.

Functional requirements

  • Upload video — resumable upload, metadata (title, description, tags)
  • Transcode — multiple resolutions (360p–4K) and formats (H.264, VP9, AV1)
  • Stream — adaptive bitrate playback with low startup latency (<2 sec)
  • Search & recommendations — home feed, related videos
  • Analytics — view counts, watch time, creator dashboard

Scale numbers

📐 Estimation

2B logged-in users/month, 500M hours watched/day, avg 720p @ 2 Mbps.
Uploads: ~500 hours/minute uploaded → ~8 uploads/sec average, peak ×10 → 80/sec.
Views: billions/day → ~100K–500K stream starts/sec peak globally.
Storage: 500 hr/min × 60 × 24 × 1 GB/hr (raw) ≈ 720 TB/day raw; transcoded 3–5× → 2–3 PB/day (most cold/archived).
CDN egress: 500M hr × 2 Mbps ≈ 278 Tbps daily average — CDN is non-negotiable.

Upload pipeline

flowchart LR
  Creator[Creator App]
  Upload[Upload Service\nresumable chunks]
  Blob[(Object Storage\nS3 / GCS)]
  Meta[(Metadata DB\nvideo_id, status)]
  Queue[SQS / Pub-Sub]
  Transcode[Transcoding Farm\nFFmpeg workers]
  CDN[CDN Origin]
  Catalog[Catalog Service]

  Creator -->|chunked PUT| Upload
  Upload --> Blob
  Upload --> Meta
  Upload -->|video.uploaded| Queue
  Queue --> Transcode
  Transcode -->|read raw| Blob
  Transcode -->|write HLS segments| CDN
  Transcode --> Meta
  Meta --> Catalog

Resumable upload — deep dive

  1. Client requests upload session → server returns upload_id + signed chunk URLs
  2. Upload 5–50 MB chunks with Content-Range headers
  3. Server tracks received byte ranges in Redis/DB
  4. On disconnect: client resumes from last confirmed byte
  5. Complete: assemble manifest, trigger transcoding job

YouTube uses resumable HTTP uploads (similar to Google Cloud Storage protocol). Never hold full video in app server memory.

Transcoding with FFmpeg — deep dive

Resolution Bitrate Transcode time (1 hr video) Segment size
360p 800 Kbps ~15 min (1× realtime) 2–6 sec HLS segments
720p 2.5 Mbps ~30 min 2–6 sec
1080p 5 Mbps ~45 min 2–6 sec
4K 15 Mbps ~2 hr (GPU accelerated) 2–6 sec

Transcoding farm: Kubernetes jobs, one FFmpeg process per job, GPU nodes for 4K/AV1. Priority queue: popular creators / live replay first. Spot/preemptible instances for 70% cost savings. Output: HLS (.m3u8 manifest + .ts segments) and DASH (.mpd + .m4s) for browser compatibility.

# FFmpeg: generate HLS ladder from source
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=640:360[v1out]; [v2]scale=1280:720[v2out]; [v3]scale=1920:1080[v3out]" \
  -map "[v1out]" -c:v libx264 -b:v 800k -map a:0 -c:a aac -b:a 96k -f hls stream_360.m3u8 \
  -map "[v2out]" -c:v libx264 -b:v 2500k -map a:0 -c:a aac -b:a 128k -f hls stream_720.m3u8 \
  -map "[v3out]" -c:v libx264 -b:v 5000k -map a:0 -c:a aac -b:a 192k -f hls stream_1080.m3u8

CDN & HLS/DASH delivery

sequenceDiagram
  participant Player
  participant CDN as CDN Edge
  participant Origin as Origin / Storage

  Player->>CDN: GET master.m3u8
  CDN-->>Player: 360p/720p/1080p variants
  Player->>Player: measure bandwidth
  Player->>CDN: GET 720p segment 001.ts
  CDN-->>Player: video chunk
  Note over Player: switch to 1080p if bandwidth allows
  Player->>CDN: GET 1080p segment 002.ts
  CDN->>Origin: cache miss fetch
  Origin-->>CDN: segment
  CDN-->>Player: video chunk

Adaptive bitrate (ABR): player monitors buffer and bandwidth, switches quality per segment. CDN caching: popular videos cached at edge; long-tail served from origin. Cache key: video_id + resolution + segment_index. Immutable segments = infinite TTL.

Recommendations — deep dive

  • Candidate generation — collaborative filtering, content similarity, subscribed channels (millions → hundreds)
  • Ranking — ML model scores candidates on watch probability, satisfaction, diversity
  • Two-tower model — user embedding × video embedding dot product for fast retrieval at scale
  • Feature store — real-time features (last 10 watches) + batch features (user demographics)
  • Freshness — new uploads indexed within minutes via stream processing
⚖️ Trade-off

Transcode-all-upfront: instant playback after processing delay (minutes–hours for 4K). Progressive / just-in-time: start playback after first segments ready (~30 sec); lower storage for unpopular videos. YouTube transcodes popular resolutions first (360p + 720p), adds 4K asynchronously.

🎯 Interview Tip

Separate upload (write-heavy, async) from streaming (read-heavy, CDN). Don't transcode synchronously on upload request. L5: HLS segment structure, ABR algorithm. L6: live streaming (WebRTC/LL-HLS), copyright Content ID fingerprinting, regional compliance (geo-block metadata).

6. Uber (Ride Matching)

Match riders with nearby drivers in real time, calculate ETAs and fares, handle surge pricing, and manage trip lifecycle from request to payment. Geospatial indexing at city scale is the core challenge.

Functional requirements

  • Request ride — rider specifies pickup, dropoff, ride type
  • Match driver — find available driver within ~3 km, ETA < 5 min
  • Track trip — real-time location updates for rider and driver
  • Surge pricing — dynamic multiplier when demand exceeds supply
  • Payment & receipt — charge on trip completion
  • Rating — post-trip feedback

Scale numbers

📐 Estimation

130M MAU, 25M trips/day globally.
Trips: 25M / 86400 ≈ 290 trips/sec average; NYC peak alone ~30/sec, global peak ×5 → 1.5K/sec.
Location updates: 5M active drivers × 1 update/4 sec ≈ 1.25M location writes/sec peak.
Storage: location history 30 days × 1.25M × 86400 × 50 bytes ≈ 162 TB (tiered to cold storage).
Geospatial queries: each ride request searches ~3 km radius in <200 ms.

High-level architecture

flowchart TB
  Rider[Rider App]
  Driver[Driver App]
  GW[API Gateway]
  Trip[Trip Service\nstate machine]
  Match[Matching Service]
  Geo[(Geospatial Index\nRedis GEO / H3)]
  Surge[Surge Service]
  Loc[Location Service]
  Pay[Payment Service]
  WS[WebSocket / Push]

  Rider --> GW --> Trip
  Trip --> Match
  Match --> Geo
  Match --> Surge
  Driver -->|GPS every 4s| Loc --> Geo
  Trip --> WS
  WS --> Rider
  WS --> Driver
  Trip --> Pay

Geospatial indexing — deep dive

Approach Query Update Scale
Redis GEO GEORADIUS in O(N+log M) GEOADD O(log N) Good for city-level (<100K drivers/region)
Geohash grid Query cell + 8 neighbors Re-index on move across cell Simple; edge cases at cell boundaries
H3 hexagonal index k-ring expansion from center hex Update on hex change (~100m resolution) Uber's production choice; uniform adjacency
Quadtree / R-tree Range search O(log N) Rebalance on update PostGIS; heavier for 1M+ updates/sec

Redis GEO for hot path: GEOADD drivers:{cityId} {lng} {lat} {driverId}, GEORADIUS drivers:{cityId} {lng} {lat} 3 km COUNT 10 ASC. Partition by city/region — NYC, SF, London each own Redis cluster shard. Driver location updates every 3–4 seconds; stale drivers (>30 sec) excluded from matching.

H3 & surge pricing — deep dive

Uber open-sourced H3 — hexagonal hierarchical geospatial index. Resolution 7 hex ≈ 1.2 km²; resolution 9 ≈ 0.1 km². Surge computed per H3 cell: surge = demand / supply smoothed over 5-min window.

  1. Count ride requests per H3 cell (demand signal)
  2. Count available drivers per H3 cell (supply signal)
  3. If ratio > threshold (e.g., 2.0): surge multiplier = min(ratio, 3.0)
  4. Push surge map to rider app via WebSocket; refresh every 60 sec
  5. Drivers incentivized to move to high-surge hexes (supply rebalancing)

Trade-off Radar — Uber Matching

Trip state machine

stateDiagram-v2
  [*] --> Requested: rider requests
  Requested --> Matching: find drivers
  Matching --> Accepted: driver accepts
  Matching --> Cancelled: no driver / timeout
  Accepted --> Arriving: driver en route
  Arriving --> InProgress: rider picked up
  InProgress --> Completed: dropped off
  InProgress --> Cancelled: mid-trip cancel
  Completed --> Paid: payment processed
  Paid --> [*]
  Cancelled --> [*]

State stored in Trip Service (PostgreSQL or DynamoDB). Each transition emits domain event to Kafka: trip.accepted, trip.completed. Payment, notification, and analytics services subscribe. Idempotency keys on transitions prevent double-charge on retry.

Matching algorithm

  • Fetch top 10 nearest available drivers (GEORADIUS ASC)
  • Filter: correct ride type, rating > 4.5, not on another trip
  • Score: ETA (60%) + driver acceptance rate (20%) + direction of travel (20%)
  • Dispatch to best driver; if no accept in 15 sec, offer to next
  • Batch matching in dense areas: solve assignment problem (Hungarian algorithm) every 2 sec
⚠️ Pitfall

Don't query all drivers globally. Partition by city/geofence. At 1.25M location updates/sec, a single Redis instance dies — shard by region with consistent hashing on city_id.

📦 Real World

Uber's Dispatch service handles matching; Ramen (location store) ingests GPS at massive scale. They migrated from PostGIS to H3 + custom indexing. Surge uses ML models beyond simple supply/demand ratio.

🎯 Interview Tip

Lead with geospatial index (Redis GEO or H3), then state machine, then surge. Quantify location write QPS — it's the hidden scale driver. L6: discuss batched matching vs greedy nearest, driver supply prediction, and fraud (GPS spoofing).

7. WhatsApp (Messaging at Scale)

Design a global messaging platform delivering end-to-end encrypted messages with delivery receipts, offline support, and media sharing — at 100+ billion messages per day.

Functional requirements

  • Send message — text, image, video, voice note to individual or group (≤1024 members)
  • Delivery receipts — sent ✓, delivered ✓✓, read (blue ✓✓)
  • Offline delivery — store-and-forward when recipient offline
  • E2E encryption — server cannot read message content
  • Last seen / online status — privacy-controlled presence
  • Multi-device sync — same account on phone + desktop

Scale numbers

📐 Estimation

2B+ users, 100B messages/day (Meta reported ~2020).
Write: 100B / 86400 ≈ 1.16M messages/sec average; peak ×3 → 3.5M/sec.
Connections: ~500M concurrent WebSocket/long-poll connections globally.
Storage: 100B × 200 bytes metadata/day = 20 TB/day (content encrypted, media in object store).
Media: ~20% of messages include media; 20B × 500 KB avg = 10 PB/day (CDN + tiered storage).

High-level architecture

flowchart TB
  Sender[Sender Device]
  ConnGW[Connection Gateway\n500M WebSockets]
  MsgSvc[Message Service]
  Router[User Router\nuser_id → server]
  Cassandra[(Cassandra\nmessage store)]
  Offline[(Offline Queue\nper user partition)]
  Push[Push Notification\nAPNs / FCM]
  Receiver[Receiver Device]
  Media[(Object Storage\nencrypted blobs)]
  Signal[Signal Protocol\nE2E layer]

  Sender -->|encrypted payload| ConnGW --> MsgSvc
  MsgSvc --> Router
  Router -->|online| ConnGW --> Receiver
  Router -->|offline| Offline
  MsgSvc --> Cassandra
  Offline --> Push
  Sender --> Media
  Signal -.->|keys on device| Sender
  Signal -.->|keys on device| Receiver

Cassandra message storage — deep dive

Partition key = conversation_id (or recipient_user_id for 1:1).

Table Partition key Clustering Purpose
messages conversation_id message_id DESC Chat history pagination
offline_inbox user_id received_at ASC Pending delivery queue
user_sessions user_id device_id Multi-device routing
receipts message_id user_id Delivery/read status

WhatsApp famously uses Erlang for connection handling (soft real-time, millions of processes) and FreeBSD tuning. Message metadata in Cassandra; content is encrypted blob the server cannot decrypt. Groups: fan-out on write to each member's offline queue (similar to Twitter, but encrypted envelope per recipient).

Delivery receipts — deep dive

sequenceDiagram
  participant S as Sender
  participant Server
  participant R as Receiver

  S->>Server: send encrypted msg (msg_id)
  Server-->>S: ack sent (1 grey tick)
  Server->>R: push message
  R->>Server: ack received
  Server-->>S: delivered (2 grey ticks)
  R->>R: user opens chat
  R->>Server: ack read
  Server-->>S: read (2 blue ticks)

Receipts are lightweight control messages (~50 bytes), not full message re-transmission. Stored in Cassandra with TTL; aggregated for group chats (delivered when all members ack). Ordering: server-assigned monotonic message_id per conversation (not wall clock — avoids clock skew).

E2E encryption — Signal Protocol

  • X3DH — extended triple Diffie-Hellman for initial key agreement (first message to new contact)
  • Double Ratchet — forward secrecy: compromise of current key doesn't expose past messages
  • Sender Keys — for group messages: one encrypted payload, per-member keys for fan-out efficiency
  • Server stores only: { from, to, timestamp, encrypted_blob, msg_id } — no plaintext ever
  • Key transparency: safety numbers for verifying contact identity
⚖️ Trade-off

E2E encryption prevents server-side spam/malware scanning, search indexing, and ML moderation. WhatsApp uses metadata analysis + user reports for abuse. Multi-device sync requires Sender Key distribution across devices — significant protocol complexity.

Connection layer at 500M concurrent

  • Connection gateway shards by user_id hash — sticky routing to same Erlang node
  • Heartbeat every 30 sec; disconnect after 3 missed → mark offline, route to push
  • Long-poll fallback for restrictive networks (HTTP/2 multiplexing)
  • Cross-region: user homed in primary region; travel = cross-region relay with +50 ms latency acceptable for async chat
📦 Real World

WhatsApp scaled to 900M users with 50 engineers (2016) — extreme focus on Erlang + FreeBSD + Cassandra. Acquired by Facebook/Meta for $19B. Architecture prioritizes reliability and simplicity over feature richness. Group limit raised from 256 → 1024 required rethinking fan-out queues.

🎯 Interview Tip

L6 question — show you know the numbers (100B/day = 1.16M/sec). Separate connection layer (WebSocket scale) from storage layer (Cassandra partitions). Mention E2E as constraint that blocks server-side features. Don't forget offline queue + push notification fallback — half of messages delivered to offline users.

8. Google Maps (Navigation & Routing)

Design a mapping service: serve map tiles globally, compute routes in sub-second time across continental road networks, incorporate real-time traffic, and support turn-by-turn navigation for hundreds of millions of users.

Functional requirements

  • Map display — pan, zoom, satellite/terrain layers at 60 fps
  • Route planning — origin → destination, multiple waypoints, avoid tolls/highways
  • Turn-by-turn navigation — real-time rerouting on deviation or traffic change
  • ETA — accurate arrival time using live traffic
  • Search — places, addresses, POI autocomplete
  • Traffic overlay — color-coded congestion on map

Scale numbers

📐 Estimation

1B+ MAU, ~1B km driven/day with Google Maps navigation.
Routing requests: ~50M/day explicit + navigation reroutes → ~600/sec average, peak ×20 → 12K/sec.
Tile requests: each map view ≈ 20 tiles; 1B users × 10 views/day × 20 = 200B tiles/day2.3M tiles/sec (CDN absorbs 99%).
Road graph: global road network ≈ 200M nodes, 400M edges (~50 GB compressed).
Traffic probes: Android/iOS location at scale → billions of speed samples/hour.

High-level architecture

flowchart TB
  Client[Mobile / Web Client]
  CDN[CDN\nmap tiles]
  TileSvc[Tile Server\n256×256 PNG/vector]
  RouteAPI[Routing API]
  CH[Contraction Hierarchies\npreprocessed graph]
  Traffic[Traffic Service\nreal-time speeds]
  Graph[(Road Graph Store)]
  Probe[Probe Ingestion\nKafka]
  Search[Places Search\nElasticsearch]
  ETA[ETA ML Model]

  Client --> CDN
  Client --> RouteAPI
  CDN -->|miss| TileSvc
  RouteAPI --> CH
  RouteAPI --> Traffic
  CH --> Graph
  Traffic --> Probe
  Traffic --> ETA
  Client --> Search

Map tiles — deep dive

World divided into zoom levels 0–22. At zoom Z, earth split into 4Z tiles. Each tile: 256×256 pixels (raster PNG) or vector (Mapbox Vector Tiles / protobuf).

Zoom Coverage Tile size Cache strategy
0–5 World / continent ~5–20 KB vector CDN cache 30+ days (rarely changes)
10–14 City / neighborhood ~20–50 KB vector CDN cache 7 days; invalidate on map update
15–18 Street level ~50–200 KB CDN + client disk cache
19–22 Building detail ~200 KB–1 MB On-demand; limited pre-generation

Tile URL pattern: /tiles/{z}/{x}/{y}.pbf. CDN serves 99%+ from edge. Traffic overlay is a separate transparent tile layer refreshed every 2–5 minutes — not baked into base map.

Graph routing — deep dive

Road network = weighted directed graph: nodes = intersections, edges = road segments, weight = travel time. Naive Dijkstra on 200M nodes: O(E log V) → seconds to minutes. Unacceptable for interactive routing.

Contraction Hierarchies (CH)

Preprocess graph offline: iteratively contract least-important nodes, adding shortcut edges that preserve shortest paths. Result: hierarchy where upward search from source and downward search from target meet in middle — explored nodes drop from millions to thousands.

  • Preprocessing: hours on cluster for continent-sized graph; rerun weekly or on map update
  • Query: bidirectional search on hierarchy → <100 ms on continental graph
  • Memory: ~10–20× original graph size with shortcuts (acceptable for in-memory serving)
  • Dynamic traffic: edge weights updated from traffic service; CH structure unchanged, weights hot-reloaded
flowchart TB
  subgraph preprocess [Offline Preprocessing]
    Raw[Raw road graph\n200M nodes] --> Contract[Iterative contraction\nadd shortcuts]
    Contract --> CHGraph[CH Graph\nin memory]
  end
  subgraph query [Online Query — sub 100ms]
    Src[Source] --> Up[Upward search\nin hierarchy]
    Tgt[Target] --> Down[Downward search\nin hierarchy]
    Up --> Meet[Meet at node]
    Down --> Meet
    Meet --> Path[Unwrap shortcuts\nfull route]
  end
  CHGraph --> query
Before contraction → After (shortcut s-d added) a b c d e s t shortcut

Real-time traffic — deep dive

  1. Probe ingestion: anonymized GPS speed from Android/iOS Maps users → Kafka (billions/day)
  2. Aggregation: Flink windows compute median speed per road segment (edge) per 2-min bucket
  3. Weight update: travel_time = segment_length / max(median_speed, 5 km/h)
  4. Push to routing: in-memory edge weight table refreshed every 2–5 min
  5. ETA ML: gradient-boosted model on historical + live traffic + time-of-day + weather
⚖️ Trade-off

CH preprocessing is fast at query but slow to update graph topology (new roads). A* with landmarks (ALT) or Customizable Route Planning (CRP) trade preprocessing flexibility for query speed. Google uses CH + CRP hybrid; OSRM uses Contraction Hierarchies open-source.

Alternative algorithms (know for L6)

Algorithm Query time Preprocessing Dynamic weights
Dijkstra Seconds–minutes None Easy
A* + landmarks 100 ms–1 sec Moderate Easy
Contraction Hierarchies <100 ms Hours Weights only
Customizable Route Planning <50 ms Moderate Fast customization
📦 Real World

Google Maps routing research published Contraction Hierarchies (Geisberger et al., 2008). OpenStreetMap + OSRM provides open CH-based routing. Apple Maps uses similar preprocessed graph techniques. Map tiles served via Google CDN with vector tile format replacing raster at high zoom levels.

🏆 Senior Signal

"Tiles on CDN (2.3M RPS). Routing via Contraction Hierarchies — sub-100 ms on 200M node graph. Traffic weights from probe aggregation, refreshed every 2 min. Separate tile layer for traffic overlay. ETA uses ML on top of CH base time — cite 15–25% accuracy improvement over static routing."

🎯 Interview Tip

Split into two sub-problems: tile serving (CDN, easy) and routing (hard). Never say Dijkstra on full graph — immediately follow with CH or A* landmarks. L6: discuss CRP, multi-criteria routing (tolls + time), and rerouting frequency during navigation (every 30 sec or on deviation >50m).