Real-World Case Studies

How to use these case studies

Each case study follows the same structure: functional requirements → non-functional requirements → scale estimation → high-level architecture → deep dives → trade-offs. Switch to Interview track for tips on what interviewers probe at each level.

System design interviews rarely ask you to reproduce production systems verbatim. They test whether you can clarify requirements, estimate scale, draw a coherent architecture, and reason about failure modes. These eight problems cover the majority of pattern combinations you'll encounter: read-heavy caching, write-heavy fan-out, geospatial indexing, stream processing, distributed consensus trade-offs, and CDN delivery.

Case study	Primary patterns	Typical level	Hardest deep dive
URL Shortener	ID generation, cache-aside, async analytics	L4	301 vs 302, collision-free IDs at scale
Rate Limiter	Token bucket, Redis Lua, distributed sync	L4–L5	Sliding window accuracy vs memory
Key-Value Store	Hash table, WAL, replication, consistent hashing	L5	Quorum reads/writes, anti-entropy repair
Twitter Feed	Hybrid fan-out, sorted sets, celebrity problem	L5–L6	Fan-out on write vs read for 50M followers
YouTube	Upload pipeline, transcoding, CDN, recommendations	L5–L6	Adaptive bitrate, cold-start recommendations
Uber	Geospatial index, matching, surge, state machine	L5–L6	Real-time driver location at city scale
WhatsApp	Cassandra messaging, E2E encryption, receipts	L6	100B msgs/day partition strategy
Google Maps	Graph routing, tile CDN, real-time traffic	L6	Contraction hierarchies, sub-second routing

🎯 Interview Tip

Pick 3–4 case studies and practice the full 45-minute RADIO flow on each: Requirements (5 min), Architecture (10 min), Data model (5 min), Interface/API (5 min), Optimizations & deep dives (15 min), Bottlenecks (5 min). Interviewers reward structured thinking over perfect diagrams.

1. URL Shortener (bit.ly)

The canonical L4 interview question. Shorten long URLs into compact aliases, resolve them at read time with sub-10 ms latency, and optionally track click analytics without blocking the redirect path.

Functional requirements

Create short URL — given a long URL, return a unique short code (custom alias optional for premium users)
Redirect — GET /{shortCode} resolves to original URL via HTTP redirect
Analytics (optional) — track clicks: timestamp, referrer, geo, user-agent
Expiration — TTL on links; soft-delete vs hard-delete policy
Custom aliases — user-chosen codes with uniqueness check

Non-functional requirements

Requirement	Target	Why it matters
Read latency	p99 < 10 ms	Redirects are user-facing; every ms hurts conversion
Availability	99.99% on read path	Broken links damage brand trust instantly
Write throughput	~1K–10K new URLs/sec peak	Lower than reads but must never block redirects
Read/write ratio	100:1 typical	Optimize aggressively for read path
URL uniqueness	Global, collision-free	Duplicate codes = wrong redirects = security incident

Scale estimation

📐 Estimation

Assumptions: 500M DAU, 2 link creations + 20 redirects per user per day.
Writes: 500M × 2 / 86400 ≈ 12K writes/sec (peak ×3 → 36K/sec).
Reads: 500M × 20 / 86400 ≈ 116K reads/sec (peak ×3 → 350K RPS).
Storage (10 years): 500M × 2 × 365 × 10 = 3.65T URLs × ~500 bytes = ~1.8 PB (plan tiered archival).
Bandwidth: 350K RPS × 500 bytes redirect response ≈ 175 MB/s (negligible vs CDN).

High-level architecture

flowchart TB
  Client[Client / Browser]
  LB[Load Balancer]
  API[Redirect Service]
  Create[Create Service]
  Redis[(Redis Cache\nshortCode → longURL)]
  MySQL[(MySQL\nsource of truth)]
  Kafka[Kafka\nclick events]
  Analytics[Analytics Pipeline\nFlink → ClickHouse]

  Client -->|GET /abc123| LB --> API
  API --> Redis
  Redis -->|miss| MySQL
  API -->|async| Kafka
  Client -->|POST /shorten| LB --> Create
  Create --> MySQL
  Create --> Redis
  Kafka --> Analytics

ID generation strategies

Strategy	Example	Pros	Cons
Auto-increment + Base62	`123456789 → "8M0kX"`	Simple, dense, no collision check	Single DB writer bottleneck; predictable/enumerable IDs
Snowflake / UUID	64-bit time-ordered ID	Distributed, no DB coordination	Longer codes; need encoding scheme
Hash of URL + counter	MD5 prefix + retry on collision	Same long URL → same short URL (dedup)	Collision handling complexity at scale
Pre-generated ID pool	Range allocator per server	Zero contention at write time	Pool exhaustion, range management ops

L5 Production choice for bit.ly-style systems: Snowflake IDs encoded in Base62 for distributed creation, with MySQL as source of truth and Redis cache-aside on read. Base62 uses [a-zA-Z0-9] — 7 chars = 62⁷ ≈ 3.5 trillion unique codes.

# Base62 encode — 7 chars covers 3.5T IDs
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode_base62(num: int) -> str:
    if num == 0:
        return ALPHABET[0]
    chars = []
    while num:
        num, rem = divmod(num, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))

# Snowflake: 41-bit timestamp | 10-bit machine | 12-bit sequence
# → ~4096 IDs/ms per machine, sortable, no DB round-trip

301 vs 302 redirect — deep dive

This is the most common L4 trap question. The choice affects SEO, analytics accuracy, and cache behavior.

Status	Browser behavior	Use when
301 Moved Permanently	Browser caches redirect; may skip shortener on repeat visits	Permanent links, SEO consolidation, minimal server load after first hit
302 Found (Temporary)	Browser always hits shortener first	Analytics on every click, A/B destination testing, expiring links
307 Temporary	Preserves HTTP method (POST stays POST)	API redirects, non-GET flows

⚖️ Trade-off

301 reduces server load (browser caches) but under-counts analytics after first visit. 302 guarantees every click is counted but doubles latency on repeat visits. bit.ly uses 302 for marketing/analytics products; internal/permanent links may use 301. Say both options in interviews and tie to product requirements.

MySQL + Redis read path

GET shortCode from Redis (short:abc123 → longURL)
On miss: query MySQL by primary key (short_code indexed), populate Redis with TTL (24h–7d)
Return 302 redirect; fire-and-forget click event to Kafka
On create: INSERT MySQL, SET Redis, return short URL

MySQL schema: short_code VARCHAR(10) PK, long_url TEXT, user_id, created_at, expires_at, click_count BIGINT (approximate via periodic flush from analytics). Shard MySQL by hash(short_code) when write QPS exceeds single-node capacity (~10K/sec sustained writes).

Analytics via Kafka

sequenceDiagram
  participant Redirect as Redirect Service
  participant Kafka as Kafka
  participant Flink as Flink / Spark
  participant CH as ClickHouse
  participant Dash as Analytics Dashboard

  Redirect->>Kafka: click event (async, non-blocking)
  Note over Redirect: never wait for analytics
  Kafka->>Flink: stream aggregate
  Flink->>CH: hourly rollups
  CH->>Dash: query p99 < 2s

Click events: { short_code, timestamp, ip, user_agent, referrer, geo }. Partition Kafka by short_code for ordered per-link aggregation. Never block redirect on analytics write — if Kafka is down, sample/drop events rather than fail redirects.

⚠️ Pitfall

Don't put analytics synchronously on the redirect path. At 350K RPS, even 1 ms extra Kafka latency adds 350 seconds of aggregate wait per second. Use async fire-and-forget with local buffer + backpressure drop.

📦 Real World

bit.ly processes billions of redirects monthly. They use multi-region Redis clusters for hot codes, MySQL/Aurora for persistence, and real-time analytics pipelines. TinyURL early architecture was single MySQL + mod-hash — worked until read QPS forced caching layer.

🏆 Senior Signal

"Read path: Redis cache-aside, 302 for analytics, async Kafka click stream. Write path: Snowflake ID + Base62, MySQL INSERT, cache warm. Scale reads with Redis Cluster; scale writes with DB sharding by short_code hash. 350K peak RPS needs ~4 Redis nodes at 100K ops each with 80% hit rate."

🎯 Interview Tip

L4 Must mention: Base62/Snowflake ID, Redis cache, 301 vs 302 trade-off, read-heavy optimization.
L5 Add: Kafka analytics decoupling, sharding strategy, cache stampede mutex on miss.
L6 Discuss: multi-region active-active, custom alias race conditions (DB unique constraint + Redis SETNX), abuse detection (malware URL scanning async on create).

2. Rate Limiter

Protect APIs and infrastructure from abuse, ensure fair usage across tenants, and enforce SLA tiers. Rate limiting appears in every gateway, every public API, and almost every L4–L5 interview.

Functional requirements

Limit requests per user, IP, API key, or endpoint
Return 429 Too Many Requests with Retry-After header when exceeded
Support multiple tiers: free (100/min), pro (10K/min), enterprise (custom)
Configurable limits per route (POST /upload stricter than GET /health)
Whitelist/blacklist bypass rules

Non-functional requirements & scale

Metric	Target
Decision latency	< 1 ms p99 (on critical path)
Throughput	Match API gateway RPS (100K–1M+ RPS)
Accuracy	Within 1% of configured limit (sliding window)
Availability	Fail-open vs fail-closed policy explicit
Memory	~100 bytes per active key (user/IP combo)

📐 Estimation

1M RPS API, 10M unique users/day active → ~500K concurrent rate-limit keys in Redis. 500K × 100 bytes ≈ 50 MB — trivial for Redis Cluster. At 1M RPS, need ~10 Redis nodes (100K ops each) or local token bucket + Redis sync for global limits.

Algorithm comparison

Algorithm	Behavior	Burst	Memory	Accuracy
Fixed window	Count requests per clock minute	2× burst at window boundary	O(1) per key	Poor at edges
Sliding window log	Store timestamp of each request	None	O(limit) per key	Exact
Sliding window counter	Weighted blend of current + previous window	Minimal	O(1) per key	~99% accurate
Token bucket	Tokens refill at fixed rate; burst up to bucket size	Configurable burst	O(1) per key	Good
Leaky bucket	Requests queue; processed at fixed rate	Smooths traffic	O(queue depth)	Strict output rate

flowchart LR
  subgraph tb [Token Bucket]
    Refill[Refill 10/sec] --> Bucket[(Bucket\ncapacity 100)]
    Req1[Request] -->|take 1 token| Bucket
    Bucket -->|tokens > 0| Allow[Allow]
    Bucket -->|empty| Deny429[429 Deny]
  end
  subgraph lb [Leaky Bucket]
    In[Incoming burst] --> Queue[(Queue)]
    Queue -->|leak 10/sec| Out[Steady output]
  end

Token bucket — deep dive

Each client has a bucket with capacity B and refill rate R tokens/sec. On request: if tokens ≥ 1, decrement and allow; else reject with 429. Allows bursts up to B while maintaining average rate R over time. Used by AWS API Gateway, Stripe, and most SaaS APIs.

-- Redis Lua: atomic token bucket (EVAL script)
local key = KEYS[1]
local capacity = tonumber(ARGV[1])
local rate = tonumber(ARGV[2])
local now = tonumber(ARGV[3])
local requested = tonumber(ARGV[4])

local data = redis.call('HMGET', key, 'tokens', 'last_refill')
local tokens = tonumber(data[1]) or capacity
local last = tonumber(data[2]) or now

local elapsed = math.max(0, now - last)
tokens = math.min(capacity, tokens + elapsed * rate)

if tokens >= requested then
  tokens = tokens - requested
  redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
  redis.call('EXPIRE', key, math.ceil(capacity / rate) + 1)
  return 1  -- allowed
else
  return 0  -- denied
end

Sliding window — deep dive

Fixed window allows 2× burst at minute boundaries (59 requests at 00:59 + 60 at 01:00 = 119 in 2 seconds). Sliding window counter fixes this with O(1) memory:

count = prev_window_count × (1 - elapsed/window) + curr_window_count

Store two counters per key: current and previous window. Redis key: ratelimit:{user}:{window_id}. Accuracy ~99%; used by Cloudflare and Kong gateway at scale.

Distributed rate limiting

flowchart TB
  Client --> Edge[Edge / CDN\n coarse IP limits]
  Edge --> GW[API Gateway\n per-key limits]
  GW --> Local[Local token bucket\n 80% decisions]
  Local -->|sync every 100ms| Redis[(Redis Cluster\n global truth)]
  GW --> Service[Microservice]
  Service -->|optional 2nd check| Redis

Centralized (Redis) — accurate global limits; adds ~0.5–1 ms latency; single cluster is bottleneck
Local + sync — each node holds local bucket, syncs quota to Redis periodically; faster but briefly over-limit during sync
Edge limiting — Cloudflare/Kong at PoP; stops abuse before origin; coarse granularity (IP/CIDR)
Consistent hashing — route same user to same Redis shard; avoids cross-shard coordination

⚖️ Trade-off

Fail-open (allow on Redis down): prioritizes availability; risk of abuse during outage. Fail-closed (deny all): protects infrastructure; causes user-facing outage. Stripe fails closed on payment endpoints, fail-open on read-only catalog. State your policy explicitly.

⚠️ Pitfall

Rate limiting by IP alone fails behind NAT (corporate offices, mobile carriers). Combine IP + API key + user ID. For anonymous endpoints, use IP + fingerprint + progressive CAPTCHA escalation.

📦 Real World

Stripe publishes rate limits per API key with Retry-After headers. GitHub uses sliding window with response headers: X-RateLimit-Remaining, X-RateLimit-Reset. AWS API Gateway uses token bucket at account and stage level.

🎯 Interview Tip

Draw three boxes: client → gateway (local bucket) → Redis (global sync). Explain token bucket for burst tolerance. Mention Redis Lua for atomicity — without it, race conditions double-count under concurrency. L6: discuss race between limiter and billing meter, clock skew across nodes, and adaptive limits under DDoS.

3. Key-Value Store (Dynamo-style)

Design a distributed, highly available key-value store like DynamoDB, Riak, or the storage layer behind Redis Cluster. Tests understanding of partitioning, replication, consistency trade-offs, and failure recovery.

Functional requirements

put(key, value) — store arbitrary blob (≤ 1 MB per value typical)
get(key) — retrieve latest value or not-found
delete(key) — tombstone for eventual removal
Optional: conditional writes (put if not exists), TTL expiration
No complex queries — point lookups only (by design)

Non-functional requirements

Requirement	Target
Availability	99.99% — always accept writes during partition
Latency	GET p99 < 10 ms; PUT p99 < 20 ms
Throughput	1M+ ops/sec cluster-wide
Durability	No acknowledged write loss on single-node failure
Scalability	Linear scale-out by adding nodes

📐 Estimation

1B keys × 1 KB average = 1 TB raw data. Replication factor 3 → 3 TB cluster. 100K ops/sec per node × 30 nodes = 3M ops/sec cluster capacity. WAL write: 100K × 1 KB = 100 MB/s disk per node — NVMe handles easily.

High-level architecture

flowchart TB
  Client[Client SDK]
  Coord[Coordinator / Router]
  subgraph ring [Consistent Hash Ring]
    N1[Node A\nvnodes 0-85]
    N2[Node B\nvnodes 86-170]
    N3[Node C\nvnodes 171-255]
  end
  subgraph nodeA [Each Storage Node]
    HT[(In-memory\nHash Table)]
    WAL[(Write-Ahead Log)]
    SST[(SSTable / Disk)]
    HT --> WAL --> SST
  end
  Client --> Coord --> ring
  N1 --> nodeA

In-memory hash table — deep dive

Hot data lives in an in-memory hash map (open addressing or chained buckets). Each entry: { key, value, version, expiry }. Memory budget: if node has 64 GB RAM, reserve ~48 GB for hash table, rest for buffers/OS. Eviction: LRU for cache tier; for durable store, flush cold entries to SSTable on disk.

Write-Ahead Log (WAL) — deep dive

Every write appends to WAL before acknowledging the client. On crash recovery: replay WAL to rebuild in-memory state. WAL segment rotation: new file every 64–128 MB; old segments deleted after SSTable flush.

Client sends PUT
Append (key, value, seq, timestamp) to WAL (fsync or group commit batch)
Update in-memory hash table
Replicate to N-1 nodes (async or sync per consistency level)
Return success to client

⚖️ Trade-off

fsync every write: durable but ~1K writes/sec on HDD, ~100K on NVMe. Group commit (batch fsync every 1 ms): 10× throughput, lose up to 1 ms of writes on crash. Dynamo-style systems use group commit + replication for durability without single-disk fsync bottleneck.

Replication & quorum

Each key maps to a preference list of N nodes (typically N=3) via consistent hashing. Coordinator sends write to all N replicas; read from R replicas.

Config	W + R	Behavior
Strong	W=2, R=2 (N=3)	Quorum overlap guarantees latest read; higher latency
Eventual	W=1, R=1	Fastest; may read stale; repair via read-repair
Default Dynamo	W=2, R=1 or W=1, R=2	Tunable per client request

sequenceDiagram
  participant C as Client
  participant Co as Coordinator
  participant R1 as Replica 1
  participant R2 as Replica 2
  participant R3 as Replica 3

  C->>Co: PUT key (W=2)
  Co->>R1: write v5
  Co->>R2: write v5
  Co->>R3: write v5 (async)
  R1-->>Co: ack
  R2-->>Co: ack
  Co-->>C: success (2 of 3)

  C->>Co: GET key (R=2)
  Co->>R1: read
  Co->>R2: read
  Co-->>C: return max version

Consistent hashing — deep dive

Place nodes and keys on a hash ring (0 to 2³²-1). Key maps to first node clockwise on ring. Virtual nodes: each physical node owns 100–256 vnodes for even load distribution. Adding a node: only adjacent key ranges migrate (~K/N keys).

Failure handling

Hinted handoff — if preferred node down, write to next node on ring; return when original recovers
Read repair — on read, if replicas disagree, write latest version to stale replicas
Anti-entropy — background Merkle tree comparison between replicas; sync divergent keys
Gossip membership — nodes discover failures via SWIM protocol; update ring state

🏆 Senior Signal

"AP system by default: W=1 for low-latency writes, R=2 for read-your-writes when needed. Vector clocks resolve conflicts on concurrent writes to same key — client merges or LWW with timestamp. This is exactly the Dynamo paper trade-off space."

🎯 Interview Tip

L5+: draw the hash ring with vnodes. Explain W/R/N quorum math: W + R > N guarantees overlap. L6: compare to Raft (strong consistency, leader bottleneck) vs Dynamo (always writable, eventual consistency). Mention vector clocks vs last-write-wins — LWW loses updates silently.

4. Twitter Feed (Home Timeline)

Design the home timeline: when a user opens Twitter/X, show recent tweets from people they follow— sorted, paginated, and fresh within seconds. The celebrity problem makes this one of the hardest L5–L6 questions.

Functional requirements

Post tweet — user publishes ≤ 280 chars (+ media metadata)
Home timeline — paginated feed of tweets from followed users, newest first
Follow / unfollow — update social graph
Search & trends — trending topics (optional scope extension)
Media — images/video stored separately; timeline holds references

Scale numbers

📐 Estimation

300M DAU, 2 tweets + 50 timeline reads per user/day.
Write: 300M × 2 / 86400 ≈ 7K tweets/sec (peak ×3 → 21K/sec).
Read: 300M × 50 / 86400 ≈ 174K timeline reads/sec (peak → 520K/sec).
Storage: 7K/sec × 86400 × 365 × 500 bytes ≈ 110 TB/year tweets alone.
Fan-out: avg 200 followers × 7K tweets = 1.4M timeline writes/sec if pure fan-out-on-write.

Fan-out on write vs fan-out on read

Strategy	Write cost	Read cost	Best for
Fan-out on write	O(followers) per tweet	O(1) — precomputed timeline	Normal users (<10K followers)
Fan-out on read	O(1) — store tweet once	O(following) — merge at read	Celebrities (>1M followers)
Hybrid	O(min(followers, 10K))	O(1) + merge celebrity tweets	Production Twitter approach

flowchart TB
  Post[User posts tweet]
  Post --> Check{Followers > 10K?}
  Check -->|No| FOW[Fan-out on write\npush to follower timelines]
  Check -->|Yes| Celeb[Store in celebrity bucket\nno fan-out]
  FOW --> Redis[(Redis Sorted Set\ntimeline:userId)]
  Celeb --> TweetStore[(Cassandra\ntweets by user)]
  Read[User reads timeline] --> Redis
  Read --> Merge[Merge celebrity tweets\nfrom Cassandra]
  Merge --> Response[Sorted merged feed]

Redis sorted sets — deep dive

Each user's home timeline is a Redis sorted set (ZSET): score = tweet timestamp (or snowflake ID), member = tweet_id. ZADD timeline:{userId} {timestamp} {tweetId} ZRANGE timeline:{userId} 0 49 REV → latest 50 tweets in O(log N + 50).

Cap timeline at ~800 entries; trim oldest on insert (sliding window cache)
800 × 8 bytes tweet_id × 300M users = ~192 GB if every user had full cache — most inactive; use tiered storage
Hot users' timelines in Redis; cold users fan-out-on-read from Cassandra

Cassandra for tweet storage

Partition key design drives everything:

Table	Partition key	Clustering	Query
`tweets_by_user`	`user_id`	`tweet_id DESC`	User profile timeline
`timeline_cache`	`viewer_id`	`tweet_id DESC`	Fallback precomputed timeline
`social_graph`	`follower_id`	`followee_id`	Who does user follow?

The celebrity problem — deep dive

Katy Perry (~100M+ followers): one tweet = 100M Redis ZADD operations. At 1 ms each (absurdly optimistic), that's 27+ hours. Even at 100K ops/sec cluster-wide, one celebrity tweet consumes the entire cluster for 1000 seconds.

Solution — hybrid fan-out:

Users with <10K followers: fan-out on write to Redis timelines
Celebrities (>10K): tweet stored only in their Cassandra partition
On timeline read: fetch precomputed timeline + fetch recent tweets from each followed celebrity + merge sort
Cache merged celebrity slice separately (5–30 sec TTL)

Trends — HyperLogLog

Counting unique users per hashtag at Twitter scale is impossible with exact counters. HyperLogLog in Redis: ~12 KB memory per key, ~0.81% error, counts billions of uniques. Pipeline: tweet → Kafka → Flink extracts hashtags → PFADD trend:{hashtag} {userId} → periodic PFCOUNT for ranking. Top-K via Count-Min Sketch + heap.

📦 Real World

Twitter's 2012 architecture blog described Redis timelines + MySQL/Cassandra + hybrid fan-out. They migrated to Manhattan (internal KV store) for timelines. The hybrid fan-out pattern remains the standard answer — Instagram uses similar approach for Stories feed.

🎯 Interview Tip

Always compute fan-out cost: 7K tweets × 200 avg followers = 1.4M writes/sec — say this number out loud. Then explain hybrid. L6: discuss ranking (ML score vs chronological), ads insertion, and read-your-own-writes consistency when you tweet and immediately refresh feed.

5. YouTube (Video Platform)

Design a video upload, processing, and streaming platform serving billions of hours watched monthly. Spans blob storage, async transcoding pipelines, CDN delivery, and recommendation systems.

Functional requirements

Upload video — resumable upload, metadata (title, description, tags)
Transcode — multiple resolutions (360p–4K) and formats (H.264, VP9, AV1)
Stream — adaptive bitrate playback with low startup latency (<2 sec)
Search & recommendations — home feed, related videos
Analytics — view counts, watch time, creator dashboard

Scale numbers

📐 Estimation

2B logged-in users/month, 500M hours watched/day, avg 720p @ 2 Mbps.
Uploads: ~500 hours/minute uploaded → ~8 uploads/sec average, peak ×10 → 80/sec.
Views: billions/day → ~100K–500K stream starts/sec peak globally.
Storage: 500 hr/min × 60 × 24 × 1 GB/hr (raw) ≈ 720 TB/day raw; transcoded 3–5× → 2–3 PB/day (most cold/archived).
CDN egress: 500M hr × 2 Mbps ≈ 278 Tbps daily average — CDN is non-negotiable.

Upload pipeline

flowchart LR
  Creator[Creator App]
  Upload[Upload Service\nresumable chunks]
  Blob[(Object Storage\nS3 / GCS)]
  Meta[(Metadata DB\nvideo_id, status)]
  Queue[SQS / Pub-Sub]
  Transcode[Transcoding Farm\nFFmpeg workers]
  CDN[CDN Origin]
  Catalog[Catalog Service]

  Creator -->|chunked PUT| Upload
  Upload --> Blob
  Upload --> Meta
  Upload -->|video.uploaded| Queue
  Queue --> Transcode
  Transcode -->|read raw| Blob
  Transcode -->|write HLS segments| CDN
  Transcode --> Meta
  Meta --> Catalog

Resumable upload — deep dive

Client requests upload session → server returns upload_id + signed chunk URLs
Upload 5–50 MB chunks with Content-Range headers
Server tracks received byte ranges in Redis/DB
On disconnect: client resumes from last confirmed byte
Complete: assemble manifest, trigger transcoding job

YouTube uses resumable HTTP uploads (similar to Google Cloud Storage protocol). Never hold full video in app server memory.

Transcoding with FFmpeg — deep dive

Resolution	Bitrate	Transcode time (1 hr video)	Segment size
360p	800 Kbps	~15 min (1× realtime)	2–6 sec HLS segments
720p	2.5 Mbps	~30 min	2–6 sec
1080p	5 Mbps	~45 min	2–6 sec
4K	15 Mbps	~2 hr (GPU accelerated)	2–6 sec

Transcoding farm: Kubernetes jobs, one FFmpeg process per job, GPU nodes for 4K/AV1. Priority queue: popular creators / live replay first. Spot/preemptible instances for 70% cost savings. Output: HLS (.m3u8 manifest + .ts segments) and DASH (.mpd + .m4s) for browser compatibility.

# FFmpeg: generate HLS ladder from source
ffmpeg -i input.mp4 \
  -filter_complex "[0:v]split=3[v1][v2][v3]; \
    [v1]scale=640:360[v1out]; [v2]scale=1280:720[v2out]; [v3]scale=1920:1080[v3out]" \
  -map "[v1out]" -c:v libx264 -b:v 800k -map a:0 -c:a aac -b:a 96k -f hls stream_360.m3u8 \
  -map "[v2out]" -c:v libx264 -b:v 2500k -map a:0 -c:a aac -b:a 128k -f hls stream_720.m3u8 \
  -map "[v3out]" -c:v libx264 -b:v 5000k -map a:0 -c:a aac -b:a 192k -f hls stream_1080.m3u8

CDN & HLS/DASH delivery

sequenceDiagram
  participant Player
  participant CDN as CDN Edge
  participant Origin as Origin / Storage

  Player->>CDN: GET master.m3u8
  CDN-->>Player: 360p/720p/1080p variants
  Player->>Player: measure bandwidth
  Player->>CDN: GET 720p segment 001.ts
  CDN-->>Player: video chunk
  Note over Player: switch to 1080p if bandwidth allows
  Player->>CDN: GET 1080p segment 002.ts
  CDN->>Origin: cache miss fetch
  Origin-->>CDN: segment
  CDN-->>Player: video chunk

Adaptive bitrate (ABR): player monitors buffer and bandwidth, switches quality per segment. CDN caching: popular videos cached at edge; long-tail served from origin. Cache key: video_id + resolution + segment_index. Immutable segments = infinite TTL.

Recommendations — deep dive

Candidate generation — collaborative filtering, content similarity, subscribed channels (millions → hundreds)
Ranking — ML model scores candidates on watch probability, satisfaction, diversity
Two-tower model — user embedding × video embedding dot product for fast retrieval at scale
Feature store — real-time features (last 10 watches) + batch features (user demographics)
Freshness — new uploads indexed within minutes via stream processing

⚖️ Trade-off

Transcode-all-upfront: instant playback after processing delay (minutes–hours for 4K). Progressive / just-in-time: start playback after first segments ready (~30 sec); lower storage for unpopular videos. YouTube transcodes popular resolutions first (360p + 720p), adds 4K asynchronously.

🎯 Interview Tip

Separate upload (write-heavy, async) from streaming (read-heavy, CDN). Don't transcode synchronously on upload request. L5: HLS segment structure, ABR algorithm. L6: live streaming (WebRTC/LL-HLS), copyright Content ID fingerprinting, regional compliance (geo-block metadata).

6. Uber (Ride Matching)

Match riders with nearby drivers in real time, calculate ETAs and fares, handle surge pricing, and manage trip lifecycle from request to payment. Geospatial indexing at city scale is the core challenge.

Functional requirements

Request ride — rider specifies pickup, dropoff, ride type
Match driver — find available driver within ~3 km, ETA < 5 min
Track trip — real-time location updates for rider and driver
Surge pricing — dynamic multiplier when demand exceeds supply
Payment & receipt — charge on trip completion
Rating — post-trip feedback

Scale numbers

📐 Estimation

130M MAU, 25M trips/day globally.
Trips: 25M / 86400 ≈ 290 trips/sec average; NYC peak alone ~30/sec, global peak ×5 → 1.5K/sec.
Location updates: 5M active drivers × 1 update/4 sec ≈ 1.25M location writes/sec peak.
Storage: location history 30 days × 1.25M × 86400 × 50 bytes ≈ 162 TB (tiered to cold storage).
Geospatial queries: each ride request searches ~3 km radius in <200 ms.

High-level architecture

flowchart TB
  Rider[Rider App]
  Driver[Driver App]
  GW[API Gateway]
  Trip[Trip Service\nstate machine]
  Match[Matching Service]
  Geo[(Geospatial Index\nRedis GEO / H3)]
  Surge[Surge Service]
  Loc[Location Service]
  Pay[Payment Service]
  WS[WebSocket / Push]

  Rider --> GW --> Trip
  Trip --> Match
  Match --> Geo
  Match --> Surge
  Driver -->|GPS every 4s| Loc --> Geo
  Trip --> WS
  WS --> Rider
  WS --> Driver
  Trip --> Pay

Geospatial indexing — deep dive

Approach	Query	Update	Scale
Redis GEO	GEORADIUS in O(N+log M)	GEOADD O(log N)	Good for city-level (<100K drivers/region)
Geohash grid	Query cell + 8 neighbors	Re-index on move across cell	Simple; edge cases at cell boundaries
H3 hexagonal index	k-ring expansion from center hex	Update on hex change (~100m resolution)	Uber's production choice; uniform adjacency
Quadtree / R-tree	Range search O(log N)	Rebalance on update	PostGIS; heavier for 1M+ updates/sec

Redis GEO for hot path: GEOADD drivers:{cityId} {lng} {lat} {driverId}, GEORADIUS drivers:{cityId} {lng} {lat} 3 km COUNT 10 ASC. Partition by city/region — NYC, SF, London each own Redis cluster shard. Driver location updates every 3–4 seconds; stale drivers (>30 sec) excluded from matching.

H3 & surge pricing — deep dive

Uber open-sourced H3 — hexagonal hierarchical geospatial index. Resolution 7 hex ≈ 1.2 km²; resolution 9 ≈ 0.1 km². Surge computed per H3 cell: surge = demand / supply smoothed over 5-min window.

Count ride requests per H3 cell (demand signal)
Count available drivers per H3 cell (supply signal)
If ratio > threshold (e.g., 2.0): surge multiplier = min(ratio, 3.0)
Push surge map to rider app via WebSocket; refresh every 60 sec
Drivers incentivized to move to high-surge hexes (supply rebalancing)

Trip state machine

stateDiagram-v2
  [*] --> Requested: rider requests
  Requested --> Matching: find drivers
  Matching --> Accepted: driver accepts
  Matching --> Cancelled: no driver / timeout
  Accepted --> Arriving: driver en route
  Arriving --> InProgress: rider picked up
  InProgress --> Completed: dropped off
  InProgress --> Cancelled: mid-trip cancel
  Completed --> Paid: payment processed
  Paid --> [*]
  Cancelled --> [*]

State stored in Trip Service (PostgreSQL or DynamoDB). Each transition emits domain event to Kafka: trip.accepted, trip.completed. Payment, notification, and analytics services subscribe. Idempotency keys on transitions prevent double-charge on retry.

Matching algorithm

Fetch top 10 nearest available drivers (GEORADIUS ASC)
Filter: correct ride type, rating > 4.5, not on another trip
Score: ETA (60%) + driver acceptance rate (20%) + direction of travel (20%)
Dispatch to best driver; if no accept in 15 sec, offer to next
Batch matching in dense areas: solve assignment problem (Hungarian algorithm) every 2 sec

⚠️ Pitfall

Don't query all drivers globally. Partition by city/geofence. At 1.25M location updates/sec, a single Redis instance dies — shard by region with consistent hashing on city_id.

📦 Real World

Uber's Dispatch service handles matching; Ramen (location store) ingests GPS at massive scale. They migrated from PostGIS to H3 + custom indexing. Surge uses ML models beyond simple supply/demand ratio.

🎯 Interview Tip

Lead with geospatial index (Redis GEO or H3), then state machine, then surge. Quantify location write QPS — it's the hidden scale driver. L6: discuss batched matching vs greedy nearest, driver supply prediction, and fraud (GPS spoofing).

7. WhatsApp (Messaging at Scale)

Design a global messaging platform delivering end-to-end encrypted messages with delivery receipts, offline support, and media sharing — at 100+ billion messages per day.

Functional requirements

Send message — text, image, video, voice note to individual or group (≤1024 members)
Delivery receipts — sent ✓, delivered ✓✓, read (blue ✓✓)
Offline delivery — store-and-forward when recipient offline
E2E encryption — server cannot read message content
Last seen / online status — privacy-controlled presence
Multi-device sync — same account on phone + desktop

Scale numbers

📐 Estimation

2B+ users, 100B messages/day (Meta reported ~2020).
Write: 100B / 86400 ≈ 1.16M messages/sec average; peak ×3 → 3.5M/sec.
Connections: ~500M concurrent WebSocket/long-poll connections globally.
Storage: 100B × 200 bytes metadata/day = 20 TB/day (content encrypted, media in object store).
Media: ~20% of messages include media; 20B × 500 KB avg = 10 PB/day (CDN + tiered storage).

High-level architecture

flowchart TB
  Sender[Sender Device]
  ConnGW[Connection Gateway\n500M WebSockets]
  MsgSvc[Message Service]
  Router[User Router\nuser_id → server]
  Cassandra[(Cassandra\nmessage store)]
  Offline[(Offline Queue\nper user partition)]
  Push[Push Notification\nAPNs / FCM]
  Receiver[Receiver Device]
  Media[(Object Storage\nencrypted blobs)]
  Signal[Signal Protocol\nE2E layer]

  Sender -->|encrypted payload| ConnGW --> MsgSvc
  MsgSvc --> Router
  Router -->|online| ConnGW --> Receiver
  Router -->|offline| Offline
  MsgSvc --> Cassandra
  Offline --> Push
  Sender --> Media
  Signal -.->|keys on device| Sender
  Signal -.->|keys on device| Receiver

Cassandra message storage — deep dive

Partition key = conversation_id (or recipient_user_id for 1:1).

Table	Partition key	Clustering	Purpose
`messages`	`conversation_id`	`message_id DESC`	Chat history pagination
`offline_inbox`	`user_id`	`received_at ASC`	Pending delivery queue
`user_sessions`	`user_id`	`device_id`	Multi-device routing
`receipts`	`message_id`	`user_id`	Delivery/read status

WhatsApp famously uses Erlang for connection handling (soft real-time, millions of processes) and FreeBSD tuning. Message metadata in Cassandra; content is encrypted blob the server cannot decrypt. Groups: fan-out on write to each member's offline queue (similar to Twitter, but encrypted envelope per recipient).

Delivery receipts — deep dive

sequenceDiagram
  participant S as Sender
  participant Server
  participant R as Receiver

  S->>Server: send encrypted msg (msg_id)
  Server-->>S: ack sent (1 grey tick)
  Server->>R: push message
  R->>Server: ack received
  Server-->>S: delivered (2 grey ticks)
  R->>R: user opens chat
  R->>Server: ack read
  Server-->>S: read (2 blue ticks)

Receipts are lightweight control messages (~50 bytes), not full message re-transmission. Stored in Cassandra with TTL; aggregated for group chats (delivered when all members ack). Ordering: server-assigned monotonic message_id per conversation (not wall clock — avoids clock skew).

E2E encryption — Signal Protocol

X3DH — extended triple Diffie-Hellman for initial key agreement (first message to new contact)
Double Ratchet — forward secrecy: compromise of current key doesn't expose past messages
Sender Keys — for group messages: one encrypted payload, per-member keys for fan-out efficiency
Server stores only: { from, to, timestamp, encrypted_blob, msg_id } — no plaintext ever
Key transparency: safety numbers for verifying contact identity

⚖️ Trade-off

E2E encryption prevents server-side spam/malware scanning, search indexing, and ML moderation. WhatsApp uses metadata analysis + user reports for abuse. Multi-device sync requires Sender Key distribution across devices — significant protocol complexity.

Connection layer at 500M concurrent

Connection gateway shards by user_id hash — sticky routing to same Erlang node
Heartbeat every 30 sec; disconnect after 3 missed → mark offline, route to push
Long-poll fallback for restrictive networks (HTTP/2 multiplexing)
Cross-region: user homed in primary region; travel = cross-region relay with +50 ms latency acceptable for async chat

📦 Real World

WhatsApp scaled to 900M users with 50 engineers (2016) — extreme focus on Erlang + FreeBSD + Cassandra. Acquired by Facebook/Meta for $19B. Architecture prioritizes reliability and simplicity over feature richness. Group limit raised from 256 → 1024 required rethinking fan-out queues.

🎯 Interview Tip

L6 question — show you know the numbers (100B/day = 1.16M/sec). Separate connection layer (WebSocket scale) from storage layer (Cassandra partitions). Mention E2E as constraint that blocks server-side features. Don't forget offline queue + push notification fallback — half of messages delivered to offline users.

8. Google Maps (Navigation & Routing)

Design a mapping service: serve map tiles globally, compute routes in sub-second time across continental road networks, incorporate real-time traffic, and support turn-by-turn navigation for hundreds of millions of users.

Functional requirements

Map display — pan, zoom, satellite/terrain layers at 60 fps
Route planning — origin → destination, multiple waypoints, avoid tolls/highways
Turn-by-turn navigation — real-time rerouting on deviation or traffic change
ETA — accurate arrival time using live traffic
Search — places, addresses, POI autocomplete
Traffic overlay — color-coded congestion on map

Scale numbers

📐 Estimation

1B+ MAU, ~1B km driven/day with Google Maps navigation.
Routing requests: ~50M/day explicit + navigation reroutes → ~600/sec average, peak ×20 → 12K/sec.
Tile requests: each map view ≈ 20 tiles; 1B users × 10 views/day × 20 = 200B tiles/day → 2.3M tiles/sec (CDN absorbs 99%).
Road graph: global road network ≈ 200M nodes, 400M edges (~50 GB compressed).
Traffic probes: Android/iOS location at scale → billions of speed samples/hour.

High-level architecture

flowchart TB
  Client[Mobile / Web Client]
  CDN[CDN\nmap tiles]
  TileSvc[Tile Server\n256×256 PNG/vector]
  RouteAPI[Routing API]
  CH[Contraction Hierarchies\npreprocessed graph]
  Traffic[Traffic Service\nreal-time speeds]
  Graph[(Road Graph Store)]
  Probe[Probe Ingestion\nKafka]
  Search[Places Search\nElasticsearch]
  ETA[ETA ML Model]

  Client --> CDN
  Client --> RouteAPI
  CDN -->|miss| TileSvc
  RouteAPI --> CH
  RouteAPI --> Traffic
  CH --> Graph
  Traffic --> Probe
  Traffic --> ETA
  Client --> Search

Map tiles — deep dive

World divided into zoom levels 0–22. At zoom Z, earth split into 4^Z tiles. Each tile: 256×256 pixels (raster PNG) or vector (Mapbox Vector Tiles / protobuf).

Zoom	Coverage	Tile size	Cache strategy
0–5	World / continent	~5–20 KB vector	CDN cache 30+ days (rarely changes)
10–14	City / neighborhood	~20–50 KB vector	CDN cache 7 days; invalidate on map update
15–18	Street level	~50–200 KB	CDN + client disk cache
19–22	Building detail	~200 KB–1 MB	On-demand; limited pre-generation

Tile URL pattern: /tiles/{z}/{x}/{y}.pbf. CDN serves 99%+ from edge. Traffic overlay is a separate transparent tile layer refreshed every 2–5 minutes — not baked into base map.

Graph routing — deep dive

Road network = weighted directed graph: nodes = intersections, edges = road segments, weight = travel time. Naive Dijkstra on 200M nodes: O(E log V) → seconds to minutes. Unacceptable for interactive routing.

Contraction Hierarchies (CH)

Preprocess graph offline: iteratively contract least-important nodes, adding shortcut edges that preserve shortest paths. Result: hierarchy where upward search from source and downward search from target meet in middle — explored nodes drop from millions to thousands.

Preprocessing: hours on cluster for continent-sized graph; rerun weekly or on map update
Query: bidirectional search on hierarchy → <100 ms on continental graph
Memory: ~10–20× original graph size with shortcuts (acceptable for in-memory serving)
Dynamic traffic: edge weights updated from traffic service; CH structure unchanged, weights hot-reloaded

flowchart TB
  subgraph preprocess [Offline Preprocessing]
    Raw[Raw road graph\n200M nodes] --> Contract[Iterative contraction\nadd shortcuts]
    Contract --> CHGraph[CH Graph\nin memory]
  end
  subgraph query [Online Query — sub 100ms]
    Src[Source] --> Up[Upward search\nin hierarchy]
    Tgt[Target] --> Down[Downward search\nin hierarchy]
    Up --> Meet[Meet at node]
    Down --> Meet
    Meet --> Path[Unwrap shortcuts\nfull route]
  end
  CHGraph --> query

Real-time traffic — deep dive

Probe ingestion: anonymized GPS speed from Android/iOS Maps users → Kafka (billions/day)
Aggregation: Flink windows compute median speed per road segment (edge) per 2-min bucket
Weight update: travel_time = segment_length / max(median_speed, 5 km/h)
Push to routing: in-memory edge weight table refreshed every 2–5 min
ETA ML: gradient-boosted model on historical + live traffic + time-of-day + weather

⚖️ Trade-off

CH preprocessing is fast at query but slow to update graph topology (new roads). A* with landmarks (ALT) or Customizable Route Planning (CRP) trade preprocessing flexibility for query speed. Google uses CH + CRP hybrid; OSRM uses Contraction Hierarchies open-source.

Alternative algorithms (know for L6)

Algorithm	Query time	Preprocessing	Dynamic weights
Dijkstra	Seconds–minutes	None	Easy
A* + landmarks	100 ms–1 sec	Moderate	Easy
Contraction Hierarchies	<100 ms	Hours	Weights only
Customizable Route Planning	<50 ms	Moderate	Fast customization

📦 Real World

Google Maps routing research published Contraction Hierarchies (Geisberger et al., 2008). OpenStreetMap + OSRM provides open CH-based routing. Apple Maps uses similar preprocessed graph techniques. Map tiles served via Google CDN with vector tile format replacing raster at high zoom levels.

🏆 Senior Signal

"Tiles on CDN (2.3M RPS). Routing via Contraction Hierarchies — sub-100 ms on 200M node graph. Traffic weights from probe aggregation, refreshed every 2 min. Separate tile layer for traffic overlay. ETA uses ML on top of CH base time — cite 15–25% accuracy improvement over static routing."

🎯 Interview Tip

Split into two sub-problems: tile serving (CDN, easy) and routing (hard). Never say Dijkstra on full graph — immediately follow with CH or A* landmarks. L6: discuss CRP, multi-criteria routing (tolls + time), and rerouting frequency during navigation (every 30 sec or on deviation >50m).

Functional requirements

Non-functional requirements

Scale estimation

High-level architecture

ID generation strategies

301 vs 302 redirect — deep dive

MySQL + Redis read path

Analytics via Kafka

Trade-off Radar — URL Shortener

Functional requirements

Non-functional requirements & scale

Algorithm comparison

Token bucket — deep dive

Sliding window — deep dive

Distributed rate limiting

Functional requirements

Non-functional requirements

High-level architecture

In-memory hash table — deep dive

Write-Ahead Log (WAL) — deep dive

Replication & quorum

Consistent hashing — deep dive

Failure handling

Functional requirements

Scale numbers

Fan-out on write vs fan-out on read

Redis sorted sets — deep dive

Cassandra for tweet storage

The celebrity problem — deep dive

Trends — HyperLogLog

Trade-off Radar — Twitter Feed

Functional requirements

Scale numbers

Upload pipeline

Resumable upload — deep dive

Transcoding with FFmpeg — deep dive

CDN & HLS/DASH delivery

Recommendations — deep dive

Functional requirements

Scale numbers

High-level architecture

Geospatial indexing — deep dive

H3 & surge pricing — deep dive

Trade-off Radar — Uber Matching

Trip state machine

Matching algorithm

Functional requirements

Scale numbers

High-level architecture

Cassandra message storage — deep dive

Delivery receipts — deep dive

E2E encryption — Signal Protocol

Connection layer at 500M concurrent

Functional requirements

Scale numbers

High-level architecture

Map tiles — deep dive

Graph routing — deep dive

Contraction Hierarchies (CH)

Real-time traffic — deep dive

Alternative algorithms (know for L6)