The complete playbook for FAANG/MANGA system design loops: RADIO structure, ruthless time-boxing,
estimation with stated assumptions, high-level diagrams interviewers expect, deep-dive decision trees, and the
red flags that fail candidates before minute 20. Switch to Interview mode in the top bar to hide learning-only content elsewhere.
🎯 Start Here
Toggle Interview mode before mock sessions. Practice saying assumptions aloud:
"I'll assume 50M DAU, 5 reads per user per day, peak 3× average—that's roughly 8.7K read QPS at peak."
Interviewers score your reasoning process, not diagram artistry.
The RADIO Framework
Every strong system design answer follows the same skeleton. RADIO keeps you structured under time pressure
and ensures you cover non-functional requirements, data modeling, and trade-offs—not just boxes and arrows.
Optimizing before identifying the actual bottleneck
🎯 Interview Tip
Announce your framework upfront: "I'll use RADIO—start with requirements, then architecture, data model,
APIs, and optimizations." Interviewers relax when they see structure; it signals senior communication habits.
45-Minute Time Allocation
Time-box ruthlessly. Candidates who spend 20 minutes on requirements never reach architecture.
Candidates who skip estimation guess wrong on every component count.
Frame as platform/product decision, not just technical
Build vs buy, team topology, multi-year migration
TCO, vendor risk, paved-road strategy
⚠️ Pitfall
Starting to draw before clarifying requirements is the #1 red flag. Passive silence while drawing is #2.
Think out loud—interviewers cannot score reasoning they cannot hear.
Requirements Deep Dive
Requirements are 40% of the interview score at L5+. A crisp requirements section proves you build the right system,
not just a system. Separate functional, non-functional, and explicit out-of-scope.
Functional requirements
What the system does—user-visible features and core workflows. List 3–5 MVP features; defer nice-to-haves to v2.
URL shortener: Create short URL, redirect to original, optional custom alias, analytics (clicks)
Twitter feed: Post tweet, follow users, view home timeline, like/retweet
What are the latency requirements? (p50 vs p99, which endpoints)
What consistency guarantees do we need? (strong, eventual, read-your-writes)
What is the read vs write ratio?
How long do we retain data?
Is this mobile-first, web-first, or API for third parties?
Are there compliance requirements? (GDPR, PCI, HIPAA)
Do we need multi-region from day one?
What is acceptable downtime? (availability target)
Are there geographic constraints? (data residency)
🎯 Interview Tip
Write requirements on the board before drawing. Interviewers at Google and Meta explicitly score
"problem clarification." A candidate who asks "Does the user need to see their own write immediately?"
signals L5+ consistency thinking right away.
Every system design interview includes estimation. Off by 2× is fine; off by 100× fails.
The goal is structured thinking with stated assumptions—not precision.
The 6-step framework
Clarify assumptions: DAU, actions per user per day, read/write ratio, payload size, retention period.
Calculate QPS: (DAU × ops/user) / 100K seconds; separate read and write; apply peak multiplier (2–3×).
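Calculate storage: daily writes × bytes/record × retention days; multiply by overhead factors for indexes and replicas (e.g., 2× indexes × 3× replicas).
Calculate bandwidth: peak QPS × average request/response size; convert to Gbps.
Estimate capacity: servers ≈ peak QPS / per-server RPS; size the cache for the hot set (80/20 rule).
Sanity check & connect to architecture: Compare to known systems; state which component breaks first.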
Reference numbers (memorize)
1 day ≈ 100K seconds (86,400 rounded up for mental math)
Peak QPS ≈ 2–3× average (10× for viral events)
1 API server ≈ 1K RPS conservative (10K for static/cached)
1 SSD ≈ 100K IOPS; 1 HDD ≈ 100 IOPS
2^30 ≈ 1 GB; 2^40 ≈ 1 TB; 2^50 ≈ 1 PB
📊 Say This Aloud
"I'll assume 300M DAU, 10 timeline reads per user per day, 0.5 posts per user per day, peak 3× average.
Read QPS = 300M × 10 / 100K = 30K avg, ~90K peak. Write QPS = 300M × 0.5 / 100K = 1.5K avg, ~4.5K peak."
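The arithmetic in the quote above can be checked in a few lines. This is a minimal sketch under the same assumptions (300M DAU, a 100K-second day, 3× peak); the name `estimate_qps` is illustrative and not taken from any library.

```python
SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for mental math


def estimate_qps(dau, ops_per_user_per_day, peak_multiplier=3.0):
    """Return (average QPS, peak QPS) using the rounded 100K-second day."""
    average = dau * ops_per_user_per_day / SECONDS_PER_DAY_ROUNDED
    return average, average * peak_multiplier


read_avg, read_peak = estimate_qps(300_000_000, 10)
write_avg, write_peak = estimate_qps(300_000_000, 0.5)
print(f"read:  {read_avg:,.0f} avg, {read_peak:,.0f} peak")
print(f"write: {write_avg:,.0f} avg, {write_peak:,.0f} peak")
```

Keeping the peak multiplier as a parameter makes it easy to rerun the estimate for a 10× viral event.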
Worked example: URL shortener at scale
Assumptions: 500M URLs created over 5 years; 100:1 redirect:creation ratio; 500 bytes per record; peak 2× average.
STEP 1 — Assumptions
Total URLs: 500M over 5 years
New URLs/day: 500M / (5 × 365) ≈ 274K/day
Redirects: 274K × 100 ≈ 27.4M/day
Record size: 500 bytes (short_code + long_url + metadata)
STEP 2 — QPS
Write QPS = 274K / 100K ≈ 2.7 avg → 5.4 peak
Read QPS = 27.4M / 100K ≈ 274 avg → 548 peak
STEP 3 — Storage (5 years)
= 500M × 500 bytes ≈ 250 GB raw
With indexes (2×) + replicas (3×) ≈ 1.5 TB
STEP 4 — Bandwidth (peak redirects)
= 548 RPS × 1 KB response ≈ 548 KB/s ≈ 4.4 Mbps (negligible)
STEP 5 — Capacity
API servers: 548 / 1000 ≈ 1 server (reads are cacheable!)
Cache: 80/20 rule → 20% URLs = 80% traffic → cache top 100M URLs in Redis (~50 GB)
STEP 6 — Sanity check
548 peak read QPS is modest; cache-aside Redis handles this easily.
The DB sees reads only on cache miss: at the ~80% hit rate the Step 5 sizing implies, that is ~110 QPS; at 99%+ it drops below 6 QPS.
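The six steps can also be reproduced as a short script, which helps when practicing with different inputs. This is a sketch under the assumptions stated in the worked example; the function name, parameter names, and the 1 KB response size default are illustrative.

```python
def url_shortener_estimate(total_urls=500_000_000, years=5, read_write_ratio=100,
                           record_bytes=500, response_bytes=1_000, peak_multiplier=2,
                           index_factor=2, replica_factor=3, hot_fraction=0.2):
    """Back-of-envelope numbers for the URL shortener worked example."""
    seconds_per_day = 100_000                       # rounded day
    writes_per_day = total_urls / (years * 365)
    write_qps_peak = writes_per_day / seconds_per_day * peak_multiplier
    read_qps_peak = write_qps_peak * read_write_ratio
    raw_bytes = total_urls * record_bytes
    return {
        "write_qps_peak": write_qps_peak,
        "read_qps_peak": read_qps_peak,
        "raw_storage_gb": raw_bytes / 1e9,
        "total_storage_tb": raw_bytes * index_factor * replica_factor / 1e12,
        "peak_bandwidth_mbps": read_qps_peak * response_bytes * 8 / 1e6,
        "cache_gb": total_urls * hot_fraction * record_bytes / 1e9,
    }


for name, value in url_shortener_estimate().items():
    print(f"{name}: {value:,.2f}")
```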
🎯 Interview Tip
Step 6 is what separates L5 from L4: "548 QPS is small, so I'd invest in cache hit rate, not sharding.
Sharding becomes relevant at 100K+ write QPS or when the dataset outgrows a single node (multi-TB)." Connect numbers to design decisions.
High-Level Architecture Template
Interviewers expect a standard diagram shape. Start with this skeleton and specialize per problem.
Label read path and write path with different colors or numbered steps.
flowchart TB
    subgraph clients [Clients]
        WEB[Web / Mobile]
        API_CLIENT[Third-party API]
    end
    subgraph edge [Edge Layer]
        CDN[CDN / Static Assets]
        LB[Load Balancer]
        RL[Rate Limiter]
    end
    subgraph services [Application Layer]
        API[API Servers - stateless]
        WORK[Async Workers]
    end
    subgraph cache [Cache Layer]
        REDIS[(Redis / Memcached)]
    end
    subgraph data [Data Layer]
        DB[(Primary Database)]
        REPLICA[(Read Replicas)]
        SEARCH[(Search Index)]
    end
    subgraph async [Async Layer]
        QUEUE[Message Queue / Kafka]
        DLQ[Dead Letter Queue]
    end
    subgraph obs [Observability]
        LOGS[Logs / Metrics / Traces]
    end
    WEB --> CDN
    WEB --> LB
    API_CLIENT --> LB
    LB --> RL --> API
    API --> REDIS
    API --> DB
    API --> QUEUE
    QUEUE --> WORK
    WORK --> DB
    WORK --> SEARCH
    DB --> REPLICA
    API --> LOGS
    QUEUE -.-> DLQ
Read path walkthrough (say this while drawing)
Client request hits CDN for static assets; dynamic requests go to load balancer
Rate limiter checks token bucket per user/IP
Stateless API server checks cache (Redis) first
On cache miss, query a read replica (avoid the primary for reads at scale, unless read-your-writes is required)
Populate cache on miss; return response with cache-control headers
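The cache steps of the read path above can be sketched as follows. This is an illustrative in-memory model, not a production client: plain dicts stand in for Redis and the read replica, and the class name, TTL default, and `replica_reads` counter are assumptions made for the example.

```python
import time


class CacheAsideReader:
    """Read path sketch: check cache, fall back to a replica, populate on miss."""

    def __init__(self, replica, ttl_seconds=60.0, clock=time.monotonic):
        self.replica = replica          # dict standing in for a read replica
        self.cache = {}                 # key -> (value, expires_at); stands in for Redis
        self.ttl_seconds = ttl_seconds
        self.clock = clock
        self.replica_reads = 0          # counts cache misses that reached the replica

    def get(self, key):
        now = self.clock()
        entry = self.cache.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]             # cache hit
        self.replica_reads += 1         # cache miss or expired entry
        value = self.replica.get(key)
        if value is not None:
            self.cache[key] = (value, now + self.ttl_seconds)
        return value


reader = CacheAsideReader({"abc123": "https://example.com/long"})
reader.get("abc123")   # miss: reads the replica and fills the cache
reader.get("abc123")   # hit: served from the cache
print(reader.replica_reads)  # prints 1
```

The TTL bounds how long a stale entry can survive if an invalidation is missed, which is the usual safety net for cache-aside.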
Write path walkthrough
API validates request, checks idempotency key
Write to primary database synchronously
Invalidate or update cache entry (write-through or cache-aside invalidation)
Publish event to queue for async side effects (search index, notifications, analytics)
Worker processes event; failures go to DLQ after N retries
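The write path steps above can be modeled in memory to make the ordering concrete. This is a hedged sketch: dicts and a deque stand in for the primary database, cache, and message queue, and `WritePath`, `MAX_RETRIES`, and the event shape are hypothetical names chosen for illustration.

```python
from collections import deque

MAX_RETRIES = 3  # assumed value for "N retries"


class WritePath:
    def __init__(self):
        self.primary = {}     # stands in for the primary database
        self.cache = {}       # stands in for Redis
        self.seen = {}        # idempotency_key -> stored response
        self.queue = deque()  # stands in for the message queue
        self.dlq = []         # dead letter queue

    def write(self, idempotency_key, key, value):
        if idempotency_key in self.seen:          # replayed request: no second write
            return self.seen[idempotency_key]
        self.primary[key] = value                 # synchronous write to primary
        self.cache.pop(key, None)                 # cache-aside invalidation
        self.queue.append({"key": key, "value": value, "attempts": 0})
        response = {"status": "created", "key": key}
        self.seen[idempotency_key] = response
        return response

    def run_worker(self, handler):
        """Drain the queue; events that fail MAX_RETRIES times go to the DLQ."""
        while self.queue:
            event = self.queue.popleft()
            try:
                handler(event)
            except Exception:
                event["attempts"] += 1
                if event["attempts"] >= MAX_RETRIES:
                    self.dlq.append(event)
                else:
                    self.queue.append(event)
```

A real system would persist the idempotency record in the same transaction as the write; the sketch keeps both in process memory only to show the control flow.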
Component checklist
| Component | When to include | Technology examples |
| --- | --- | --- |
| CDN | Static assets, cacheable API responses, global users | CloudFront, Cloudflare, Akamai |
| Load balancer | Always for multi-instance deployments | ALB, NGINX, HAProxy |
| Rate limiter | Public APIs, abuse-prone endpoints | Redis token bucket, API gateway |
| Cache | Read-heavy, hot keys, session data | Redis, Memcached |
| Message queue | Async processing, decoupling, peak smoothing | Kafka, SQS, RabbitMQ |
| Search index | Full-text search, faceted filtering | Elasticsearch, OpenSearch |
| Blob storage | Images, videos, files | S3, GCS, MinIO |
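The rate limiter row mentions a token bucket. The algorithm itself is small enough to sketch in-process; a shared deployment would keep the same state in Redis instead. The class name, parameters, and injectable `clock` below are assumptions made for this example.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity`, refilled at `refill_per_second`."""

    def __init__(self, capacity, refill_per_second, clock=time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self.clock = clock
        self.tokens = float(capacity)   # start full
        self.updated_at = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        elapsed = now - self.updated_at
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_per_second)
        self.updated_at = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


# One bucket per user or IP, as in the read path walkthrough.
buckets = {}


def allow_request(user_id, capacity=10, refill_per_second=5):
    bucket = buckets.setdefault(user_id, TokenBucket(capacity, refill_per_second))
    return bucket.allow()
```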
🎯 Interview Tip
Draw the read path first (interviewers usually care most about the hot path), then add write path and async
workers. Mention observability last—it signals L5+ operational maturity without eating clock time.
Deep-Dive Decision Framework
The deep-dive phase is where levels separate. Use these decision trees when the interviewer says
"let's go deeper on the database" or "how would you handle consistency?"
Database selection
| Signal | Choose | Avoid |
| --- | --- | --- |
| ACID transactions, complex joins, <10K TPS | PostgreSQL / MySQL | Cassandra for relational queries |
| High write throughput, flexible schema, partition key known | Cassandra / DynamoDB | Single PostgreSQL primary |
| Sub-ms reads, ephemeral data, leaderboards | Redis | PostgreSQL for hot cache data |
| Full-text search, faceted queries | Elasticsearch | SQL LIKE queries at scale |
| Graph traversals (friends-of-friends) | Neo4j / graph layer on SQL | Recursive SQL at billion-edge scale |
| Time-series metrics, append-only | TimescaleDB / InfluxDB | Row-oriented OLTP |
Cache strategy
| Pattern | When | Risk |
| --- | --- | --- |
| Cache-aside | Read-heavy, cache miss acceptable | Stampede on miss; stale after write without invalidation |
| Write-through | Cache and DB must stay in sync | Write latency; cache memory limits |
| Write-behind | Write-heavy, eventual DB sync OK | Data loss if cache fails before flush |
| CDN edge | Static or semi-static content, global users | Invalidation propagation delay |
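The stampede risk listed for cache-aside is commonly mitigated by letting only one caller load a missing key while the others wait. The following is a minimal single-process sketch of that idea using per-key locks; `SingleFlightCache` and `loader` are hypothetical names, and a distributed setup would need a shared lock or request coalescing instead.

```python
import threading


class SingleFlightCache:
    """On a miss, one thread loads the value; concurrent callers wait for it."""

    def __init__(self, loader):
        self.loader = loader              # function that reads from the database
        self.values = {}
        self.locks = {}
        self.guard = threading.Lock()     # protects the per-key lock table

    def get(self, key):
        if key in self.values:            # fast path: cache hit
            return self.values[key]
        with self.guard:
            lock = self.locks.setdefault(key, threading.Lock())
        with lock:
            if key not in self.values:    # re-check after acquiring the key lock
                self.values[key] = self.loader(key)
            return self.values[key]
```

The re-check inside the lock is what prevents the second and later waiters from repeating the database read.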
Sharding
Shard key: Must be high-cardinality and match the dominant access pattern (user_id, tenant_id, not country_code)
Strategy: Hash-based (even distribution, no efficient range scans) vs range-based (efficient range queries, risk of hot spots)
Resharding: Consistent hashing with virtual nodes minimizes remapping
Cross-shard queries: Scatter-gather (expensive) or denormalize to avoid them
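To make the resharding point concrete, here is a small sketch of a consistent hash ring with virtual nodes. It is an illustration under assumptions, not a reference implementation: the class name, the choice of SHA-256, and the default of 100 virtual nodes per shard are all picked for the example.

```python
import bisect
import hashlib


class ConsistentHashRing:
    def __init__(self, nodes=(), vnodes=100):
        self.vnodes = vnodes
        self._ring = []                       # sorted list of (hash, node)
        for node in nodes:
            self.add_node(node)

    @staticmethod
    def _hash(value):
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def add_node(self, node):
        # Each physical node owns many points on the ring (virtual nodes).
        for i in range(self.vnodes):
            bisect.insort(self._ring, (self._hash(f"{node}#{i}"), node))

    def get_node(self, key):
        if not self._ring:
            raise LookupError("ring is empty")
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._ring, (self._hash(key), "")) % len(self._ring)
        return self._ring[idx][1]


ring = ConsistentHashRing(["shard-a", "shard-b", "shard-c"])
keys = [f"user:{i}" for i in range(2000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("shard-d")
moved = sum(1 for k in keys if ring.get_node(k) != before[k])
print(f"{moved / len(keys):.0%} of keys moved after adding a fourth shard")
```

With plain `hash(key) % N`, going from 3 to 4 shards would remap most keys; on the ring, only keys that now fall to the new shard move.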
Replication
| Topology | Consistency | Use when |
| --- | --- | --- |
| Single-leader | Strong on leader; eventual on replicas | Most OLTP; PostgreSQL, MySQL |
| Multi-leader | Conflict resolution needed | Multi-region writes; CRDTs or LWW |
| Leaderless (quorum) | Tunable (W+R>N) | High write availability; Cassandra, DynamoDB |
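The W+R>N rule in the leaderless row is easy to reason about with a tiny helper. This sketch only encodes the arithmetic (overlap guarantee and how many replicas can be down); the function name and returned keys are illustrative.

```python
def quorum_profile(n, w, r):
    """Describe a quorum configuration with N replicas, W write acks, R read acks."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must be between 1 and n")
    return {
        # Every read set intersects every write set only when W + R > N.
        "overlap_guaranteed": w + r > n,
        "writes_tolerate_down": n - w,
        "reads_tolerate_down": n - r,
    }


for w, r in [(1, 1), (2, 2), (3, 1)]:
    print(f"N=3 W={w} R={r}: {quorum_profile(3, w, r)}")
```

With N=3, the W=2/R=2 setting keeps the overlap guarantee while tolerating one replica being down for both reads and writes, which is why it is a common default to mention.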
API design
| Concern | Decision |
| --- | --- |
| CRUD resources | REST with nouns, HTTP verbs, proper status codes |
| Real-time bidirectional | WebSocket or SSE (not polling) |
| Internal service-to-service | gRPC (binary, streaming, schema via Protobuf) |
| Pagination | Cursor-based (stable under concurrent inserts); avoid offset at scale |
| Versioning | URL prefix (/v1/) or Accept header; never break existing clients |
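The pagination row claims cursor-based paging stays stable under concurrent inserts. A short sketch shows why, assuming a newest-first feed ordered by a monotonically increasing `id`; the cursor encoding and function names are illustrative, not a standard.

```python
import base64
import json


def encode_cursor(last_id):
    """Opaque cursor so clients do not depend on its internal shape."""
    return base64.urlsafe_b64encode(json.dumps({"before_id": last_id}).encode()).decode()


def decode_cursor(cursor):
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["before_id"]


def list_page(rows, limit, cursor=None):
    """Return (page, next_cursor) for a newest-first listing."""
    ordered = sorted(rows, key=lambda row: row["id"], reverse=True)
    if cursor is not None:
        before_id = decode_cursor(cursor)
        ordered = [row for row in ordered if row["id"] < before_id]
    page = ordered[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(page) == limit else None
    return page, next_cursor


rows = [{"id": i} for i in range(1, 6)]
page1, cursor = list_page(rows, limit=2)           # ids 5, 4
rows.append({"id": 6})                             # concurrent insert at the top
page2, _ = list_page(rows, limit=2, cursor=cursor) # ids 3, 2: no repeat of 4
print([r["id"] for r in page1], [r["id"] for r in page2])
```

An offset-based second page (`OFFSET 2 LIMIT 2`) over the same data would return ids 4 and 3 after the insert, repeating an item the client has already seen.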
Messaging
Queue (point-to-point): Task distribution, work stealing, one consumer per message
For each deep-dive choice, state: requirement → option A vs B → pick B because [metric].
"We need read-your-writes for profile edits, so reads hit primary for the session, not stale replicas."
Scaling Deep Dive
Interviewers often end with "what happens at 10× traffic?" or "what's your biggest bottleneck?"
Have a structured answer ready—identify the constraint, propose mitigation, state the trade-off.
Scaling checklist (apply in order)
Optimize before scale: Caching, DB indexes, query tuning, connection pooling
Single region → multi-AZ minimum; multi-region for DR
Monitoring & alerting (L5+ signal)
Google's four golden signals applied to your design:
Latency: p50/p99 per endpoint; alert on p99 SLO breach
Traffic: QPS, queue depth, connection count
Errors: 5xx rate, DLQ depth, failed health checks
Saturation: CPU, memory, DB connections, disk IOPS, cache memory
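For the latency signal, it helps to be able to say how p99 is computed and why it differs from the median. The sketch below uses the nearest-rank method on a list of samples; the function names and the 200 ms SLO are assumptions for illustration, and real monitoring systems typically use histograms rather than raw samples.

```python
import math


def percentile(samples_ms, p):
    """Nearest-rank percentile; p is an integer from 1 to 100."""
    if not samples_ms:
        raise ValueError("no samples")
    ordered = sorted(samples_ms)
    rank = math.ceil(p * len(ordered) / 100)
    return ordered[rank - 1]


def p99_breaches_slo(samples_ms, slo_ms):
    return percentile(samples_ms, 99) > slo_ms


# 98 fast requests and 2 slow ones: the median hides the tail, p99 does not.
latencies = [20] * 98 + [450, 900]
print(percentile(latencies, 50), percentile(latencies, 99))
print(p99_breaches_slo(latencies, slo_ms=200))
```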
🎯 Interview Tip
Close scaling discussion with a prioritization: "First bottleneck at 10× is the primary DB on write path—
I'd add sharding by user_id before investing in multi-region. Multi-region is a v2 concern unless compliance requires it."
Common Follow-Up Q&A
Interviewers probe with "what if" scenarios. Strong answers acknowledge the failure, describe detection,
mitigation, and user-visible impact—without panicking or over-engineering.
Reliability & failure scenarios
| Question | Strong answer skeleton |
| --- | --- |
| What if the cache goes down? | Cache is an optimization, not source of truth. Circuit breaker to DB; rate-limit DB queries; auto-scale read replicas; cache rebuilds on recovery. Expect 10× latency spike; degrade non-critical features. |
| What if the database primary fails? | Automated failover to replica (30–60s). Brief write unavailability. Queue writes in Kafka during failover if zero-loss required. Monitor replication lag to ensure promoted replica is fresh. |
| What if a message is processed twice? | Design for at-least-once delivery. Idempotent consumer with dedup table keyed by message_id. Natural idempotency (PUT with idempotency key) preferred over application-level dedup. |
| What if traffic spikes 100× (viral event)? | Pre-warmed auto-scaling; CDN absorbs read spike; queue buffers write spike; load shedding on non-critical paths; rate limit new signups if needed. Katy Perry tweet → fan-out on read for celebrity accounts. |
| How do you prevent duplicate payments? | Idempotency key per checkout session; DB unique constraint on (user_id, idempotency_key); return cached result on replay within 24h window. |
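The duplicate-payment answer relies on a database unique constraint doing the deduplication. Here is a minimal sketch of that mechanism using SQLite from the standard library; the table layout and function names are hypothetical, and the 24-hour replay window is left out to keep the example short.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE payments (
               user_id TEXT NOT NULL,
               idempotency_key TEXT NOT NULL,
               amount_cents INTEGER NOT NULL,
               status TEXT NOT NULL,
               UNIQUE (user_id, idempotency_key)
           )"""
    )
    return conn


def charge(conn, user_id, idempotency_key, amount_cents):
    """Insert the payment once; a replay returns the stored result instead."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO payments VALUES (?, ?, ?, 'charged')",
                (user_id, idempotency_key, amount_cents),
            )
        return {"status": "charged", "replayed": False}
    except sqlite3.IntegrityError:
        row = conn.execute(
            "SELECT status FROM payments WHERE user_id = ? AND idempotency_key = ?",
            (user_id, idempotency_key),
        ).fetchone()
        return {"status": row[0], "replayed": True}


conn = open_db()
print(charge(conn, "u1", "checkout-42", 1999))
print(charge(conn, "u1", "checkout-42", 1999))  # client retry: no second charge
```

Letting the constraint reject the duplicate avoids the race in a check-then-insert approach, where two concurrent retries could both pass the check.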
Design choice probes
| Question | Strong answer skeleton |
| --- | --- |
| Why not use a monolith? | At current scale ([QPS]), monolith is fine (simpler ops). I'd extract services when team boundaries or independent scaling needs emerge (Conway's law). Premature microservices add network latency and distributed tracing burden. |
| Why SQL over NoSQL? | Access pattern needs ACID transactions and joins ([example query]). Write QPS is [X], within single-node PostgreSQL capacity. Would revisit at [threshold] with sharding or Cassandra. |
| How would you migrate without downtime? | Dual-write to old and new store → backfill historical data → verify consistency → switch reads to new → stop dual-write → decommission old. Expand-contract for schema changes. |
| How do you handle hot keys? | Detect via metrics (single shard QPS anomaly). Mitigate: local in-process cache for hot key, read replicas dedicated to hot shard, key splitting with salting, separate celebrity tier. |
| How do you ensure security? | OAuth2/JWT auth, RBAC, TLS everywhere, encrypt PII at rest, rate limiting, input validation, audit log for sensitive ops, principle of least privilege for service accounts. |
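Of the hot-key mitigations listed, key splitting with salting is the one most often asked about in detail. The sketch below applies it to a counter: writes are spread over several salted sub-keys and reads sum them. A dict stands in for the sharded store, and `SaltedCounter` and its `fanout` parameter are illustrative names.

```python
import random


class SaltedCounter:
    """Spreads increments for one logical key across `fanout` sub-keys."""

    def __init__(self, fanout=8, rng=None):
        self.fanout = fanout
        self.rng = rng or random.Random()
        self.store = {}                      # stands in for a sharded key-value store

    def incr(self, key, amount=1):
        # A random salt picks the sub-key, so no single shard takes every write.
        sub_key = f"{key}#{self.rng.randrange(self.fanout)}"
        self.store[sub_key] = self.store.get(sub_key, 0) + amount

    def read(self, key):
        # Reads pay the cost: one lookup per sub-key, then a sum.
        return sum(self.store.get(f"{key}#{i}", 0) for i in range(self.fanout))


likes = SaltedCounter(fanout=8)
for _ in range(1000):
    likes.incr("tweet:123:likes")
print(likes.read("tweet:123:likes"))  # prints 1000
```

The trade-off to state aloud: write load is divided by the fan-out, while each read becomes a small scatter-gather.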
🎯 Interview Tip
Answer failure questions with a template: Detect → Mitigate → User impact → Long-term fix.
"We'd detect via Redis health check in 10s, mitigate by circuit-breaking to DB with rate limit,
users see 2× latency for 30s, long-term fix is Redis Cluster with automatic failover."
Red Flags & Green Flags
Interviewers maintain mental checklists. Avoid the red flags that fail candidates in the first 15 minutes.
Hit the green flags that signal senior engineering instincts.
Red flags
Behaviors that fail interviews
Drawing before clarifying — jumping to architecture without requirements
Silent drawing — not narrating thought process aloud
Buzzword bingo — "we'll use Kafka and microservices" without justification
No numbers — "low latency" and "high scale" without QPS or p99 targets
Single design — presenting one solution without mentioning alternatives considered
Ignoring failures — no mention of SPOFs, failover, or degraded mode
Over-engineering L4 problems — multi-region Kubernetes for a URL shortener at 1K QPS
Under-engineering L6 problems — single PostgreSQL for global write-heavy system at 100K TPS
No trade-offs — claiming design has no downsides
Arguing with interviewer — defensive when probed; collaboration beats combat
Infinite scope — trying to build every feature instead of scoping v1
Wrong consistency default — strong consistency everywhere without business justification
Green flags
Behaviors that pass interviews
Structured framework — announces RADIO or equivalent before starting
Clarifying questions — asks about scale, latency, consistency before designing
Stated assumptions — "I'll assume 100M DAU unless you have a different number"
Back-of-envelope math — QPS, storage, bandwidth with rounding shown
Trade-off articulation — "I chose X over Y because [requirement]; downside is [cost]"
Failure awareness — proactively identifies SPOFs and proposes mitigation
Appropriate scope — clear v1 vs v2 boundary with out-of-scope list
Access-pattern-driven design — schema and shard key from query patterns, not nouns
Sanity check — compares estimates to known real-world systems
Collaborative tone — "Does this direction make sense?" invites interviewer input
Depth on demand — goes deep when probed, stays high-level when not
⚠️ Pitfall
Saying "we need low latency" without a number is a red flag. Strong answer: "Feed load p99 < 200 ms;
post creation p99 < 500 ms." Quantified NFRs prove you've shipped production systems.
Level-Specific Expectations
The same question has different passing bars at L4 vs L6. Calibrate depth, scope, and trade-off sophistication
to your target level—over-preparing wastes time; under-preparing fails loops.
L4 · Mid
Fundamentals & happy path
Complete RADIO with interviewer providing some numbers
Correct high-level diagram with client, LB, API, DB, cache
Basic estimation (order of magnitude QPS and storage)
One deep-dive area handled competently (e.g., cache-aside)
Knows CAP at high level; picks SQL or NoSQL with simple justification
Typical problems: URL shortener, rate limiter, key-value store, parking lot
L5 · Senior
Trade-offs & failure modes
Drives requirements clarification independently; quantifies all NFRs
Estimation connected to architecture ("90K read QPS → need fan-out on write")
Multiple deep dives with explicit trade-offs (fan-out on write vs read)
Failure scenarios: cache down, DB failover, duplicate messages
L6
End every interview with a 30-second recap: "v1 is single-region with fan-out on write for users under 10K followers,
Redis cache for hot timelines, Cassandra for tweet storage sharded by tweet_id. v2 adds multi-region read replicas
and search. Biggest risk is celebrity tweet hot keys—mitigated with hybrid fan-out."
🎯 Interview Tip
Practice 3 problems at your target level and 1 problem one level above. L5 candidates should nail
Twitter and attempt Google Maps. Depth at the right altitude matters more than breadth across 20 problems.