System Design Interview Framework

The complete playbook for FAANG/MANGA system design loops: RADIO structure, ruthless time-boxing, estimation with stated assumptions, high-level diagrams interviewers expect, deep-dive decision trees, and the red flags that fail candidates before minute 20.

🎯 Start Here

Practice saying assumptions aloud: "I'll assume 50M DAU, 5 reads per user per day, peak 3× average; that's roughly 8.7K read QPS at peak." Interviewers score your reasoning process, not diagram artistry.

The RADIO Framework

Every strong system design answer follows the same skeleton. RADIO keeps you structured under time pressure and ensures you cover non-functional requirements, data modeling, and trade-offs—not just boxes and arrows.

R Requirements

Functional, non-functional, explicit out-of-scope, clarifying questions

A Architecture

High-level components, data flow, major technology choices

D Data model

Schema, indexes, access patterns, partition/shard key rationale

I Interface

Key APIs, request/response shapes, pagination, error codes

O Optimizations

Bottlenecks, caching, sharding, monitoring, trade-off summary

What to deliver in each phase

Phase | Deliverable | Time budget | Common mistake
R (Requirements) | Written list: functional features, NFRs with numbers, out-of-scope items | 5 min | Skipping NFRs; vague "low latency" without p99 target
A (Architecture) | Diagram: client → edge → services → data stores → async workers | 10 min | Drawing microservices for a problem that needs a monolith
D (Data model) | Tables/collections, primary keys, indexes, read vs write paths | 5 min (often in deep dive) | Schema before knowing access patterns
I (Interface) | 3–5 critical API endpoints with HTTP method, path, key fields | 5 min | REST for everything when WebSocket or gRPC fits better
O (Optimizations) | Cache layer, CDN, sharding plan, SPOF elimination, cost note | 10–15 min | Optimizing before identifying the actual bottleneck
🎯 Interview Tip

Announce your framework upfront: "I'll use RADIO—start with requirements, then architecture, data model, APIs, and optimizations." Interviewers relax when they see structure; it signals senior communication habits.

45-Minute Time Allocation

Time-box ruthlessly. Candidates who spend 20 minutes on requirements never reach architecture. Candidates who skip estimation guess wrong on every component count.

0–5 min Requirements

Clarify functional scope, DAU/QPS, latency p99, consistency, availability, out-of-scope. Ask 3–5 questions.

5–10 min Estimation

Back-of-envelope: QPS, storage, bandwidth, server count. State every assumption aloud.

10–20 min High-level design

Draw boxes: client, CDN, LB, API, cache, DB, queue, workers. Walk through read and write paths.

20–35 min Deep dive

Interviewer picks: DB schema, cache strategy, fan-out, consistency, API design, failure modes.

35–45 min Scale & wrap

10× traffic, single points of failure, monitoring, trade-offs recap, what you'd build in v1 vs v2.

Adaptive timing by level

Level | Requirements | Deep dive depth | Wrap-up focus
L4 | Interviewer may provide numbers; 3 min sufficient | One component (e.g., cache or DB choice) | Happy path + one failure mode
L5 | You drive clarification; 5 min expected | Two components + trade-off reasoning | Scale 10×, monitoring, cost awareness
L6 | Proactively scope v1/v2; edge cases in requirements | Cross-cutting: consistency + ops + org implications | Multi-region, incident response, evolutionary path
Principal | Frame as platform/product decision, not just technical | Build vs buy, team topology, multi-year migration | TCO, vendor risk, paved-road strategy
⚠️ Pitfall

Starting to draw before clarifying requirements is the #1 red flag. Passive silence while drawing is #2. Think out loud—interviewers cannot score reasoning they cannot hear.

Requirements Deep Dive

Requirements are 40% of the interview score at L5+. A crisp requirements section proves you build the right system, not just a system. Separate functional, non-functional, and explicit out-of-scope.

Functional requirements

What the system does—user-visible features and core workflows. List 3–5 MVP features; defer nice-to-haves to v2.

  • URL shortener: Create short URL, redirect to original, optional custom alias, analytics (clicks)
  • Twitter feed: Post tweet, follow users, view home timeline, like/retweet
  • Uber: Request ride, match driver, track location, complete payment, rate trip
  • Chat: Send/receive messages, delivery receipts, group chat, online presence

Non-functional requirements (always quantify)

NFR category | Questions to ask | Example strong answer
Scale | DAU? MAU? Read/write ratio? Data retention? | "100M DAU, 10:1 read:write, 5-year retention"
Latency | p50 vs p99? Which endpoints? Mobile vs web? | "Feed load p99 < 200 ms; post tweet p99 < 500 ms"
Availability | How many nines? Acceptable downtime window? | "99.9% (8.76 hr/year of downtime); chat needs 99.99%, analytics 99.5%"
Consistency | Strong everywhere or per-operation? Read-your-writes? | "Read-your-writes for user's own posts; eventual for global feed"
Durability | Can we lose data? Backup RPO/RTO? | "Zero message loss; RPO < 1 min, RTO < 15 min"
Security | Auth model? PII? Compliance (GDPR, HIPAA, PCI)? | "OAuth2, encrypted at rest, GDPR right-to-delete"
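The availability targets in the table translate directly into a downtime budget. A minimal sketch of that arithmetic, assuming a 365-day year; the function name `downtime_hours_per_year` is illustrative and not taken from any library:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; leap years ignored


def downtime_hours_per_year(availability_pct: float) -> float:
    """Allowed downtime per year, in hours, for an availability target in percent."""
    if not 0 < availability_pct <= 100:
        raise ValueError("availability_pct must be in (0, 100]")
    return HOURS_PER_YEAR * (1 - availability_pct / 100)


# Targets quoted in the availability row above
for target in (99.5, 99.9, 99.99):
    hours = downtime_hours_per_year(target)
    print(f"{target}% -> {hours:.2f} h/year ({hours * 60:.0f} min)")
```

Quoting the budget in hours or minutes next to the number of nines makes it easier to argue which components actually need the stricter target.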

Explicit out-of-scope (critical at L5+)

Naming what you are not building saves time and shows product sense:

  • "v1 excludes: search, recommendations, multi-region, admin dashboard"
  • "v1 excludes: video upload—text and images only"
  • "v1 excludes: real-time collaborative editing—single-writer model"
  • "v1 excludes: ML ranking—chronological feed only"

Questions to ask the interviewer (pick 3–5)

  1. What is the expected scale? (DAU, QPS, data size)
  2. What are the latency requirements? (p50 vs p99, which endpoints)
  3. What consistency guarantees do we need? (strong, eventual, read-your-writes)
  4. What is the read vs write ratio?
  5. How long do we retain data?
  6. Is this mobile-first, web-first, or API for third parties?
  7. Are there compliance requirements? (GDPR, PCI, HIPAA)
  8. Do we need multi-region from day one?
  9. What is acceptable downtime? (availability target)
  10. Are there geographic constraints? (data residency)
🎯 Interview Tip

Write requirements on the board before drawing. Interviewers at Google and Meta explicitly score "problem clarification." A candidate who asks "Does the user need to see their own write immediately?" signals L5+ consistency thinking right away.

📋 Requirements Template

Functional: [feature 1], [feature 2], [feature 3]
Non-functional: [DAU], [p99 latency], [availability], [consistency model]
Out of scope: [v2 item 1], [v2 item 2]

Estimation Template — 6 Steps

Every system design interview includes estimation. Off by 2× is fine; off by 100× fails. The goal is structured thinking with stated assumptions—not precision.

The 6-step framework

  1. Clarify assumptions: DAU, actions per user per day, read/write ratio, payload size, retention period.
  2. Calculate QPS: (DAU × ops/user) / 100K seconds; separate read and write; apply peak multiplier (2–3×).
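  3. Calculate storage: daily writes × bytes/record × retention days; multiply by an overhead factor for indexes and replicas (3× as a rough default; the worked example below uses 2× for indexes and 3× for replicas).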
  4. Calculate bandwidth: peak QPS × average request/response size; convert to Gbps.
  5. Calculate capacity: peak QPS / per-server throughput; add 30% headroom; size DB IOPS separately.
  6. Sanity check & connect to architecture: Compare to known systems; state which component breaks first.
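Steps 2 through 5 can be sketched as a small calculator. The 100K-second day, 3× peak multiplier, 3× storage overhead, 1K RPS per server, and 30% headroom come from this section; the `Workload` field names, the 1 KB record size, the 5-year retention, and the one-record-per-response bandwidth model are assumptions made for illustration.

```python
import math
from dataclasses import dataclass

SECONDS_PER_DAY = 100_000  # 86,400 rounded up for mental math


@dataclass
class Workload:
    dau: int
    reads_per_user: float        # per day
    writes_per_user: float       # per day
    bytes_per_record: int
    retention_days: int
    peak_multiplier: float = 3.0
    storage_overhead: float = 3.0  # indexes and replicas
    per_server_rps: int = 1_000
    headroom: float = 0.30


def estimate(w: Workload) -> dict:
    # Step 2: average QPS, reads and writes separately, then peak
    read_qps = w.dau * w.reads_per_user / SECONDS_PER_DAY
    write_qps = w.dau * w.writes_per_user / SECONDS_PER_DAY
    peak_read = read_qps * w.peak_multiplier
    peak_write = write_qps * w.peak_multiplier
    # Step 3: storage over the retention period, with overhead
    storage_bytes = (w.dau * w.writes_per_user * w.bytes_per_record
                     * w.retention_days * w.storage_overhead)
    # Step 4: peak read bandwidth (assumes one record per response)
    bandwidth_gbps = peak_read * w.bytes_per_record * 8 / 1e9
    # Step 5: stateless API servers, with headroom
    servers = math.ceil(
        (peak_read + peak_write) / w.per_server_rps * (1 + w.headroom)
    )
    return {
        "peak_read_qps": peak_read,
        "peak_write_qps": peak_write,
        "storage_tb": storage_bytes / 1e12,
        "bandwidth_gbps": bandwidth_gbps,
        "api_servers": servers,
    }


# Hypothetical timeline workload: 300M DAU, 10 reads and 0.5 posts per user per day
timeline = Workload(dau=300_000_000, reads_per_user=10, writes_per_user=0.5,
                    bytes_per_record=1_000, retention_days=5 * 365)
for name, value in estimate(timeline).items():
    print(f"{name}: {value:,.2f}")
```

Step 6 stays a human judgment: compare the printed numbers against systems you know and say which component saturates first.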

Reference numbers (memorize)

  • 1 day ≈ 100K seconds (round 86,400 for mental math)
  • Peak QPS ≈ 2–3× average (10× for viral events)
  • 1 API server ≈ 1K RPS conservative (10K for static/cached)
  • 1 SSD ≈ 100K IOPS; 1 HDD ≈ 100 IOPS
  • 2^30 bytes ≈ 1 GB; 2^40 bytes ≈ 1 TB; 2^50 bytes ≈ 1 PB
📊 Say This Aloud

"I'll assume 300M DAU, 10 timeline reads per user per day, 0.5 posts per user per day, peak 3× average. Read QPS = 300M × 10 / 100K = 30K avg, ~90K peak. Write QPS = 300M × 0.5 / 100K = 1.5K avg, ~4.5K peak."

Worked example: URL shortener at scale

Assumptions: 500M URLs created over 5 years; 100:1 redirect:creation ratio; 500 bytes per record; peak 2× average.

STEP 1 — Assumptions
  Total URLs: 500M over 5 years
  New URLs/day: 500M / (5 × 365) ≈ 274K/day
  Redirects: 274K × 100 ≈ 27.4M/day
  Record size: 500 bytes (short_code + long_url + metadata)

STEP 2 — QPS
  Write QPS = 274K / 100K ≈ 2.7 avg → 5.4 peak
  Read QPS  = 27.4M / 100K ≈ 274 avg → 548 peak

STEP 3 — Storage (5 years)
  = 500M × 500 bytes ≈ 250 GB raw
  With indexes (2×) + replicas (3×) ≈ 1.5 TB

STEP 4 — Bandwidth (peak redirects)
  = 548 RPS × 1 KB response ≈ 548 KB/s ≈ 4.4 Mbps (negligible)

STEP 5 — Capacity
  API servers: 548 / 1000 ≈ 1 server (reads are cacheable!)
  Cache: 80/20 rule → 20% URLs = 80% traffic → cache top 100M URLs in Redis (~50 GB)

STEP 6 — Sanity check
  548 peak read QPS is modest; cache-aside Redis handles this easily.
  DB load comes only from cache misses: ~110 QPS at the 80% hit rate the 80/20 rule implies, < 6 QPS at a 99%+ hit rate.
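The cache sizing in Step 5 and the DB load in Step 6 are two one-line formulas. A rough sketch using the worked example's numbers; the function names are hypothetical, and the hit rates are inputs you would state as assumptions rather than measured values:

```python
def cache_size_gb(total_keys: int, cached_fraction: float, bytes_per_record: int) -> float:
    """Memory needed to hold the hottest fraction of records, in GB."""
    return total_keys * cached_fraction * bytes_per_record / 1e9


def db_qps_from_misses(peak_read_qps: float, hit_rate: float) -> float:
    """Read QPS that falls through to the database on cache misses."""
    return peak_read_qps * (1 - hit_rate)


# 500M URLs, top 20% cached, 500 bytes per record, 548 peak read QPS
print(f"cache size: {cache_size_gb(500_000_000, 0.20, 500):.0f} GB")
for hit_rate in (0.80, 0.99):
    qps = db_qps_from_misses(548, hit_rate)
    print(f"hit rate {hit_rate:.0%}: {qps:.1f} DB QPS")
```

Either hit rate leaves the database far below the point where sharding would be worth discussing, which is the design conclusion the numbers are meant to support.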
🎯 Interview Tip

Step 6 is what separates L5 from L4: "548 QPS is small, so I'd invest in cache hit rate, not sharding. Sharding becomes relevant at 100K+ write QPS or multi-TB data sets." Connect numbers to design decisions.

High-Level Architecture Template

Interviewers expect a standard diagram shape. Start with this skeleton and specialize per problem. Label read path and write path with different colors or numbered steps.

flowchart TB
  subgraph clients [Clients]
    WEB[Web / Mobile]
    API_CLIENT[Third-party API]
  end

  subgraph edge [Edge Layer]
    CDN[CDN / Static Assets]
    LB[Load Balancer]
    RL[Rate Limiter]
  end

  subgraph services [Application Layer]
    API[API Servers - stateless]
    WORK[Async Workers]
  end

  subgraph cache [Cache Layer]
    REDIS[(Redis / Memcached)]
  end

  subgraph data [Data Layer]
    DB[(Primary Database)]
    REPLICA[(Read Replicas)]
    SEARCH[(Search Index)]
  end

  subgraph async [Async Layer]
    QUEUE[Message Queue / Kafka]
    DLQ[Dead Letter Queue]
  end

  subgraph obs [Observability]
    LOGS[Logs / Metrics / Traces]
  end

  WEB --> CDN
  WEB --> LB
  API_CLIENT --> LB
  LB --> RL --> API
  API --> REDIS
  API --> DB
  API --> REPLICA
  API --> QUEUE
  QUEUE --> WORK
  WORK --> DB
  WORK --> SEARCH
  DB --> REPLICA
  API --> LOGS
  QUEUE -.-> DLQ

Read path walkthrough (say this while drawing)

  1. Client request hits CDN for static assets; dynamic requests go to load balancer
  2. Rate limiter checks token bucket per user/IP
  3. Stateless API server checks cache (Redis) first
  4. On cache miss, query a read replica (keep reads off the primary at scale, unless the request needs read-your-writes)
  5. Populate cache on miss; return response with cache-control headers
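Steps 3 through 5 of the read path are the cache-aside pattern. A minimal sketch, assuming an in-memory `TTLCache` class as a stand-in for Redis and a caller-supplied `load_from_replica` function; both names are hypothetical:

```python
import time
from typing import Callable, Optional


class TTLCache:
    """In-memory stand-in for Redis with per-entry expiry."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expires_at, value)

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # expired: treat as a miss
            return None
        return value

    def set(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)


def cache_aside_get(key: str, cache: TTLCache,
                    load_from_replica: Callable[[str], Optional[str]]) -> Optional[str]:
    value = cache.get(key)           # check the cache first
    if value is not None:
        return value
    value = load_from_replica(key)   # miss: fall through to a read replica
    if value is not None:
        cache.set(key, value)        # populate the cache for the next reader
    return value
```

This sketch does nothing about stampedes: many concurrent misses on the same key would all hit the replica, which is the risk usually raised in the deep dive.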

Write path walkthrough

  1. API validates request, checks idempotency key
  2. Write to primary database synchronously
  3. Invalidate or update cache entry (write-through or cache-aside invalidation)
  4. Publish event to queue for async side effects (search index, notifications, analytics)
  5. Worker processes event; failures go to DLQ after N retries
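The five write-path steps can be walked through in a toy model. This is a sketch under stated assumptions: dictionaries stand in for the primary database, the cache, and the idempotency store, a `deque` stands in for the queue, and the class and method names are invented for illustration.

```python
from collections import deque


class WritePath:
    def __init__(self):
        self.primary = {}      # stand-in for the primary database
        self.cache = {}        # stand-in for Redis
        self.seen_keys = {}    # idempotency_key -> stored result
        self.queue = deque()   # stand-in for the message queue
        self.dlq = []          # dead letter queue

    def handle_write(self, idempotency_key, record_id, payload):
        # Step 1: a replayed request returns the original result
        if idempotency_key in self.seen_keys:
            return self.seen_keys[idempotency_key]
        # Step 2: synchronous write to the primary
        self.primary[record_id] = payload
        # Step 3: invalidate the cache entry
        self.cache.pop(record_id, None)
        # Step 4: publish an event for async side effects
        self.queue.append({"record_id": record_id, "attempts": 0})
        result = {"record_id": record_id, "status": "created"}
        self.seen_keys[idempotency_key] = result
        return result

    def run_worker(self, handler, max_attempts=3):
        # Step 5: retry failed events, then park them in the DLQ
        while self.queue:
            event = self.queue.popleft()
            try:
                handler(event)
            except Exception:
                event["attempts"] += 1
                if event["attempts"] >= max_attempts:
                    self.dlq.append(event)
                else:
                    self.queue.append(event)
```

A real system would also need the database write and the event publish to succeed or fail together (for example with an outbox table); the sketch ignores that failure window.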

Component checklist

Component | When to include | Technology examples
CDN | Static assets, cacheable API responses, global users | CloudFront, Cloudflare, Akamai
Load balancer | Always for multi-instance deployments | ALB, NGINX, HAProxy
Rate limiter | Public APIs, abuse-prone endpoints | Redis token bucket, API gateway
Cache | Read-heavy, hot keys, session data | Redis, Memcached
Message queue | Async processing, decoupling, peak smoothing | Kafka, SQS, RabbitMQ
Search index | Full-text search, faceted filtering | Elasticsearch, OpenSearch
Blob storage | Images, videos, files | S3, GCS, MinIO
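The token bucket named for the rate limiter is simple enough to sketch on the whiteboard. A single-process illustration, assuming one bucket per user or IP held in local memory; a production limiter would keep this state in a shared store such as Redis, and the class name is hypothetical:

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(self, capacity: float, refill_per_second: float,
                 clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.refill_per_second = refill_per_second
        self._clock = clock
        self._tokens = float(capacity)  # start full so bursts are allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self._clock()
        elapsed = now - self._last
        self._last = now
        # Refill in proportion to elapsed time, capped at capacity
        self._tokens = min(self.capacity,
                           self._tokens + elapsed * self.refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False  # caller would typically respond with HTTP 429
```

Capacity sets the burst size and the refill rate sets the sustained limit, which are the two numbers worth stating aloud.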
🎯 Interview Tip

Draw the read path first (interviewers usually care most about the hot path), then add write path and async workers. Mention observability last—it signals L5+ operational maturity without eating clock time.

Deep-Dive Decision Framework

The deep-dive phase is where levels separate. Use these decision trees when the interviewer says "let's go deeper on the database" or "how would you handle consistency?"

Database selection

Signal | Choose | Avoid
ACID transactions, complex joins, <10K TPS | PostgreSQL / MySQL | Cassandra for relational queries
High write throughput, flexible schema, partition key known | Cassandra / DynamoDB | Single PostgreSQL primary
Sub-ms reads, ephemeral data, leaderboards | Redis | PostgreSQL for hot cache data
Full-text search, faceted queries | Elasticsearch | SQL LIKE queries at scale
Graph traversals (friends-of-friends) | Neo4j / graph layer on SQL | Recursive SQL at billion-edge scale
Time-series metrics, append-only | TimescaleDB / InfluxDB | Row-oriented OLTP

Cache strategy

Pattern | When | Risk
Cache-aside | Read-heavy, cache miss acceptable | Stampede on miss; stale after write without invalidation
Write-through | Cache and DB must stay in sync | Write latency; cache memory limits
Write-behind | Write-heavy, eventual DB sync OK | Data loss if cache fails before flush
CDN edge | Static or semi-static content, global users | Invalidation propagation delay

Sharding

  • Shard key: High cardinality and aligned with the dominant access pattern (user_id or tenant_id, not country_code)
  • Strategy: Hash-based (even distribution, no efficient range scans) vs range-based (efficient range queries, risk of hot spots)
  • Resharding: Consistent hashing with virtual nodes minimizes remapping
  • Cross-shard queries: Scatter-gather (expensive) or denormalize to avoid them
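The resharding point can be illustrated with a small hash ring. This is a sketch, not a production implementation: the `HashRing` class, the 100 virtual nodes per shard, and the use of SHA-256 as the ring hash are all assumptions chosen for brevity.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self._vnodes = vnodes
        self._ring = []  # sorted list of (position, node)
        for node in nodes:
            self.add_node(node)

    def add_node(self, node: str) -> None:
        # Each physical node owns many small arcs of the ring
        for i in range(self._vnodes):
            bisect.insort(self._ring, (ring_hash(f"{node}#{i}"), node))

    def get_node(self, key: str) -> str:
        if not self._ring:
            raise ValueError("ring is empty")
        # First virtual node clockwise from the key, wrapping around
        idx = bisect.bisect_right(self._ring, (ring_hash(key), ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["db-0", "db-1", "db-2"])
keys = [f"user:{i}" for i in range(2000)]
before = {k: ring.get_node(k) for k in keys}
ring.add_node("db-3")
moved = [k for k in keys if ring.get_node(k) != before[k]]
print(f"{len(moved) / len(keys):.0%} of keys moved after adding a fourth shard")
```

With plain modulo hashing, going from three to four shards would remap most keys; on the ring, the only keys that move are the ones the new shard takes over.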

Replication

Topology | Consistency | Use when
Single-leader | Strong on leader; eventual on replicas | Most OLTP; PostgreSQL, MySQL
Multi-leader | Conflict resolution needed | Multi-region writes; CRDTs or LWW
Leaderless (quorum) | Tunable (W+R>N) | High write availability; Cassandra, Riak, other Dynamo-style stores
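The W+R>N condition in the leaderless row is worth being able to check quickly for a proposed configuration. A small sketch, with a hypothetical function name, that reports whether every read quorum is guaranteed to overlap the latest write quorum:

```python
def quorums_overlap(n: int, w: int, r: int) -> bool:
    """True when any R replicas must include at least one of the W that acked the write."""
    if not (1 <= w <= n and 1 <= r <= n):
        raise ValueError("w and r must each be between 1 and n")
    return w + r > n


# Common settings for N = 3 replicas
for w, r in [(2, 2), (3, 1), (1, 3), (1, 1)]:
    label = "overlapping" if quorums_overlap(3, w, r) else "may read stale data"
    print(f"N=3 W={w} R={r}: {label}")
```

Lowering W or R buys latency and availability at the cost of possibly stale reads, which is the trade-off to state when picking the numbers.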

API design

Concern | Decision
CRUD resources | REST with nouns, HTTP verbs, proper status codes
Real-time | WebSocket for bidirectional traffic; SSE for server-to-client push (not polling)
Internal service-to-service | gRPC (binary, streaming, schema via Protobuf)
Pagination | Cursor-based (stable under concurrent inserts); avoid offset at scale
Versioning | URL prefix (/v1/) or Accept header; never break existing clients
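Cursor-based pagination from the table can be shown with an in-memory list standing in for a table ordered by `id`. The opaque base64 cursor format and the function names are assumptions; the filter corresponds to a query of the form `WHERE id > :after_id ORDER BY id LIMIT :limit`.

```python
import base64
import json
from typing import Optional


def encode_cursor(last_id: int) -> str:
    raw = json.dumps({"after_id": last_id}).encode()
    return base64.urlsafe_b64encode(raw).decode()


def decode_cursor(cursor: str) -> int:
    return json.loads(base64.urlsafe_b64decode(cursor.encode()))["after_id"]


def list_page(rows: list, limit: int, cursor: Optional[str] = None) -> dict:
    """rows must be sorted ascending by positive integer 'id'."""
    after_id = decode_cursor(cursor) if cursor else 0
    # Fetch one extra row to learn whether another page exists
    matching = [row for row in rows if row["id"] > after_id][: limit + 1]
    page = matching[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(matching) > limit else None
    return {"items": page, "next_cursor": next_cursor}
```

Because the cursor records the last id seen rather than a row offset, rows inserted while a client is paging do not shift or duplicate earlier results.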

Messaging

  • Queue (point-to-point): Task distribution, work stealing, one consumer per message
  • Pub/sub: Event broadcast, multiple subscribers, fan-out decoupling
  • Event log (Kafka): Replay, audit trail, stream processing, retention
  • Delivery: At-least-once + idempotent consumer for most cases; exactly-once semantics (transactions plus idempotency keys) when money moves

Consistency

Requirement | Model | Implementation
Bank balance, inventory | Strong / linearizable | Single leader, consensus, or conditional writes
User sees own edit immediately | Read-your-writes | Sticky routing, session token, write-through cache
Feed/timeline ordering | Monotonic reads + eventual | Per-user cursor, versioned cache entries
View counts, analytics | Eventual | Async aggregation, approximate counters
🎯 Interview Tip

For each deep-dive choice, state: requirement → option A vs B → pick B because [metric]. "We need read-your-writes for profile edits, so reads hit primary for the session, not stale replicas."

Scaling Deep Dive

Interviewers often end with "what happens at 10× traffic?" or "what's your biggest bottleneck?" Have a structured answer ready—identify the constraint, propose mitigation, state the trade-off.

Scaling checklist (apply in order)

  1. Optimize before scale: Caching, DB indexes, query tuning, connection pooling
  2. Scale stateless tiers: Horizontal pod/instance scaling behind LB
  3. Scale reads: Read replicas, CDN, materialized views
  4. Scale writes: Sharding, partitioning, async write-behind
  5. Scale data: Archival, tiered storage, compression
  6. Scale geography: Multi-region cells, geo-DNS routing

Identify the bottleneck

Symptom | Likely bottleneck | Mitigation
High p99 on reads | DB or cache miss storm | Cache warming, read replicas, denormalize
Write latency spikes | Primary DB saturation | Shard writes, async queue, batch writes
Queue lag growing | Consumer throughput | Scale consumers (≤ partition count), optimize handler
Single shard hot | Bad shard key (celebrity user) | Key splitting, separate hot-key cache tier
Cross-region latency | Physics (50 ms RTT) | Regional cells, async replication, CRDTs

Single points of failure (always address)

  • Load balancer → DNS failover or multi-LB with anycast
  • Primary database → automated failover to replica (RDS Multi-AZ, Patroni)
  • Redis single node → Redis Cluster or Sentinel
  • Single Kafka broker → replication factor ≥ 3, min.insync.replicas = 2
  • Single region → multi-AZ minimum; multi-region for DR

Monitoring & alerting (L5+ signal)

Google's four golden signals applied to your design:

  • Latency: p50/p99 per endpoint; alert on p99 SLO breach
  • Traffic: QPS, queue depth, connection count
  • Errors: 5xx rate, DLQ depth, failed health checks
  • Saturation: CPU, memory, DB connections, disk IOPS, cache memory
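The latency signal depends on computing a tail percentile and comparing it to the SLO. A minimal sketch using the nearest-rank method; real monitoring systems usually estimate percentiles from histograms, and the function names here are hypothetical:

```python
import math


def percentile(samples: list, pct: float) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def breaches_p99_slo(latencies_ms: list, target_ms: float) -> bool:
    """True when the window's p99 latency exceeds the SLO target."""
    return percentile(latencies_ms, 99) > target_ms


# 100 requests in a window: two slow outliers push p99 over a 200 ms target
window = [100] * 98 + [900, 900]
print(percentile(window, 50), percentile(window, 99), breaches_p99_slo(window, 200))
```

The median in this window looks healthy while the p99 does not, which is why the alert is defined on the tail rather than the average.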
🎯 Interview Tip

Close scaling discussion with a prioritization: "First bottleneck at 10× is the primary DB on the write path, so I'd add sharding by user_id before investing in multi-region. Multi-region is a v2 concern unless compliance requires it."

Common Follow-Up Q&A

Interviewers probe with "what if" scenarios. Strong answers acknowledge the failure, describe detection, mitigation, and user-visible impact—without panicking or over-engineering.

Reliability & failure scenarios

Question | Strong answer skeleton
What if the cache goes down? | Cache is an optimization, not source of truth. Circuit breaker to DB; rate-limit DB queries; auto-scale read replicas; cache rebuilds on recovery. Expect 10× latency spike; degrade non-critical features.
What if the database primary fails? | Automated failover to replica (30–60s). Brief write unavailability. Queue writes in Kafka during failover if zero-loss required. Monitor replication lag to ensure promoted replica is fresh.
What if a message is processed twice? | Design for at-least-once delivery. Idempotent consumer with dedup table keyed by message_id. Natural idempotency (e.g., PUT or upsert keyed by a client-supplied ID) preferred over application-level dedup.
What if traffic spikes 100× (viral event)? | Pre-warmed auto-scaling; CDN absorbs read spike; queue buffers write spike; load shedding on non-critical paths; rate limit new signups if needed. Katy Perry tweet → fan-out on read for celebrity accounts.
How do you prevent duplicate payments? | Idempotency key per checkout session; DB unique constraint on (user_id, idempotency_key); return cached result on replay within 24h window.
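The duplicate-payment answer hinges on the unique constraint doing the deduplication. A sketch using an in-memory SQLite database, where the `payments` table layout and the `charge` function are hypothetical; the 24-hour replay window would need an expiry column and is left out:

```python
import sqlite3


def open_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE payments (
            id INTEGER PRIMARY KEY,
            user_id TEXT NOT NULL,
            idempotency_key TEXT NOT NULL,
            amount_cents INTEGER NOT NULL,
            UNIQUE (user_id, idempotency_key)
        )
        """
    )
    return conn


def charge(conn: sqlite3.Connection, user_id: str,
           idempotency_key: str, amount_cents: int) -> int:
    """Return the payment id; a replay returns the original id instead of charging again."""
    try:
        with conn:  # commits on success, rolls back on error
            cur = conn.execute(
                "INSERT INTO payments (user_id, idempotency_key, amount_cents) "
                "VALUES (?, ?, ?)",
                (user_id, idempotency_key, amount_cents),
            )
            return cur.lastrowid
    except sqlite3.IntegrityError:
        # The constraint rejected a duplicate: look up the original payment
        row = conn.execute(
            "SELECT id FROM payments WHERE user_id = ? AND idempotency_key = ?",
            (user_id, idempotency_key),
        ).fetchone()
        return row[0]
```

Letting the database enforce uniqueness avoids the race in a check-then-insert approach, where two concurrent retries could both pass the check.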

Design choice probes

Question | Strong answer skeleton
Why not use a monolith? | At current scale ([QPS]), monolith is fine: simpler ops. I'd extract services when team boundaries or independent scaling needs emerge (Conway's law). Premature microservices add network latency and distributed tracing burden.
Why SQL over NoSQL? | Access pattern needs ACID transactions and joins ([example query]). Write QPS is [X], within single-node PostgreSQL capacity. Would revisit at [threshold] with sharding or Cassandra.
How would you migrate without downtime? | Dual-write to old and new store → backfill historical data → verify consistency → switch reads to new → stop dual-write → decommission old. Expand-contract for schema changes.
How do you handle hot keys? | Detect via metrics (single shard QPS anomaly). Mitigate: local in-process cache for hot key, read replicas dedicated to hot shard, key splitting with salting, separate celebrity tier.
How do you ensure security? | OAuth2/JWT auth, RBAC, TLS everywhere, encrypt PII at rest, rate limiting, input validation, audit log for sensitive ops, principle of least privilege for service accounts.
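Key splitting with salting, mentioned in the hot-key answer, spreads one logical key across several physical keys. A sketch for a counter such as a like count, assuming a dictionary as the key-value store; the `SaltedCounter` class and the fan-out of 8 are illustrative choices:

```python
import random
from typing import Optional


class SaltedCounter:
    """Spreads writes for one hot key across `fanout` sub-keys."""

    def __init__(self, fanout: int = 8, rng: Optional[random.Random] = None):
        self.fanout = fanout
        self._rng = rng or random.Random()
        self.store = {}  # stand-in for a sharded key-value store

    def increment(self, key: str, amount: int = 1) -> None:
        # A random salt picks which sub-key (and so which shard) takes the write
        sub_key = f"{key}#{self._rng.randrange(self.fanout)}"
        self.store[sub_key] = self.store.get(sub_key, 0) + amount

    def read(self, key: str) -> int:
        # Reads pay for the split: gather every sub-key and sum
        return sum(self.store.get(f"{key}#{i}", 0) for i in range(self.fanout))
```

The trade-off is explicit in the code: writes get cheaper per shard while every read fans out to all sub-keys, so this suits write-hot keys better than read-hot ones.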
🎯 Interview Tip

Answer failure questions with a template: Detect → Mitigate → User impact → Long-term fix. "We'd detect via Redis health check in 10s, mitigate by circuit-breaking to DB with rate limit, users see 2× latency for 30s, long-term fix is Redis Cluster with automatic failover."

Red Flags & Green Flags

Interviewers maintain mental checklists. Avoid the red flags that fail candidates in the first 15 minutes. Hit the green flags that signal senior engineering instincts.

Red flags

Behaviors that fail interviews

  • Drawing before clarifying — jumping to architecture without requirements
  • Silent drawing — not narrating thought process aloud
  • Buzzword bingo — "we'll use Kafka and microservices" without justification
  • No numbers — "low latency" and "high scale" without QPS or p99 targets
  • Single design — presenting one solution without mentioning alternatives considered
  • Ignoring failures — no mention of SPOFs, failover, or degraded mode
  • Over-engineering L4 problems — multi-region Kubernetes for a URL shortener at 1K QPS
  • Under-engineering L6 problems — single PostgreSQL for global write-heavy system at 100K TPS
  • No trade-offs — claiming design has no downsides
  • Arguing with interviewer — defensive when probed; collaboration beats combat
  • Infinite scope — trying to build every feature instead of scoping v1
  • Wrong consistency default — strong consistency everywhere without business justification
Green flags

Behaviors that pass interviews

  • Structured framework — announces RADIO or equivalent before starting
  • Clarifying questions — asks about scale, latency, consistency before designing
  • Stated assumptions — "I'll assume 100M DAU unless you have a different number"
  • Back-of-envelope math — QPS, storage, bandwidth with rounding shown
  • Trade-off articulation — "I chose X over Y because [requirement]; downside is [cost]"
  • Failure awareness — proactively identifies SPOFs and proposes mitigation
  • Appropriate scope — clear v1 vs v2 boundary with out-of-scope list
  • Operational maturity — mentions monitoring, alerting, runbooks, SLOs
  • Access-pattern-driven design — schema and shard key from query patterns, not nouns
  • Sanity check — compares estimates to known real-world systems
  • Collaborative tone — "Does this direction make sense?" invites interviewer input
  • Depth on demand — goes deep when probed, stays high-level when not
⚠️ Pitfall

Saying "we need low latency" without a number is a red flag. Strong answer: "Feed load p99 < 200 ms; post creation p99 < 500 ms." Quantified NFRs prove you've shipped production systems.

Level-Specific Expectations

The same question has different passing bars at L4 vs L6. Calibrate depth, scope, and trade-off sophistication to your target level—over-preparing wastes time; under-preparing fails loops.

L4 · Mid

Fundamentals & happy path

  • Complete RADIO with interviewer providing some numbers
  • Correct high-level diagram with client, LB, API, DB, cache
  • Basic estimation (order of magnitude QPS and storage)
  • One deep-dive area handled competently (e.g., cache-aside)
  • Knows CAP at high level; picks SQL or NoSQL with simple justification
  • Typical problems: URL shortener, rate limiter, key-value store, parking lot
L5 · Senior

Trade-offs & failure modes

  • Drives requirements clarification independently; quantifies all NFRs
  • Estimation connected to architecture ("90K read QPS → need fan-out on write")
  • Multiple deep dives with explicit trade-offs (fan-out on write vs read)
  • Failure scenarios: cache down, DB failover, duplicate messages
  • Operational awareness: monitoring, DLQ, idempotency, graceful degradation
  • Typical problems: Twitter feed, YouTube, Uber, WhatsApp, Dropbox, notification system
L6 · Staff

Cross-cutting & scale edge cases

  • Scopes v1/v2 with evolutionary architecture path (strangler fig, dual-write migration)
  • Consistency model chosen per operation with PACELC reasoning
  • Multi-region, hot-key, and celebrity-user edge cases proactively addressed
  • Cross-team implications: data contracts, schema evolution, on-call burden
  • Cost analysis: "$X/month at this scale; tiered storage saves Y%"
  • Typical problems: Google Maps, Facebook search, distributed DB, ad aggregation, stock exchange
Principal

Platform & org-level strategy

  • Frames problem as product/platform decision, not purely technical
  • Build vs buy with TCO, vendor lock-in, and team capability assessment
  • Conway's law and team topologies inform architecture boundaries
  • Multi-year migration strategy with risk milestones and rollback plans
  • Paved roads and golden paths for organizational scale
  • Typical topics: platform design, multi-year re-architecture, org design implications
🏆 Senior Signal

L6: End every interview with a 30-second recap: "v1 is single-region with fan-out on write for users under 10K followers, Redis cache for hot timelines, Cassandra for tweet storage sharded by tweet_id. v2 adds multi-region read replicas and search. Biggest risk is celebrity tweet hot keys, mitigated with hybrid fan-out."

🎯 Interview Tip

Practice 3 problems at your target level and 1 problem one level above. L5 candidates should nail Twitter and attempt Google Maps. Depth at the right altitude matters more than breadth across 20 problems.