Three dense quick references for interview day and daily architecture work. Each block is copy-to-clipboard—paste into
your notes app, flash cards, or whiteboard prep doc. Toggle Interview mode to surface interview-specific callouts.
Expand or collapse each panel; every code block has its own Copy button.
💡 Pro Tip
Review these sheets the morning of your interview. Copy the Interview Cheat Sheet block into a notes doc
and rehearse saying assumptions aloud: "300M DAU, 10 reads/day, peak 3×: that's ~30K average and ~90K peak read QPS."
Numbers beat adjectives every time.
Fundamentals Cheat Sheet
Jeff Dean's latency hierarchy, availability math, CAP/PACELC trade-offs, and back-of-envelope formulas.
Memorize the ratios (L1→RAM ~100×, RAM→SSD ~1000×) rather than exact nanoseconds.
Fundamentals — copy blocks
Latency numbers every engineer must know
LATENCY NUMBERS (modern hardware, order-of-magnitude)
────────────────────────────────────────────────────
Operation                     Latency     Ratio vs L1
────────────────────────────────────────────────────
L1 cache reference            ~1 ns       1×
L2 cache reference            ~4 ns       4×
L3 cache reference            ~40 ns      40×
Main memory (RAM)             ~100 ns     100×
SSD random read               ~100 μs     100,000×
HDD seek + read               ~5 ms       5,000,000×
Same-datacenter RTT           ~0.5 ms     n/a
Cross-region RTT (US↔EU)      ~50 ms      n/a
Read 1 MB from RAM            ~250 μs
Read 1 MB from SSD            ~1 ms
Read 1 MB from HDD            ~20 ms
Send 1 MB over 1 Gbps net     ~10 ms
────────────────────────────────────────────────────
KEY RATIOS TO MEMORIZE
L1 → RAM ≈ 100×
RAM → SSD ≈ 1,000×
SSD → cross-region RTT ≈ 500×
DESIGN RULES
• Hot path touching disk/network more than once → latency problem
• Fix: move data closer (cache), touch fewer things (batch/denorm), do less (async/precompute)
• Each same-DC RPC adds ~0.5 ms; 10 serial calls = 5 ms minimum
• Cross-region sync replication → 50+ ms writes; plan for async or regional partitions
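As a rough illustration of the design rules above, the following Python sketch adds up the latency of a serial hot path from the table's order-of-magnitude figures. The names `LATENCY_NS` and `serial_latency_ms` are hypothetical, and the values are the table's approximations, not measurements.

```python
# Order-of-magnitude latencies from the table above, in nanoseconds.
# These are approximations for mental math, not benchmarks.
LATENCY_NS = {
    "l1": 1,
    "ram": 100,
    "ssd_random_read": 100_000,
    "hdd_seek": 5_000_000,
    "same_dc_rtt": 500_000,
    "cross_region_rtt": 50_000_000,
}


def serial_latency_ms(steps):
    """Sum the latency of operations executed one after another.

    `steps` is a list of (operation, count) pairs using keys of LATENCY_NS.
    """
    total_ns = sum(LATENCY_NS[op] * count for op, count in steps)
    return total_ns / 1_000_000


# Ten serial same-datacenter RPCs: the floor is about 5 ms.
ten_rpcs_ms = serial_latency_ms([("same_dc_rtt", 10)])

# One cross-region round trip dominates the rest of the path.
sync_write_ms = serial_latency_ms([("same_dc_rtt", 2), ("cross_region_rtt", 1)])

print(ten_rpcs_ms, sync_write_ms)
```

Listing the hops this way makes it easy to see which single step to remove or make asynchronous first.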
Availability nines
AVAILABILITY NINES — DOWNTIME BUDGET
────────────────────────────────────────────────────────────────
Availability   Downtime/year   Downtime/month   Typical tier
────────────────────────────────────────────────────────────────
99%            3.65 days       7.3 hours        Dev / internal tools
99.9%          8.76 hours      43.8 minutes     B2B SaaS, non-critical APIs
99.99%         52.6 minutes    4.38 minutes     Payments, core product APIs
99.999%        5.26 minutes    26.3 seconds     Telco, hospital, trading
99.9999%       31.5 seconds    2.63 seconds     Active-active multi-region
────────────────────────────────────────────────────────────────
FORMULAS
Availability = MTBF / (MTBF + MTTR)
Error budget (30 days) = (1 − SLO) × 43,200 minutes
99.9% SLO → 0.001 × 43,200 = 43.2 min allowed downtime per 30-day window
RULE OF THUMB
Each additional nine costs ~10× in engineering + infrastructure.
Most consumer products target 99.9%–99.99%.
REDUNDANCY PATTERNS (quick)
Active-passive → standby idle; failover seconds–minutes; lower cost
Active-active → all nodes serve; instant reroute; conflict resolution needed
N+1 → one failure absorbed; ~33% overhead at N=3
Geo-redundant → survives region loss; 2–3× infra + replication lag
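The two formulas above can be checked with a few lines of Python. This is a minimal sketch; the function names and the MTBF/MTTR figures in the example are hypothetical. It uses a fixed 30-day window, so 99.9% gives 43.2 minutes, whereas the table's monthly column divides the yearly figure by 12 and so shows 43.8.

```python
MINUTES_PER_30_DAYS = 30 * 24 * 60  # 43,200
MINUTES_PER_YEAR = 365 * 24 * 60    # 525,600


def availability(mtbf_hours, mttr_hours):
    """Availability = MTBF / (MTBF + MTTR)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


def error_budget_minutes(slo, window_minutes=MINUTES_PER_30_DAYS):
    """Allowed downtime in the window: (1 - SLO) x window length."""
    return (1 - slo) * window_minutes


budget_month = error_budget_minutes(0.999)                    # about 43.2 minutes
budget_year = error_budget_minutes(0.9999, MINUTES_PER_YEAR)  # about 52.6 minutes

# Hypothetical service: one failure every 1,000 hours, 1 hour to recover.
example = availability(1000, 1)                               # about 99.9%

print(round(budget_month, 1), round(budget_year, 1), round(example, 4))
```

Halving MTTR helps about as much as doubling MTBF, which is the usual argument for automating failover before chasing rarer failures.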
CAP & PACELC summary
CAP THEOREM (during network partition — P is not optional)
────────────────────────────────────────────────────────
C = Consistency — every read sees latest write or error
A = Availability — every request gets non-error response (may be stale)
P = Partition tolerance — system continues despite network splits
Real choice: CP vs AP during partition (not at design time forever)
CP (sacrifice A)          AP (sacrifice C)
─────────────────         ─────────────────
etcd, ZooKeeper           Cassandra, DynamoDB
HBase, sync RDBMS         CouchDB, DNS, Riak
Reject minority           Both sides serve; may diverge
Financial ledger          Shopping cart, likes, analytics
PACELC (Abadi 2012 — the normal-case trade-off)
────────────────────────────────────────────────────────
If Partition → choose A or C
Else (no partition) → choose Latency (L) or Consistency (C)
System           Partition   Else     Class / notes
────────────────────────────────────────────────────────
Cassandra        PA          EL       PA/EL (default ONE)
DynamoDB         PA          EL       eventual default; strong = 2× cost
MongoDB          PC          EC       w:1 fast; w:majority consistent
MySQL primary    PC          EC/EL    sync rep = EC; read replicas = EL
Google Spanner   PC          EC       pays latency for global consistency
INTERVIEW ONE-LINER
"CAP is per-operation during partition. Cart is AP; payment is CP.
PACELC: even without partition we trade latency vs consistency on every read."
Estimation formulas
BACK-OF-ENVELOPE: 6-STEP FRAMEWORK
────────────────────────────────────────────────────────
1. Clarify assumptions: DAU, ops/user/day, read:write ratio, payload size, retention
2. QPS: (DAU × ops/user) / 86,400 → round 86,400 to 100K for mental math
3. Peak: avg QPS × peak factor (2–3× typical; 10× for viral)
4. Storage: DAU × writes/user/day × bytes/record × retention days (multiply by 2–3× for indexes)
5. Bandwidth: peak QPS × avg request/response size
6. Servers: peak QPS / per-server capacity + 30% headroom
USEFUL ROUNDING CONSTANTS
1 day ≈ 100K seconds
1 month ≈ 2.5M seconds
1 year ≈ 30M seconds
1 server (API) ≈ 1K–10K RPS (use 1K conservative)
1 SSD ≈ 100K IOPS; 1 HDD ≈ 100 IOPS
1 Gbps = 125 MB/s; 10 Gbps = 1.25 GB/s
LITTLE'S LAW
L = λ × W
concurrent requests = arrival rate × avg time in system
Example: 2,000 RPS × 50 ms = 100 in-flight requests
PERCENTILE SLOs (not averages)
p50 — median; 20–50 ms for read APIs
p99 — 99% faster; 100–300 ms product-dependent
p999 — tail; cache miss, GC, cross-region failover
WORKED EXAMPLE (Twitter-scale reads)
300M DAU × 10 reads/day / 100K = 30K avg → 90K peak (3×)
300M × 0.5 tweets/day / 100K = 1.5K write avg → 4.5K peak
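The framework, Little's law, and the worked example above can be sketched in Python as follows. The function names are hypothetical. The defaults (100K seconds per day, 3× peak, 1K RPS per server, 30% headroom) are the rounding constants from this sheet, not universal values.

```python
import math

SECONDS_PER_DAY_ROUNDED = 100_000  # 86,400 rounded up for mental math


def avg_qps(dau, ops_per_user_per_day):
    """Average QPS = (DAU x ops/user/day) / seconds per day."""
    return dau * ops_per_user_per_day / SECONDS_PER_DAY_ROUNDED


def peak_qps(average, peak_factor=3):
    return average * peak_factor


def servers_needed(peak, per_server_rps=1_000, headroom_pct=30):
    """Peak QPS over per-server capacity, plus headroom, rounded up."""
    return math.ceil(peak * (100 + headroom_pct) / (per_server_rps * 100))


def in_flight(arrival_rate_rps, time_in_system_s):
    """Little's law: L = lambda x W."""
    return arrival_rate_rps * time_in_system_s


read_avg = avg_qps(300_000_000, 10)      # 30,000
read_peak = peak_qps(read_avg)           # 90,000
write_avg = avg_qps(300_000_000, 0.5)    # 1,500
write_peak = peak_qps(write_avg)         # 4,500
servers = servers_needed(read_peak)      # 117 at 1K RPS per server
concurrent = in_flight(2_000, 0.050)     # about 100 requests in flight

print(read_avg, read_peak, write_avg, write_peak, servers, concurrent)
```

Dividing by the exact 86,400 instead gives about 34.7K average read QPS, so the rounding understates load by roughly 15%; the peak factor and headroom absorb that.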
Powers of 2
POWERS OF TWO — MEMORIZE FOR STORAGE MATH
────────────────────────────────────────────────────────
Power   Exact value                  Approx            Unit
────────────────────────────────────────────────────────
2^10    1,024                        ~1 Thousand       1 KB
2^20    1,048,576                    ~1 Million        1 MB
2^30    1,073,741,824                ~1 Billion        1 GB
2^40    1,099,511,627,776            ~1 Trillion       1 TB
2^50    1,125,899,906,842,624        ~1 Quadrillion    1 PB
────────────────────────────────────────────────────────
QUICK CONVERSIONS
1 KB = 10^3 bytes (decimal) vs 2^10 = 1,024 (binary — use 1,000 in interviews)
1 million users × 1 KB/day = 1 GB/day
1 billion requests/day ÷ 100K sec ≈ 10,000 RPS average
SANITY CHECKS
100M users × 10 ops/day = 1B ops/day ≈ 10K RPS avg
10K RPS × 1 KB response = 10 MB/s ≈ 80 Mbps average bandwidth (multiply by the peak factor for peak)
1 PB / 5 years tweets → need sharding + cold storage tiering
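A small Python helper can confirm the quick conversions above. This is an illustrative sketch with hypothetical names (`UNITS`, `human_bytes`). It defaults to decimal units, as the sheet recommends for interviews.

```python
UNITS = ["B", "KB", "MB", "GB", "TB", "PB"]


def human_bytes(n, base=1000):
    """Format a byte count using decimal (1000) or binary (1024) units."""
    value = float(n)
    for unit in UNITS:
        if value < base or unit == UNITS[-1]:
            return f"{value:g} {unit}"
        value /= base


daily = human_bytes(1_000_000 * 1_000)            # 1M users x 1 KB/day
binary_tb = human_bytes(2 ** 40, base=1024)       # 2^40 bytes in binary units
bandwidth_mbps = 10_000 * 1_000 * 8 / 1_000_000   # 10K RPS x 1 KB, in Mbps

print(daily, binary_tb, bandwidth_mbps)
```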
🔬 Under the Hood
L5
Latency numbers explain why Redis exists: a 1 ms cache hit beats a 5–10 ms PostgreSQL round-trip.
Availability math explains why 99.99% needs automated failover: the budget is about 52 minutes/year (roughly 4 minutes/month), and it burns fast during deploys.
🎯 Interview Tip
L4
When asked "why cache?", cite: "RAM ~100 ns, SSD ~100 μs—that's 1000×. Redis at 1 ms still beats DB at 5–10 ms."
Quantitative reasoning beats "caching makes things faster."
Database Selection Cheat Sheet
Start with access patterns, not brand names. Five questions: read/write ratio, query shape, consistency needs,
scale trajectory, operational maturity. Then map to store family and replication/sharding strategy.
Database selection — copy blocks
Decision matrix by store family
DATABASE SELECTION — 5 QUESTIONS FIRST
────────────────────────────────────────────────────────
1. Read vs write ratio? 90% reads → replicas + cache
2. Query shape? point / range / join / geo / full-text / traversal
3. Consistency needs? ledger (strong) vs likes (eventual)
4. Scale trajectory? single node 2 years vs 100M rows day one
5. Ops maturity? managed vs self-hosted Cassandra cluster
DECISION MATRIX
────────────────────────────────────────────────────────
Family        Best for                        Weak at
────────────────────────────────────────────────────────
Relational    ACID, joins, ad-hoc SQL         Horizontal writes w/o sharding
Document      Nested JSON, flexible schema    Multi-doc ACID, huge documents
Wide-column   High writes, geo-distribution   Ad-hoc joins, bad partition keys
Key-value     Cache, sessions, rate limits    Complex queries, durability
Time-series   Metrics, IoT, downsampling      General OLTP, mutable rows
Search        Full-text, facets, ranking      Source of truth, strong consistency
Graph         Path queries, fraud rings       Bulk analytics at extreme scale
────────────────────────────────────────────────────────
QUICK PICK FLOW
Structured + joins? → PostgreSQL / MySQL
Document JSON? → MongoDB / DynamoDB
Massive write throughput? → Cassandra / ScyllaDB
Sub-ms cache/session? → Redis / Memcached
Time-ordered metrics? → TimescaleDB / ClickHouse / InfluxDB
Full-text search? → Elasticsearch / OpenSearch
Relationship traversals? → Neo4j / Neptune
POLYGLOT PATTERN (L6 signal)
PostgreSQL = system of record
Redis = cache-aside hot reads
Elasticsearch = search index via Kafka CDC
Name primary store + justify satellites per access pattern
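The cache-aside role given to Redis in the polyglot pattern above can be sketched in plain Python. An in-process dict stands in for Redis and another dict stands in for PostgreSQL; the `CacheAside` class, its parameters, and the sample data are all hypothetical.

```python
import time


class CacheAside:
    """Cache-aside reads: check the cache, fall back to the store, then populate."""

    def __init__(self, load_from_store, ttl_seconds=60, clock=time.monotonic):
        self._load = load_from_store      # hypothetical loader, e.g. a SQL query
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}                # key -> (value, expires_at)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is not None and entry[1] > self._clock():
            return entry[0]               # cache hit
        value = self._load(key)           # cache miss: read the system of record
        self._entries[key] = (value, self._clock() + self._ttl)
        return value

    def invalidate(self, key):
        """Call after a write so the next read reloads from the store."""
        self._entries.pop(key, None)


# Usage with a dict standing in for the system of record.
db = {"sku-1": "Blue mug"}
calls = []


def load(key):
    calls.append(key)
    return db[key]


cache = CacheAside(load)
first = cache.get("sku-1")    # miss, loads from db
second = cache.get("sku-1")   # hit, no db call
db["sku-1"] = "Red mug"
cache.invalidate("sku-1")     # the write path invalidates the entry
third = cache.get("sku-1")    # reloads after invalidation

print(first, second, third, len(calls))
```

The sketch omits stampede protection: when a hot key expires, many concurrent misses can reach the store at once, which is usually handled with a per-key lock or request coalescing.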
Consistency models quick reference
CONSISTENCY SPECTRUM (weakest → strongest)
────────────────────────────────────────────────────────
Model                 Guarantee                       Latency    Use case
────────────────────────────────────────────────────────
Eventual              Replicas converge over time     Lowest     DNS, CDN, view counts
Monotonic reads       Never go backward in time       Low        Feed pagination
Read-your-writes      User sees own writes            Low-med    Profile edits, settings
Causal                Cause precedes effect           Medium     Comments, messaging
Sequential            Global order, reads may lag     Med-high   Collaborative editing
Strong/Linearizable   Latest write globally visible   High       Bank balance, inventory
────────────────────────────────────────────────────────
SCENARIO → MODEL
Payment / transfer → Strong (fail closed on uncertainty)
User profile edit → Read-your-writes (sticky session or primary reads)
Twitter timeline → Eventual + monotonic reads
Flash sale inventory → Strong or optimistic locking
Page view counter → Eventual (approximate OK)
Collaborative doc → Causal or CRDT
READ-YOUR-WRITES IMPLEMENTATIONS
• Sticky sessions pinning the user to the node that handled the write (the primary, in single-leader setups)
• Session token with last-write timestamp
• Write-through cache updated synchronously
• User writes + reads both hit primary; others hit replicas
INTERVIEW QUESTION TO ASK
"Does the user need their own write immediately, or do ALL users
need the latest data globally?" → narrows model in 10 seconds
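One way to picture the session-token approach from the list above is a router that sends a read to a replica only once the replica has caught up to the session's last write. This is a minimal sketch, assuming the token carries a log position (a stand-in for the last-write timestamp); `ReadRouter` and its callable argument are hypothetical.

```python
class ReadRouter:
    """Route a read to a replica only if it has caught up to the session's last write."""

    def __init__(self, replica_applied_position):
        # Hypothetical callable returning the replica's applied log position.
        self._replica_position = replica_applied_position

    def choose(self, session_last_write_position):
        if session_last_write_position is None:
            return "replica"              # user has not written in this session
        if self._replica_position() >= session_last_write_position:
            return "replica"              # replica already contains the user's write
        return "primary"                  # replica is lagging: read from primary


replica_position = 100
router = ReadRouter(lambda: replica_position)

anonymous = router.choose(None)       # no write yet: replica is fine
lagging = router.choose(105)          # write at 105, replica at 100: primary
replica_position = 110
caught_up = router.choose(105)        # replica at 110 has the write: replica

print(anonymous, lagging, caught_up)
```

Only the writing user pays the primary-read cost, and only until replication catches up; other users keep reading from replicas.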
Replication topologies
REPLICATION — THREE TOPOLOGIES
────────────────────────────────────────────────────────
1. SINGLE-LEADER (most RDBMS, MongoDB replica set)
All writes → leader; replicas apply WAL/binlog/oplog
Sync replication: wait for replica ACK → stronger, higher latency
Async replication: leader ACKs immediately → faster, replication lag
Failover: promote replica; risk split-brain without quorum
Read scaling: route reads to replicas (eventual consistency)
Lag typical: 10 ms – seconds under load
2. MULTI-LEADER (active-active across regions)
Writes accepted at multiple leaders; changes replicate asynchronously between them
Pros: low write latency per region; high availability
Cons: write conflicts need resolution (LWW, CRDT, app merge)
Use: collaborative editing, calendars; NOT financial ledger
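3. LEADERLESS / QUORUM (Dynamo-style: Cassandra, Riak, Amazon's original Dynamo)
N = replication factor; W = write quorum; R = read quorum
Quorum: R + W > N → read and write sets overlap, so a read reaches at least one replica holding the latest acknowledged write
Examples:
N=3, W=2, R=2 → tolerates 1 node down, consistent quorum
N=3, W=1, R=1 → fast, eventual (hinted handoff + read repair)
DynamoDB (managed service) is not leaderless: each partition has a leader replica. Eventual reads by default; strongly consistent reads are served by the leader only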
FAILOVER CHECKLIST
□ Health checks on replication lag (not just TCP alive)
□ Automated promotion with fencing token / STONITH
□ Test failover quarterly—broken automation surfaces in real incidents
□ Clients retry with idempotency keys after failover
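The quorum arithmetic from the leaderless topology above is easy to encode. The sketch below uses a hypothetical function name, assumes strict quorums, and ignores sloppy quorums and hinted handoff.

```python
def quorum_properties(n, w, r):
    """Summarize a Dynamo-style (N, W, R) configuration."""
    return {
        # A read set and a write set must share at least one replica.
        "overlapping": r + w > n,
        # Writes still succeed with this many replicas down.
        "write_failures_tolerated": n - w,
        # Reads still succeed with this many replicas down.
        "read_failures_tolerated": n - r,
    }


strict = quorum_properties(3, 2, 2)   # overlapping, tolerates 1 node down
fast = quorum_properties(3, 1, 1)     # not overlapping: eventual consistency

print(strict)
print(fast)
```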
Sharding quick reference
SHARDING — WHEN & HOW
────────────────────────────────────────────────────────
BEFORE SHARDING (try these first; each is roughly 10× cheaper than sharding)
1. Archive cold data
2. Denormalize hot queries
3. Connection pooling (PgBouncer)
4. Read replicas
5. Cache-aside (Redis)
WHEN TO SHARD
Single-node write QPS exhausted (~10K–50K writes/sec depending on row size)
Data size exceeds single-node storage with acceptable query latency
Hot row / hot partition cannot be isolated otherwise
SHARD KEY SELECTION (most important decision)
✓ High cardinality (user_id, tenant_id)
✓ Even distribution (avoid monotonic timestamps alone)
✓ Query locality (co-locate data accessed together)
✗ Low cardinality (country_code alone)
✗ Hot keys (celebrity user → one shard overload)
SHARDING STRATEGIES
────────────────────────────────────────────────────────
Strategy          Routing                Pros                Cons
────────────────────────────────────────────────────────
Hash              hash(key) % N          Even spread         Resharding remaps most keys
Range             key ranges on shards   Range queries OK    Hotspots on latest range
Directory         lookup table           Flexible            Lookup SPOF; manual ops
Geographic        region = shard         Data residency      Cross-region queries hard
Consistent hash   ring + virtual nodes   Minimal remapping   More complex client/router
────────────────────────────────────────────────────────
CROSS-SHARD OPERATIONS (expensive — avoid in hot path)
JOIN across shards → denormalize or scatter-gather + app merge
Global uniqueness → Snowflake/UUID; not DB sequence
Aggregations → precompute, rollups, or OLAP store
Resharding → dual-write or Vitess VReplication; plan early
TOOLS: Vitess (MySQL), Citus (PostgreSQL), MongoDB sharded cluster,
DynamoDB partition key + sort key, Cassandra partition key
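The remapping difference between the Hash and Consistent hash rows above can be measured with a short experiment. This is a sketch under stated assumptions: SHA-256 as the hash, 100 virtual nodes per node, and synthetic `user-N` keys. `HashRing` and the helper names are hypothetical, not any library's API.

```python
import bisect
import hashlib


def stable_hash(value):
    """Deterministic 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.sha256(value.encode()).digest()[:8], "big")


def mod_shard(key, shard_count):
    return stable_hash(key) % shard_count


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each node owns `vnodes` points on the ring to smooth the distribution.
        self._points = sorted(
            (stable_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self._hashes = [h for h, _ in self._points]

    def node_for(self, key):
        # First ring point clockwise from the key's hash, wrapping around.
        index = bisect.bisect_right(self._hashes, stable_hash(key)) % len(self._points)
        return self._points[index][1]


keys = [f"user-{i}" for i in range(10_000)]

# Modulo hashing: grow from 4 to 5 shards.
moved_mod = sum(mod_shard(k, 4) != mod_shard(k, 5) for k in keys) / len(keys)

# Consistent hashing: add a fifth node to the ring.
before = HashRing(["a", "b", "c", "d"])
after = HashRing(["a", "b", "c", "d", "e"])
moved_ring = sum(before.node_for(k) != after.node_for(k) for k in keys) / len(keys)

print(f"modulo: {moved_mod:.0%} moved, ring: {moved_ring:.0%} moved")
```

With modulo hashing, a key stays put only when its hash gives the same remainder for 4 and 5, so about four in five keys move. On the ring, only keys that now fall to the new node move, which is about one in five here.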
⚖️ Trade-off
L6
"We'll use PostgreSQL" is L5. L6: "PostgreSQL for orders with row-level locking on stock;
Redis cache-aside for catalog (95% read); Elasticsearch via Kafka CDC for search—2s staleness SLA.
Shard by tenant_id when write QPS exceeds 20K."
⚠️ Pitfall
Picking Cassandra because "it's web scale" without write-heavy access patterns is a red flag.
Walk the five questions aloud; eliminate families before naming a product.
Interview Cheat Sheet
RADIO framework, time-boxed 45-minute clock, estimation script, architecture checklist, red/green flags,
and eight canonical case study one-liners. Copy the whole block before your mock or live interview.
Interview — copy blocks
RADIO template
RADIO — UNIVERSAL SYSTEM DESIGN FRAMEWORK
────────────────────────────────────────────────────────
R REQUIREMENTS
Functional: core features, user flows, MVP vs future
Non-functional: scale (DAU/QPS), latency (p99 target), availability,
consistency, durability, security, cost
Out of scope: explicitly defer (e.g., ML ranking v2, multi-region v1)
Ask: "Who are the users? Read-heavy or write-heavy? Strong consistency needed?"
A ARCHITECTURE
Draw boxes: Client → CDN → LB → API (stateless) → Cache → DB → Queue → Workers
Label: stateless vs stateful components; sync vs async paths
Data flow: write path vs read path (often different!)
Say: "API tier is stateless; session in Redis; DB is source of truth"
D DATA MODEL
Schema: entities, relationships, indexes for access patterns
Shard key: if scale requires it—justify cardinality and locality
Storage estimate: rows × bytes × retention (state assumptions)
Say: "Index on (user_id, created_at DESC) for timeline query"
I INTERFACE (API)
Key endpoints: REST/gRPC; request/response shape
Pagination: cursor-based for feeds (not offset at scale)
Idempotency: Idempotency-Key header for writes/retries
Rate limits: per-user and global; 429 + Retry-After header
O OPTIMIZATIONS & TRADE-OFFS
Bottlenecks: identify from estimation (DB, fan-out, hot keys)
Caching: what, where, TTL, invalidation, stampede prevention
Async: queue for slow path (email, transcode, index build)
Scale: sharding, read replicas, CDN, horizontal pods
Close: "We traded X for Y because product requires Z"
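To make the cursor-based pagination item above concrete, here is a minimal keyset-pagination sketch using Python's built-in `sqlite3`. The `posts` table, its columns, and `fetch_page` are hypothetical. The cursor is the `(created_at, id)` pair of the last row returned, with `id` as a tie-breaker for equal timestamps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE posts (id INTEGER PRIMARY KEY, user_id INTEGER, created_at INTEGER)")
conn.execute("CREATE INDEX posts_timeline ON posts (user_id, created_at DESC, id DESC)")
conn.executemany(
    "INSERT INTO posts (id, user_id, created_at) VALUES (?, ?, ?)",
    [(i, 1, 1000 + i // 2) for i in range(1, 11)],  # neighbouring rows share timestamps
)


def fetch_page(user_id, limit, cursor=None):
    """Return (rows, next_cursor). The cursor is the (created_at, id) of the last row."""
    if cursor is None:
        rows = conn.execute(
            "SELECT id, created_at FROM posts WHERE user_id = ? "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (user_id, limit),
        ).fetchall()
    else:
        created_at, last_id = cursor
        rows = conn.execute(
            "SELECT id, created_at FROM posts WHERE user_id = ? "
            "AND (created_at < ? OR (created_at = ? AND id < ?)) "
            "ORDER BY created_at DESC, id DESC LIMIT ?",
            (user_id, created_at, created_at, last_id, limit),
        ).fetchall()
    # A short page means there is nothing left to fetch.
    next_cursor = (rows[-1][1], rows[-1][0]) if len(rows) == limit else None
    return rows, next_cursor


page1, cursor = fetch_page(1, 4)
page2, cursor = fetch_page(1, 4, cursor)
page3, cursor = fetch_page(1, 4, cursor)

print([row[0] for row in page1], [row[0] for row in page2], [row[0] for row in page3])
```

Unlike `OFFSET`, each page seeks directly into the index, so the cost stays flat however deep the client scrolls.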
45-minute interview clock
45-MINUTE INTERVIEW CLOCK — TIME-BOX RUTHLESSLY
────────────────────────────────────────────────────────
Phase               Time        What to deliver
────────────────────────────────────────────────────────
Requirements        0–5 min     Clarify scope, DAU/QPS, latency, consistency,
                                explicit out-of-scope. ASK QUESTIONS.
Estimation          5–10 min    QPS, storage, bandwidth, server count.
                                State assumptions aloud. Round to 1 significant figure.
High-level design   10–20 min   Boxes + arrows: client, LB, API, cache, DB, queue.
                                Explain read path AND write path separately.
Deep dive           20–30 min   Interviewer picks: schema, cache, fan-out, consistency,
                                API, failure modes. Go deep on ONE area.
Scale & wrap        30–45 min   10× traffic plan, SPOFs, monitoring, trade-offs summary.
                                "If I had another 15 min, I'd detail X."
────────────────────────────────────────────────────────
TIME PRESSURE TACTICS
• At 10 min without a diagram → stop requirements, start drawing
• At 25 min without deep dive → pick your strongest component voluntarily
• At 40 min → summarize trade-offs even if incomplete
• Never silent > 15 seconds — narrate thinking process
LEVEL EXPECTATIONS
L4: Happy path, basic components, one scaling knob
L5: Trade-offs, failure modes, data model justification
L6: Cross-cutting concerns, operability, measured numbers, "when NOT to"
Principal: Platform strategy, org implications, multi-year evolution
Estimation template (say aloud)
ESTIMATION SCRIPT — FILL IN THE BLANKS
────────────────────────────────────────────────────────
"Let me state assumptions before I calculate:
• Daily active users: _______
• Operations per user per day: _______
• Read : write ratio: _______
• Average payload size: _______ KB
• Retention period: _______
• Peak traffic multiplier: _______× average (I'll use 3× unless you say otherwise)"
READ QPS
= DAU × reads_per_user / 100,000 seconds
Peak read QPS = avg × peak_factor
WRITE QPS
= DAU × writes_per_user / 100,000 seconds
Peak write QPS = avg × peak_factor
STORAGE
= DAU × writes_per_user × bytes_per_record × retention_days
With indexes: multiply by 2–3×
BANDWIDTH
= peak_read_QPS × response_size_bytes (+ upload for write-heavy)
SERVERS
= peak_QPS / 1,000 RPS per app server × 1.3 headroom
SANITY CHECK
"At _______ peak QPS, this is roughly _______ scale—I'd expect _______ pattern
(cache / sharding / fan-out on write / CDN)."
EXAMPLE (say it):
"300M DAU, 10 timeline reads/day, peak 3× → 30K avg, ~90K peak read QPS.
0.5 tweets/user/day → 1.5K write avg, ~4.5K peak. That drives fan-out strategy."
Architecture checklist
ARCHITECTURE CHECKLIST — BEFORE YOU SAY "DONE"
────────────────────────────────────────────────────────
SCALE
□ QPS estimated (read + write separately)
□ Storage estimated with retention
□ Horizontal scaling path identified
□ Hot key / hot partition risk addressed
RELIABILITY
□ Single points of failure named + mitigated
□ Redundancy: active-passive or active-active justified
□ Failover tested (not just "we have a replica")
□ Circuit breakers / bulkheads for downstream failures
□ Retry with exponential backoff + jitter (not blind retry)
□ Idempotency keys on mutating APIs
PERFORMANCE
□ Caching layer: what, TTL, invalidation, stampede prevention
□ CDN for static/media at edge
□ DB indexes match access patterns (no table scans on hot path)
□ Async for slow path (notifications, transcoding, indexing)
□ p99 latency target stated (not just "low latency")
DATA
□ Consistency model per operation (not one-size-fits-all)
□ Shard key chosen with cardinality + locality rationale
□ Backup + disaster recovery mentioned for durable data
OBSERVABILITY
□ Four golden signals: latency, traffic, errors, saturation
□ SLO defined (e.g., 99.9% of requests < 200 ms over 30 days)
□ Alerting on SLO burn rate, not just threshold
SECURITY (brief — don't over-index unless asked)
□ AuthN/AuthZ at API gateway
□ Rate limiting / abuse prevention
□ TLS in transit; encryption at rest for PII
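The reliability item on retries with exponential backoff and jitter can be sketched as below. This uses the "full jitter" variant, which is one common choice among several. The function names, defaults, and the `flaky` operation are hypothetical.

```python
import random


def backoff_delays(attempts, base=0.1, cap=5.0, rng=random.random):
    """Full-jitter delays: a random fraction of min(cap, base * 2**attempt)."""
    return [rng() * min(cap, base * 2 ** attempt) for attempt in range(attempts)]


def call_with_retry(operation, attempts=5, sleep=lambda seconds: None, **backoff):
    """Run `operation`, retrying on exception with jittered exponential backoff.

    `sleep` defaults to a no-op so the sketch runs instantly; pass time.sleep
    in real use. The operation should be idempotent so retries are safe.
    """
    delays = backoff_delays(attempts - 1, **backoff)
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise                      # out of attempts: surface the error
            sleep(delays[attempt])


# Usage: a hypothetical flaky call that succeeds on the third try.
state = {"calls": 0}


def flaky():
    state["calls"] += 1
    if state["calls"] < 3:
        raise ConnectionError("transient failure")
    return "ok"


result = call_with_retry(flaky)
print(result, state["calls"])
```

Jitter spreads retries out so that clients failing together do not retry together. A production version would also retry only on errors known to be transient.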
Red flags & green flags
INTERVIEW RED FLAGS (avoid)              GREEN FLAGS (demonstrate)
────────────────────────────────────────────────────────────────────────
Draw before clarifying requirements      Ask 3–5 scoping questions first
Silent for 30+ seconds while drawing     Think aloud continuously
"We'll use microservices" day one        Start monolith; split when measured pain
"We need low latency" (no number)        "p99 < 200 ms; timeline can do 500 ms"
Strong consistency everywhere            Match consistency to operation
One database for everything              Polyglot with justified primary store
Cache without invalidation plan          TTL + event-driven invalidation
Ignore failure modes                     Name SPOFs + circuit breakers
Skip estimation                          Back-of-envelope in first 10 minutes
Buzzwords without trade-offs             "We chose X over Y because Z"
No out-of-scope boundaries               "V1 excludes multi-region; here's why"
Resume-driven architecture               Solve the stated problem first
Offset pagination at billion rows        Cursor-based keyset pagination
Sticky sessions without fallback         Externalized session in Redis
2PC across microservices                 Saga / outbox / idempotent compensations
"Redis as primary DB" no persistence     Redis with AOF/RDB + replication for durability
────────────────────────────────────────────────────────────────────────
PHRASES THAT SIGNAL SENIOR LEVEL
"At 10× traffic, the bottleneck moves from _______ to _______."
"We fail closed on payment; we fail open on analytics."
"Error budget at 99.9% gives us 43 min/month—we spend it on launches."
"I'd load-test with open-loop fixed RPS to find real breaking points."
"Shard key is tenant_id—high cardinality, query-local, rebalances per tenant."
8 case study one-liners
8 CANONICAL CASE STUDIES — PROBLEM, SCALE, KEY DECISION
────────────────────────────────────────────────────────────────────────
1. URL SHORTENER (L4)
Problem: Map long URL → short code; redirect; optional click analytics
Scale: 100M URLs, 1000:1 read:write, 100K redirect RPS peak
Key decision: Base62 hash vs counter (Snowflake); Redis cache hot URLs;
301 redirect; DB sharded by short_code hash
2. RATE LIMITER (L4)
Problem: Throttle requests per user/IP/API key; sliding or token bucket
Scale: 1M RPS at edge; sub-ms check; distributed across PoPs
Key decision: Token bucket in Redis with Lua atomicity; local cache +
Redis sync for edge; 429 + Retry-After; fail-open vs closed
3. KEY-VALUE STORE (L4)
Problem: In-memory get/put/delete with optional persistence
Scale: 1B keys, 100K ops/sec, <1 ms p99
Key decision: Consistent hashing for sharding; replication factor 3;
hinted handoff; write-ahead log for durability
4. TWITTER FEED (L5)
Problem: Home timeline—tweets from followees, ranked by time
Scale: 300M DAU, 90K read QPS peak, celebrity with 50M followers
Key decision: Hybrid fan-out—write for normal users, read for celebrities;
Redis timeline cache; Snowflake tweet IDs; pull+push merge
5. YOUTUBE (L5)
Problem: Upload video, transcode, stream globally with adaptive bitrate
Scale: 500 hours uploaded/min, 1B playback hours/day
Key decision: Object storage (S3) for blobs; async transcoding queue;
CDN edge caching; DASH/HLS segments; metadata in SQL + cache
6. UBER (L5)
Problem: Real-time driver location, ride matching, surge pricing
Scale: 1M concurrent drivers, 10K rides/sec peak, geo queries
Key decision: Redis geospatial index for nearby drivers; WebSocket for
location push; matching service with supply/demand zones;
Cassandra for trip history; Kafka for event pipeline
7. WHATSAPP (L5)
Problem: 1:1 and group messaging, delivery/read receipts, offline delivery
Scale: 2B users, 100B messages/day, groups up to 256 members
Key decision: WebSocket long-lived connections; message queue per device
for offline; Cassandra for message store; end-to-end encryption;
fan-out on write for groups with large-member optimization
8. GOOGLE MAPS (L6)
Problem: Routing, real-time traffic, map tile serving globally
Scale: 1B users, petabyte road graph, sub-second route queries
Key decision: Precomputed tile CDN; graph partitioned geographically;
A* on hierarchical road network; real-time traffic via
aggregate probe data (Kafka stream); edge caching of tiles
────────────────────────────────────────────────────────────────────────
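For the URL shortener's counter-based option, a Base62 encoder is short enough to write on a whiteboard. This is a minimal sketch; the alphabet order and function names are assumptions, and the counter would come from a separate ID generator that is not shown.

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"
BASE = len(ALPHABET)  # 62


def encode(counter):
    """Encode a non-negative integer ID as a Base62 short code."""
    if counter < 0:
        raise ValueError("counter must be non-negative")
    if counter == 0:
        return ALPHABET[0]
    digits = []
    while counter:
        counter, remainder = divmod(counter, BASE)
        digits.append(ALPHABET[remainder])
    return "".join(reversed(digits))


def decode(code):
    """Invert encode(): map a short code back to its integer ID."""
    value = 0
    for char in code:
        value = value * BASE + ALPHABET.index(char)
    return value


code = encode(125)          # "21": 2 * 62 + 1
capacity = BASE ** 7        # number of distinct 7-character codes

print(code, decode(code), capacity)
```

Seven characters give about 3.5 trillion codes, far more than the 100M URLs in the scale estimate. Sequential counters produce guessable codes, which is one reason to weigh the hash-based alternative.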
CASE STUDY PIVOT PHRASES
"The interesting part here is _______—let me go deep on that."
"At celebrity scale, fan-out on write breaks—here's the hybrid fix."
"Read path and write path differ—let me draw them separately."
🎯 Interview Tip
L5
Switch to Interview track before mocks. Copy the 45-minute clock and
estimation template blocks into a second monitor. Glance at minute marks—
interviewers notice time discipline as much as technical depth.
🏆 Senior Signal
Principal
After the architecture checklist, add org context: "This design implies a platform team owning
Kafka + schema registry; product teams publish events via SDK. Conway's law—we align service
boundaries to team boundaries."
📦 Real World
Instagram engineers report back-of-envelope estimates during design reviews catch 80% of scaling
issues before code ships. Google interviewers explicitly score structured thinking—the RADIO
skeleton matters as much as the final diagram.