SysDesign Core · Capstone series

Master system design from first principles to staff level

The capstone of the sharpbyte.dev learning series—tying together Unix, Java, Spring, Microservices, Kafka, Docker, K8s, and DevSec into real-world architectural decision-making. Not boxes and arrows: why each choice is made, what breaks at scale, and how elite engineering teams think. Part of the broader System design topic—pair with product case studies and interview prompts when you are ready to apply the theory.


What is system design? Why it separates senior from junior engineers

Junior engineers implement features. Senior engineers design systems that survive traffic spikes, partial failures, and organizational change. System design is the discipline of making architectural decisions under constraints—latency, cost, consistency, team size, and time.

Interviews test this skill because production doesn't forgive naive architecture. A URL shortener that works for 100 users collapses at 100M. A Twitter feed that fans out on write for every user dies when Katy Perry tweets: one post from an account with over 100M followers means over 100M timeline writes. The difference between L4 and L6 isn't knowing more patterns; it's trade-off reasoning with real numbers.
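To make the celebrity problem concrete, here is a minimal in-memory sketch of hybrid fan-out. The `HybridFeed` class and its `celebrity_threshold` parameter (default 10,000) are hypothetical names chosen for illustration. Posts from ordinary authors are pushed into each follower's timeline at write time. Posts from accounts above the threshold are stored once and merged in at read time, so a single celebrity post costs one write instead of one per follower.

```python
from collections import defaultdict

class HybridFeed:
    """In-memory hybrid fan-out: push for ordinary authors, pull for celebrities."""

    def __init__(self, celebrity_threshold=10_000):
        self.celebrity_threshold = celebrity_threshold  # illustrative cut-off, not a tuned value
        self.followers = defaultdict(set)   # author -> follower ids
        self.following = defaultdict(set)   # user -> author ids
        self.timelines = defaultdict(list)  # user -> posts pushed at write time
        self.posts = defaultdict(list)      # author -> own posts (pull source)

    def follow(self, user, author):
        self.followers[author].add(user)
        self.following[user].add(author)

    def is_celebrity(self, author):
        return len(self.followers[author]) >= self.celebrity_threshold

    def publish(self, author, ts, text):
        """Store a post and return how many timeline writes it caused."""
        post = (ts, author, text)
        self.posts[author].append(post)
        if self.is_celebrity(author):
            return 0  # fan-out on read: followers pull it when they load the feed
        for user in self.followers[author]:
            self.timelines[user].append(post)  # fan-out on write
        return len(self.followers[author])

    def read(self, user, limit=20):
        """Merge the pushed timeline with celebrity posts pulled at read time."""
        merged = set(self.timelines[user])  # a set also drops duplicate posts
        for author in self.following[user]:
            if self.is_celebrity(author):
                merged.update(self.posts[author])
        return sorted(merged, reverse=True)[:limit]  # newest first by timestamp
```

Production systems typically keep timelines in a cache such as Redis and cap their length; this sketch only shows where the push/pull split happens.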

Two dimensions: breadth and depth

Great system designers operate on two axes simultaneously. Breadth without depth produces shallow diagrams. Depth without breadth produces over-engineered solutions to simple problems.

Breadth — know many patterns

CQRS, sharding, consistent hashing, saga, fan-out on write, circuit breaker, cache-aside, leader election. Recognize the pattern when you see the problem: a rate limiter points to a token bucket; a feed with celebrity accounts points to hybrid fan-out.
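As an example of matching pattern to problem, the token bucket behind a rate limiter fits in a few lines. This is a single-process sketch; `TokenBucket`, `capacity`, and `refill_rate` are illustrative names. The bucket refills continuously, allows bursts up to `capacity`, and rejects requests once it is empty. The clock is injectable so the behavior can be tested deterministically.

```python
import time

class TokenBucket:
    """Holds up to `capacity` tokens, refilled at `refill_rate` tokens/second.
    Each request spends `cost` tokens; requests are rejected when too few remain."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock             # injectable clock for deterministic tests
        self.tokens = float(capacity)  # start full so an initial burst is allowed
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

A distributed limiter would keep the same two values (token count, last refill time) in shared storage such as Redis and update them atomically; that part is not shown here.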
Depth — understand why

Why Cassandra over MySQL for this write pattern? Why p99 matters more than average latency? Why 2PC fails across microservices? Why Redis SETNX locks need fencing tokens? Depth is what interviewers probe in the deep-dive phase.
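Here is a small worked example of why p99 matters more than the mean, using made-up latencies. Two slow requests out of a hundred barely move the average, yet they define the tail. The `percentile` helper is a hypothetical nearest-rank implementation, and the fan-out calculation assumes the 10 backend calls are independent.

```python
import math
import statistics

def percentile(samples, p):
    """Nearest-rank percentile: smallest sample with at least p% of samples <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]

# Illustrative data: 98 fast requests and 2 slow ones, in milliseconds.
latencies_ms = [10] * 98 + [1000] * 2

mean_ms = statistics.mean(latencies_ms)  # 29.8 ms: looks healthy
p50_ms = percentile(latencies_ms, 50)    # 10 ms
p99_ms = percentile(latencies_ms, 99)    # 1000 ms: what the unlucky users feel

# If one page load fans out to 10 independent backend calls, the chance that
# at least one of them lands in the slowest 1% is 1 - 0.99**10.
tail_hit = 1 - 0.99 ** 10  # about 0.096, roughly 1 page load in 10
```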

🏆 Senior Signal

L5 candidates name patterns. L6 candidates explain when not to use them and quantify the cost. "We'll use Redis" is L4. "Redis cache-aside with 5-minute TTL, mutex on miss to prevent stampede, ~100K ops/sec per node—at 35K read QPS we need one cluster with headroom" is L6.
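The cache-aside pattern in that L6 answer can be sketched in-process, with a Python dict standing in for Redis. This is an assumption for illustration, not a deployment design. `CacheAside`, `loader`, and `ttl_seconds` are hypothetical names. On a miss, a per-key lock lets only one caller reload from the backend. Concurrent callers wait, then read the freshly cached value, which prevents a stampede when a hot key expires.

```python
import threading
import time

class CacheAside:
    """Cache-aside with a TTL and a per-key mutex on miss (single-flight)."""

    def __init__(self, loader, ttl_seconds=300, clock=time.monotonic):
        self.loader = loader              # hypothetical backend read, e.g. a DB query
        self.ttl = ttl_seconds            # 300 s matches the 5-minute TTL above
        self.clock = clock                # injectable for tests
        self.store = {}                   # key -> (value, expires_at); stands in for Redis
        self.locks = {}                   # key -> lock that serializes reloads of that key
        self.locks_guard = threading.Lock()

    def _lock_for(self, key):
        with self.locks_guard:
            return self.locks.setdefault(key, threading.Lock())

    def _fresh(self, key):
        entry = self.store.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry
        return None

    def get(self, key):
        entry = self._fresh(key)
        if entry is not None:
            return entry[0]               # hit: no backend call
        with self._lock_for(key):         # miss: one loader per key at a time
            entry = self._fresh(key)      # re-check: a waiting caller may find it filled
            if entry is not None:
                return entry[0]
            value = self.loader(key)
            self.store[key] = (value, self.clock() + self.ttl)
            return value
```

Against a real Redis cluster, the per-key mutex would typically be a short-lived lock key acquired with `SET key value NX PX <ms>`. Waiting callers would retry or serve stale data rather than block on a thread lock.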

How to use this site: two parallel tracks

Toggle Learning or Interview in the nav (saved in your browser). Content sections tagged for each track appear or hide globally across every page.

Learning track

Deep understanding

Distributed systems theory, database internals, real production numbers, and how Netflix, Uber, and Discord actually solved these problems. Read sequentially or dive into weak areas.

CAP, PACELC, consistency models with nuance
Latency numbers every engineer must know
8 real-world case studies with full rationale
Patterns library with trade-offs, not just definitions

Interview track

Structured frameworks

RADIO framework, 45-minute time allocation, estimation templates, red flags / green flags, and level-specific expectations for FAANG/MANGA L4 through Principal.

Requirements → Architecture → Data → Interface → Optimizations
Back-of-envelope estimation with stated assumptions
Common follow-up questions with strong answers
Level badges on every concept (L4 / L5 / L6 / Principal)

💡 Pro Tip

Read Fundamentals → Networking → Databases → Caching → Distributed Systems in order for the learning track. For interview prep, start with Interview Framework and Case Studies, then backfill theory where your deep-dives feel thin.

🎯 Interview Tip

Switch to Interview mode before mock sessions: it hides deep theory and surfaces frameworks and time-boxed templates. Practice saying assumptions out loud: "300M DAU × 10 reads/day = 3B reads/day; ÷ 86,400 s ≈ 35K average read QPS; at a 3× peak that's ~105K read QPS."
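Here is the same arithmetic as a tiny helper, which keeps the assumptions explicit. `read_qps` is an illustrative name. The ~100K ops/sec per node used for the node count is the cache figure from the Senior Signal example, treated here as an assumption rather than a benchmark.

```python
import math

SECONDS_PER_DAY = 86_400

def read_qps(dau, reads_per_user_per_day, peak_factor):
    """Return (average, peak) read QPS from daily usage assumptions."""
    avg = dau * reads_per_user_per_day / SECONDS_PER_DAY
    return avg, avg * peak_factor

avg, peak = read_qps(dau=300_000_000, reads_per_user_per_day=10, peak_factor=3)

# Assumed capacity of one cache node (~100K ops/sec); size for peak, not average.
OPS_PER_NODE = 100_000
nodes_at_peak = math.ceil(peak / OPS_PER_NODE)

print(f"average ~{avg:,.0f} QPS, peak ~{peak:,.0f} QPS, nodes at peak: {nodes_at_peak}")
# prints: average ~34,722 QPS, peak ~104,167 QPS, nodes at peak: 2
```

Sizing for peak rather than average is what turns "one node is enough" into "two nodes, plus headroom."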

System design interview levels

Problems scale with level. L4 tests fundamentals and happy path. L6 tests failure modes, cross-cutting concerns, and org-level trade-offs. Match your preparation to your target level.

L4 · Mid

Fundamentals & core components

URL Shortener — hashing, redirect, analytics, cache
Rate Limiter — token bucket, Redis, distributed limits
Key-Value Store — in-memory, persistence, sharding basics

L5 · Senior

Trade-offs & failure modes

Twitter Feed — fan-out on write vs read, hybrid model
YouTube — upload pipeline, transcoding, CDN streaming
Uber — real-time location, matching, surge pricing
WhatsApp — messaging, delivery receipts, group chat
Dropbox — file sync, chunking, conflict resolution

L6 · Staff

Cross-cutting & scale edge cases

Google Maps — graph routing, real-time traffic, tile serving
Facebook Search — inverted index at billion-doc scale
Distributed Database — consensus, partitioning, consistency
Ad Click Aggregation — stream processing, exactly-once counts
Stock Exchange — ordering, matching engine, low latency

Principal

Platform & org-level strategy

Platform design — paved roads, golden paths, self-service infra
Multi-year architecture — evolutionary architecture, strangler fig
Build vs buy — TCO, vendor lock-in, team capability
Org design implications — Conway's law, team topologies

⚖️ Trade-off

Preparing only for L4 problems leaves you underprepared for senior loops—interviewers will push into failure modes and scale. Preparing only for L6 problems wastes time if you're targeting mid-level. Use level badges throughout the guide to focus effort.

The universal framework: RADIO

Every system design answer follows the same skeleton. RADIO keeps you structured under time pressure and ensures you don't skip non-functional requirements or trade-offs.

R Requirements

Functional, non-functional, explicit out-of-scope

A Architecture

High-level components, data flow diagram

D Data model

Schema, indexes, access patterns, shard key

I Interface

Key APIs, request/response, pagination

O Optimizations

Bottlenecks, caching, sharding, trade-offs

Full templates, requirement questions, and deep-dive decision trees in Interview Framework →
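For the shard-key step, consistent hashing is one common way to route keys to shards so that adding a shard moves only a fraction of the keys. Below is a minimal sketch that assumes hypothetical shard names and 100 virtual nodes per shard. BLAKE2b from the standard library is used purely to spread keys evenly, not for security.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Hash ring with virtual nodes: each shard owns many points on the ring,
    and a key belongs to the first shard point at or after the key's hash."""

    def __init__(self, shards, vnodes=100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, shard) pairs
        for shard in shards:
            self.add(shard)

    @staticmethod
    def _hash(value):
        # 64-bit BLAKE2b digest: used only for an even spread, not for security.
        return int.from_bytes(hashlib.blake2b(value.encode(), digest_size=8).digest(), "big")

    def add(self, shard):
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{shard}#{i}"), shard))

    def remove(self, shard):
        self.ring = [(h, s) for h, s in self.ring if s != shard]

    def shard_for(self, key):
        if not self.ring:
            raise LookupError("ring has no shards")
        # (h,) sorts before any (h, shard), so this finds the first point >= h;
        # the modulo wraps past the end of the ring back to the start.
        idx = bisect.bisect_left(self.ring, (self._hash(key),))
        return self.ring[idx % len(self.ring)][1]
```

For example, `ConsistentHashRing(["db-a", "db-b", "db-c"]).shard_for("user:42")` returns one of the three shard names. After `add("db-d")`, the only keys that move are those now owned by `db-d`.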

The 45-minute interview template

Time-box ruthlessly. Candidates who spend 20 minutes on requirements never reach architecture. Candidates who skip estimation guess wrong on every component count.

0–5 min Requirements

Clarify functional scope, DAU/QPS, latency, consistency, out-of-scope. Ask questions.

5–10 min Estimation

Back-of-envelope: QPS, storage, bandwidth, server count. State assumptions aloud.

10–20 min High-level design

Draw boxes: client, LB, API, cache, DB, queue, workers. Explain data flow.

20–30 min Deep dive

Interviewer picks: DB schema, cache strategy, fan-out, consistency, API design.

30–45 min Scale & wrap

10× traffic, single points of failure, monitoring, trade-offs summary.

⚠️ Pitfall

Starting to draw before clarifying requirements is the #1 red flag. Passive silence while drawing is #2. Think out loud—interviewers score your reasoning process, not just the final diagram.

Prerequisite knowledge map

SysDesign Core assumes you've worked through—or can reference—these sharpbyte.dev series. System design sits on top of runtime, networking, and delivery fundamentals.

Unix Core: Processes, I/O, networking syscalls, file descriptors
Java Core: Concurrency, JVM, collections, performance basics
Spring Core: DI, REST APIs, transactions, connection pooling
Micro Core: Microservices, service discovery, API gateways
Kafka Core: Event streaming, partitions, consumer groups, ordering
Docker Core: Containers, images, networking, resource limits
K8s Core: Orchestration, scaling, service mesh, health checks
DevSec Core: CI/CD, observability, deployment strategies, SLOs

System design mental models

Every non-functional requirement maps to one of these pillars. Great designs balance all eight rather than maximizing one at the expense of the others.

Scalability

Handle growth in users, data, and traffic—vertically or horizontally.
Reliability

Correct behavior under expected conditions; fault tolerance when components fail.
Availability

System is up when users need it—nines, MTBF, MTTR, redundancy.
Maintainability

Teams can evolve the system—operability, simplicity, extensibility.
Performance

Latency (p50/p99) and throughput—often in tension with each other.
Cost-efficiency

Dollars per request, storage tiering, right-sizing—not over-provisioning by default.
Security

AuthN/Z, encryption, rate limiting, abuse prevention, data residency.
Observability

Four golden signals, SLOs, tracing—know when and why things break.
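The availability pillar's vocabulary (nines, MTBF, MTTR, redundancy) reduces to a few formulas. This sketch uses illustrative helper names. The `redundant` formula assumes replicas fail independently, and correlated failures such as a shared power feed or a bad deploy break that assumption.

```python
import math

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def downtime_minutes_per_year(availability):
    """Downtime budget implied by an availability target (e.g. 0.999)."""
    return (1 - availability) * MINUTES_PER_YEAR

def availability_from(mtbf_hours, mttr_hours):
    """Steady-state availability: fraction of time the component is up."""
    return mtbf_hours / (mtbf_hours + mttr_hours)

def in_series(*availabilities):
    """Every component must be up (e.g. LB, then API, then DB): availabilities multiply."""
    return math.prod(availabilities)

def redundant(availability, replicas):
    """Any one of N independent replicas suffices: 1 - P(all down at once)."""
    return 1 - (1 - availability) ** replicas

# Three nines allow ~525.6 min/year (~8.8 h); four nines allow ~52.6 min/year.
three_nines = downtime_minutes_per_year(0.999)
four_nines = downtime_minutes_per_year(0.9999)
```

Chaining three 99.9% components in series already drops the whole path to roughly 99.7%. Two independent 99% replicas reach 99.99% on paper. This gap is why redundancy shows up at every tier.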

📦 Real World

Google SRE codified the four golden signals: latency, traffic, errors, saturation. Netflix prioritizes availability and graceful degradation over strong consistency for streaming metadata. Stripe inverts that—financial correctness demands strong consistency even at latency cost.

Explore the guide — all sections

Thirteen chapters from fundamentals to cheat sheets. Recommended learning path: Fundamentals → Networking → Databases → Caching → Distributed Systems → Case Studies.

Learning path: Fundamentals · Networking · Databases · Caching · Distributed Systems · Case Studies

Interview path: Interview Framework · Case Studies · Cheat Sheets · Patterns Library


System Design Fundamentals
Networking & Protocols
Load Balancing & Proxies
Database Design & Selection
Caching Strategies
Distributed Systems Theory
Messaging & Event Streaming
Storage Patterns & Data Modeling
Real System Case Studies
Interview Framework
Architecture Patterns Library
Cheat Sheets