Networking & Protocols
Every system design diagram eventually draws arrows between boxes. Those arrows are protocols—HTTP, gRPC, WebSockets—and the API contracts layered on top. This chapter covers how bytes move, what each protocol optimizes for, TLS overhead with real numbers, and the API design decisions (pagination, idempotency, rate limits) that survive production traffic.
HTTP & HTTPS
HTTP is the lingua franca of the web—stateless request/response over TCP (or QUIC). Understanding version differences, TLS costs, and REST semantics is table stakes for every system design interview and every production API.
HTTP models communication as a client sending a request (method, path, headers, optional body) and a server returning a response (status code, headers, body). It is inherently stateless: the server does not retain session context between requests unless you add cookies, tokens, or server-side sessions.
HTTP/1.1 — the baseline
HTTP/1.1 (first published in 1997 as RFC 2068, later revised by RFC 2616 and RFC 7230+) made persistent connections the default (HTTP/1.0 needed an explicit Connection: keep-alive), and added chunked transfer encoding and the mandatory Host header, which enables virtual hosting. Requests on a connection are still served one at a time, in order, so clients open multiple parallel TCP connections for concurrency (browsers cap at ~6 per host).
- Head-of-line (HOL) blocking — a large/slow response blocks subsequent responses on the same connection
- Header redundancy — cookies and auth tokens resent verbatim on every request (~500–800 bytes typical)
- Text protocol — human-readable but verbose; parsing cost is negligible vs network RTT
HTTP/2 — multiplexing over one connection
HTTP/2 (2015) frames messages into binary streams multiplexed over a single TCP connection. HPACK compresses headers (~30–50% reduction on repeat requests). Server push (largely deprecated in practice) allowed proactive resource delivery.
- Stream multiplexing — hundreds of parallel requests without opening 6+ TCP connections
- Still TCP HOL blocking — one lost packet stalls all streams on that connection (TCP retransmission)
- TLS ALPN — negotiated during handshake; h2 indicates HTTP/2
- Typical win — 15–30% latency reduction on asset-heavy pages; smaller win on API-only JSON payloads
HTTP/2 streams share flow-control windows at both TCP and HTTP layers. A slow consumer on one stream can back-pressure others if window sizes are exhausted—rare in APIs but visible in large file downloads mixed with small API calls.
HTTP/3 & QUIC — UDP-based transport
HTTP/3 runs over QUIC (Quick UDP Internet Connections)—TLS 1.3 is mandatory and integrated into the handshake. Each stream is independent; packet loss on one stream does not block others (eliminates TCP-level HOL blocking).
- 0-RTT resumption — repeat connections skip full handshake (~1 RTT saved); replay-attack risk for non-idempotent ops
- Connection migration — connection ID survives IP changes (mobile WiFi → cellular handoff)
- UDP firewall/NAT — some corporate networks block UDP 443; fallback to HTTP/2 required
- Adoption — Cloudflare, Google, Facebook serve HTTP/3; ~30% of web traffic as of 2025
flowchart LR
  C[Client]
  H1[HTTP/1.1\n6 TCP conns]
  H2[HTTP/2\n1 TCP + streams]
  H3[HTTP/3\nQUIC over UDP]
  C --> H1
  C --> H2
  C --> H3
TLS costs — what encryption actually costs
HTTPS adds TLS between TCP and HTTP. Costs split into handshake latency (RTT-bound) and per-record CPU (AES-GCM is cheap on modern hardware with AES-NI).
| Phase | TLS 1.2 | TLS 1.3 | Notes |
|---|---|---|---|
| Full handshake | 2 RTT | 1 RTT | First connection to a host; session tickets enable resumption |
| Resumed handshake | 1 RTT | 1 RTT, or 0-RTT (optional) | 0-RTT data can be replayed; accept it only for idempotent requests |
| CPU overhead | ~1–3% throughput | ~1–3% throughput | Negligible vs JSON serialization at typical API sizes |
| Latency added | +50–150 ms cross-region | +25–100 ms | Dominates small payload APIs; connection pooling amortizes |
API at 50 ms p99 internal processing + 80 ms cross-region RTT + 50 ms TLS handshake (cold) = 180 ms first request. With keep-alive and session resumption, subsequent requests drop to ~130 ms. Always mention connection pooling when discussing microservice-to-microservice latency.
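The latency budget above can be sketched as a small helper. The function and parameter names are illustrative only, and the numbers are the ones quoted in the paragraph; real handshake cost depends on TLS version and RTT.

```javascript
// Hypothetical helper: sums the latency components of a single request.
// A cold connection pays the TLS handshake; a pooled (warm) one does not.
function requestLatencyMs({ processingMs, rttMs, tlsHandshakeMs, warm }) {
  const handshakeMs = warm ? 0 : tlsHandshakeMs;
  return processingMs + rttMs + handshakeMs;
}

const budget = { processingMs: 50, rttMs: 80, tlsHandshakeMs: 50 };
const coldMs = requestLatencyMs({ ...budget, warm: false }); // 180
const warmMs = requestLatencyMs({ ...budget, warm: true }); // 130
console.log({ coldMs, warmMs });
```

The difference between the two results is what connection pooling buys on every request after the first.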
REST — architectural style, not a protocol
REST (Representational State Transfer) maps resources to URLs and uses HTTP methods semantically:
GET (read, safe, idempotent), POST (create), PUT (replace, idempotent),
PATCH (partial update), DELETE (remove, idempotent).
- Statelessness — each request carries all context (auth, resource ID)
- HATEOAS — hypermedia links in responses (rarely implemented fully; pragmatic REST stops at nouns + verbs)
- Content negotiation — Accept: application/json vs application/protobuf
HTTP status codes — the vocabulary of outcomes
| Code | Meaning | When to use | Client action |
|---|---|---|---|
| 200 OK | Success with body | GET, PUT, PATCH success | Parse response |
| 201 Created | Resource created | POST success; include Location header | Follow Location or use returned ID |
| 204 No Content | Success, empty body | DELETE, PUT with no return payload | None |
| 400 Bad Request | Client sent invalid input | Validation failure, malformed JSON | Fix request; do not retry blindly |
| 401 Unauthorized | Missing/invalid auth | No token or expired token | Refresh credentials |
| 403 Forbidden | Authenticated but not allowed | RBAC denial | Do not retry |
| 404 Not Found | Resource does not exist | Unknown ID | Do not retry |
| 409 Conflict | State conflict | Duplicate create, version mismatch | Resolve conflict or fetch latest |
| 429 Too Many Requests | Rate limited | Quota exceeded | Backoff using Retry-After |
| 500 Internal Server Error | Server bug/unhandled exception | Unexpected failure | Retry with exponential backoff |
| 502/503/504 | Gateway/upstream/timeout | LB or dependency down, overload | Retry; circuit breaker on client |
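The "Client action" column above can be turned into a dispatch function. This is a minimal sketch: the action labels are made up for illustration, and it only covers the codes listed in the table.

```javascript
// Maps a status code to the client action listed in the status code table.
// The returned labels are illustrative, not part of any standard.
function clientAction(status) {
  if (status >= 200 && status < 300) return "ok";
  if (status === 401) return "refresh-credentials";
  if (status === 409) return "resolve-conflict";
  if (status === 429) return "backoff-retry-after";
  if (status >= 400 && status < 500) return "do-not-retry";
  if (status >= 500 && status < 600) return "retry-with-backoff";
  return "unknown"; // 1xx/3xx are outside the table
}

for (const code of [201, 404, 429, 503]) {
  console.log(code, clientAction(code));
}
```

Ordering matters here: the specific 4xx cases are checked before the generic "do not retry" branch.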
When designing APIs, state which status codes you'll return for each failure mode. Interviewers probe:
"User submits payment twice—what happens?" Answer: idempotency key + 409 or same 200 with original receipt.
Stripe returns structured error objects with type, code, and param fields—
not just status codes. Google APIs use google.rpc.Status protobuf with rich error details in gRPC and HTTP/JSON transcoding.
gRPC
gRPC is a high-performance RPC framework: Protocol Buffers for schema and serialization, HTTP/2 for transport, and first-class streaming. It is widely used for inter-service communication at companies such as Google, Netflix, and Square, and is common in Kubernetes-native stacks.
gRPC generates strongly-typed client/server stubs from .proto files. Contracts are enforced at compile time—
breaking changes require explicit schema evolution (field numbers, reserved tags).
Protocol Buffers — schema-first serialization
Protobuf encodes typed fields as tag-length-value on the wire. Compared to JSON:
- Size — 3–10× smaller (no field names on wire; varint encoding)
- Speed — 5–10× faster serialize/deserialize in benchmarks (language-dependent)
- Schema evolution — add optional fields; never reuse field numbers; use reserved
- Human readability — poor; use JSON transcoding for browser/debug clients
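To make the size claim concrete, here is a minimal sketch of varint encoding and field keys, written by hand rather than with a protobuf library. The function names are illustrative, and the encoder only handles non-negative integers within JavaScript's safe integer range.

```javascript
// Encode a non-negative integer as a protobuf base-128 varint.
// Works for values up to Number.MAX_SAFE_INTEGER (no BigInt handling).
function encodeVarint(value) {
  const bytes = [];
  let v = value;
  while (v > 0x7f) {
    bytes.push((v % 128) | 0x80); // low 7 bits with the continuation bit set
    v = Math.floor(v / 128);
  }
  bytes.push(v); // final byte, continuation bit clear
  return bytes;
}

// Field key = (field_number << 3) | wire_type; wire type 0 means varint.
function fieldKey(fieldNumber, wireType) {
  return encodeVarint(fieldNumber * 8 + wireType);
}

// Field number 2 (a varint field) holding the value 300.
const encoded = [...fieldKey(2, 0), ...encodeVarint(300)];
console.log(encoded); // [ 16, 172, 2 ]
```

The field costs three bytes on the wire, whereas the JSON text `"created_at_ms":300` is 19 bytes because the field name travels with every message.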
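syntax = "proto3";
service OrderService {
  rpc GetOrder(GetOrderRequest) returns (Order);                // unary
  rpc ListOrders(ListOrdersRequest) returns (stream Order);     // server streaming
  rpc SubmitOrder(stream OrderLine) returns (OrderSummary);     // client streaming
  rpc SyncOrders(stream OrderEvent) returns (stream OrderAck);  // bidirectional streaming
}
message Order {
  string order_id = 1;
  int64 created_at_ms = 2;
  repeated OrderLine lines = 3;
}
// The messages below are placeholders with hypothetical fields so the file compiles.
message GetOrderRequest { string order_id = 1; }
message ListOrdersRequest { string customer_id = 1; }
message OrderLine { string sku = 1; int32 quantity = 2; }
message OrderSummary { string order_id = 1; int32 line_count = 2; }
message OrderEvent { string order_id = 1; string type = 2; }
message OrderAck { string order_id = 1; }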
HTTP/2 as transport
gRPC maps each RPC to an HTTP/2 stream. Method path: /package.Service/Method.
gRPC does not use HTTP status codes for RPC outcomes: a response is normally HTTP 200, and the result travels in trailers as grpc-status (0 for OK) with an optional grpc-message.
Metadata (headers) carry auth tokens, tracing context (W3C traceparent), and deadlines.
Streaming types
| Pattern | Client → Server | Server → Client | Use case |
|---|---|---|---|
| Unary | 1 message | 1 message | Standard CRUD, request/response APIs |
| Server streaming | 1 message | N messages | Large result sets, live feeds, file download chunks |
| Client streaming | N messages | 1 message | Upload aggregation, batch metric ingestion |
| Bidirectional streaming | N messages | N messages | Chat, collaborative editing, real-time sync |
Performance numbers
- Unary latency — comparable to REST+JSON when connection is warm; wins on payload size > 1 KB
- Throughput — 10K–100K RPS per core typical for simple unary (vs 3K–15K for JSON REST)
- Deadline propagation — context deadline cancels upstream work; REST needs custom timeout headers
- Load balancing — L7 LB must understand gRPC (long-lived HTTP/2 connections); use xDS, linkerd, or gRPC-LB
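Deadline propagation comes down to passing on the time that is left, not a fresh timeout. gRPC libraries do this for you; the sketch below only shows the arithmetic, with made-up function names. The `grpc-timeout` header and its `m` (milliseconds) unit come from the gRPC over HTTP/2 wire format.

```javascript
// Compute the time budget left for a downstream call.
// deadlineMs is an absolute epoch timestamp set by the original caller.
function remainingBudgetMs(deadlineMs, nowMs = Date.now()) {
  const remaining = deadlineMs - nowMs;
  if (remaining <= 0) throw new Error("DEADLINE_EXCEEDED");
  return remaining;
}

// Format the budget as a grpc-timeout header value ("m" = milliseconds).
function grpcTimeoutHeader(remainingMs) {
  return `${Math.ceil(remainingMs)}m`;
}

const deadline = 1_000_500; // caller allowed 500 ms starting at t = 1_000_000
const left = remainingBudgetMs(deadline, 1_000_120); // 120 ms already spent
console.log(grpcTimeoutHeader(left)); // 380m
```

A service that has already spent 120 ms of a 500 ms budget hands only 380 ms to the next hop, so work is abandoned everywhere once the original caller has given up.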
gRPC internal, REST external is the dominant pattern: browsers and third-party integrators get JSON/REST; service mesh traffic uses gRPC. GraphQL at the edge is an alternative when clients need flexible field selection.
Limitations
- Browser support — requires gRPC-Web proxy (Envoy) for browsers; not native
- Debugging — binary payloads need grpcurl or grpcui; harder than curl
- CDN caching — POST-based RPCs don't cache at edge; REST GET does
- Sticky connections — HTTP/2 connection reuse complicates L4 round-robin; need L7 or client-side LB
- Schema rigidity — proto changes require coordinated rollout; JSON is more forgiving (dangerously so)
"We'll use gRPC between services" is L4. "Order service exposes unary GetOrder and server-streaming ListOrders; Envoy sidecar handles mTLS and retries; protobuf schema versioned with buf breaking-change detection; public API remains REST with OpenAPI" is L6.
WebSockets
WebSockets upgrade an HTTP connection to a full-duplex, persistent channel—both client and server can push frames anytime. Essential for chat, gaming, collaborative docs, and live dashboards where server-initiated updates dominate.
Handshake and framing
Client sends HTTP upgrade request (Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Key).
Server responds 101 Switching Protocols. After upgrade, communication uses lightweight binary/text frames—not HTTP.
sequenceDiagram
  participant C as Client
  participant S as Server
  C->>S: GET /ws HTTP/1.1 Upgrade: websocket
  S->>C: 101 Switching Protocols
  C->>S: WebSocket frames (bidirectional)
  S->>C: Push events anytime
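One detail of the handshake not spelled out above: the 101 response carries a Sec-WebSocket-Accept header that the server derives from the client's Sec-WebSocket-Key, as defined in RFC 6455. A minimal sketch using Node's built-in crypto module (the function name is illustrative):

```javascript
const { createHash } = require("node:crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derive the Sec-WebSocket-Accept value the server returns with the 101.
function websocketAccept(secWebSocketKey) {
  return createHash("sha1")
    .update(secWebSocketKey + WS_GUID)
    .digest("base64");
}

// Sample key from RFC 6455, section 1.3.
console.log(websocketAccept("dGhlIHNhbXBsZSBub25jZQ=="));
// s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
```

The hash is not a security measure; it proves the server understood the upgrade request and is not a cache replaying an old response.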
Full-duplex semantics
- Low overhead per message — 2–14 byte frame header vs full HTTP request per push
- No request correlation — need application-level message IDs or channels
- Back-pressure — TCP buffers can grow if consumer is slow; implement application flow control
- Stateful connections — each socket tied to a server process; scaling is hard
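The 2–14 byte header range follows from the RFC 6455 frame layout. A small sketch (the function name is illustrative) that computes the header size for a given payload:

```javascript
// Size in bytes of a WebSocket frame header (RFC 6455 framing).
// masked is true for client-to-server frames, which carry a 4-byte mask key.
function frameHeaderBytes(payloadLength, masked) {
  let size = 2; // FIN/opcode byte + mask bit and 7-bit length byte
  if (payloadLength > 65535) size += 8; // 64-bit extended length
  else if (payloadLength > 125) size += 2; // 16-bit extended length
  if (masked) size += 4;
  return size;
}

console.log(frameHeaderBytes(50, false)); // 2  (small server push)
console.log(frameHeaderBytes(1000, true)); // 8  (client message, 16-bit length)
console.log(frameHeaderBytes(100000, true)); // 14 (largest possible header)
```

Compare that with the several hundred bytes of headers a typical HTTP request carries for each pushed message.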
Scaling WebSockets — the hard problem
Unlike stateless HTTP, a WebSocket connection lives on one server. When User A (on Server 1) messages User B (on Server 2), you need cross-server message routing.
Sticky sessions (session affinity)
L7 load balancer routes the same client IP/cookie to the same backend. Simple but fragile: server restart drops all connections; uneven load if some users are "heavy chatters."
Redis Pub/Sub (or Kafka) fan-out
Each server subscribes to channels (e.g. room:123). When a message arrives on Server 1, it publishes to Redis;
all servers subscribed to that room receive it and push to their local connected clients.
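const Redis = require("ioredis"); // assumes the ioredis client
// A connection in subscriber mode cannot publish, so use two clients.
const pub = new Redis();
const sub = new Redis();

// Server 1: user sends message
ws.on("message", (raw) => {
  const msg = JSON.parse(raw);
  pub.publish(`room:${msg.roomId}`, JSON.stringify(msg));
});

// All servers: subscribe when the first local client joins a room
const joinRoom = (roomId) => sub.subscribe(`room:${roomId}`);

// All servers: relay to local sockets (localConnections is keyed by string room ID)
sub.on("message", (channel, payload) => {
  const roomId = channel.slice("room:".length);
  localConnections.get(roomId)?.forEach((ws) => ws.send(payload));
});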
Heartbeat and connection lifecycle
- Ping/pong frames — RFC 6455 ping every 30–60 s detects dead peers
- Idle timeouts — LBs (AWS ALB default 60 s) and CDNs kill silent connections; send application heartbeats
- Graceful shutdown — on deploy, stop accepting new WS, send close frame, drain existing (see load-balancing chapter)
- Reconnection — client exponential backoff + resume token to catch missed messages from buffer/Kafka
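The reconnection bullet can be sketched as exponential backoff with full jitter. The default base and cap values are assumptions for illustration, and the resume token is left out.

```javascript
// Exponential backoff with full jitter:
// delay is random in [0, min(capMs, baseMs * 2^attempt)).
// random is injectable so the function can be tested deterministically.
function reconnectDelayMs(attempt, { baseMs = 500, capMs = 30000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Ceilings grow 500, 1000, 2000, 4000, ... and stop at 30000.
for (let attempt = 0; attempt < 8; attempt++) {
  console.log(attempt, reconnectDelayMs(attempt));
}
```

Jitter matters after a deploy or server restart: without it, every dropped client reconnects at the same instant and the new server is hit by a synchronized wave.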
Redis Pub/Sub is fire-and-forget, with no persistence. If no subscriber is connected when a message is published, the message is lost. For chat history, persist to DB/Kafka first, then fan out. Discord uses Cassandra + Redis + custom routing.
Slack migrated from HTTP long-polling to WebSockets with regional edge gateways. Figma uses CRDTs over WebSockets for collaborative editing—ordering and conflict resolution at application layer.
Server-Sent Events (SSE)
SSE is a one-way server→client stream over plain HTTP. Simpler than WebSockets when you only need push notifications, live feeds, or progress updates—and it works through most proxies and HTTP/2 infrastructure.
Protocol mechanics
Client opens GET /events with Accept: text/event-stream. Server keeps the connection open and sends UTF-8 text events, each terminated by a blank line. Fields: data:, event:, id:, retry:.
The browser EventSource API reconnects automatically and sends the last received id in the Last-Event-ID header.
HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

id: 42
event: price-update
data: {"symbol":"AAPL","price":189.50}

id: 43
data: {"symbol":"GOOG","price":141.20}

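For non-browser clients, the wire format is simple enough to parse by hand. The sketch below is a simplified, non-incremental parser (it expects the whole body as one string, ignores `retry:`, and does not carry the last id forward between events as the full specification does). The function name is illustrative.

```javascript
// Minimal parser for a complete text/event-stream body.
// Handles id, event, and data fields; skips comments and unknown fields.
function parseEventStream(text) {
  const events = [];
  for (const block of text.split(/\r?\n\r?\n/)) {
    const event = { id: undefined, event: "message", data: [] };
    for (const line of block.split(/\r?\n/)) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "id") event.id = value;
      else if (field === "event") event.event = value;
      else if (field === "data") event.data.push(value);
    }
    // Events without data are not dispatched.
    if (event.data.length > 0) events.push({ ...event, data: event.data.join("\n") });
  }
  return events;
}

const body =
  'id: 42\nevent: price-update\ndata: {"symbol":"AAPL","price":189.50}\n\n' +
  'id: 43\ndata: {"symbol":"GOOG","price":141.20}\n\n';
const events = parseEventStream(body);
console.log(events.map((e) => [e.id, e.event, JSON.parse(e.data).symbol]));
```

An event with no `event:` field gets the default type `message`, which is why the second event in the sample is delivered to the generic message handler.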
Use cases where SSE wins
- Stock tickers, sports scores — server pushes, client displays
- LLM token streaming — ChatGPT-style text generation (one direction)
- Build/deploy logs — CI pipeline output streaming to browser
- Notification feeds — simpler than WS when no client→server stream needed
HTTP/2 connection limits
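Browsers limit concurrent connections per host (~6 for HTTP/1.1; HTTP/2 typically allows ~100 concurrent streams per connection, though practical limits are lower). Each SSE subscription holds one connection under HTTP/1.1 or one stream under HTTP/2. Under HTTP/1.1, 10 tabs × 5 streams = 50 connections competing for ~6 slots, so most of them stall.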
- Multiplex SSE over one connection — single EventSource, demux by event type in client
- HTTP/2 server push deprecated — don't rely on it; SSE remains the pattern
- HTTP/3 — independent streams reduce HOL blocking for mixed API + SSE traffic
SSE is unidirectional and text-only (UTF-8). WebSockets support binary (Protobuf, audio frames). Choose SSE when simplicity, auto-reconnect, and HTTP compatibility matter; WebSockets when you need binary or client push.
SSE works through HTTP/2 and most corporate proxies (it's just a long GET). WebSocket upgrades are sometimes blocked. For AI streaming APIs, SSE + fetch for the POST prompt is the 2024+ default pattern.
Long Polling vs Short Polling vs WebSocket vs SSE
"How do we get real-time updates?" is a classic interview question. The answer depends on directionality, latency requirements, infrastructure constraints, and scale—not a default of "use WebSockets."
Short polling
Client requests every N seconds: GET /messages?since=timestamp. Simple, stateless, works everywhere.
Wasteful: empty responses burn QPS; worst-case latency = poll interval.
Long polling
Server holds request open until data arrives or timeout (30–60 s), then client immediately reconnects. Reduces empty responses vs short polling. Still one request per event batch; HTTP overhead per push.
| Criterion | Short polling | Long polling | SSE | WebSocket |
|---|---|---|---|---|
| Direction | Client pull | Client pull (simulated push) | Server → client | Bidirectional |
| Latency | 0 – poll interval | ~instant after event | ~instant | ~instant |
| Connection overhead | High (repeated handshakes) | Medium | Low (one long connection) | Lowest per message |
| Server state | Stateless | Pending request map | Open stream per client | Stateful socket |
| Scaling difficulty | Easy (stateless) | Medium | Medium | Hard (sticky + pub/sub) |
| Proxy/LB friendly | Excellent | Good (watch timeouts) | Good | Moderate (upgrade required) |
| Binary data | Via HTTP body | Via HTTP body | No (text only) | Yes |
| Auto reconnect | N/A | Manual | Built-in (EventSource) | Manual |
| Best for | Low-frequency updates | Legacy compat, moderate realtime | Feeds, LLM streams, notifications | Chat, games, collaboration |
For "design a notification system," start with requirements: "Do clients need to send data back over the same channel?" If no → SSE. If yes (typing indicators) → WebSocket. If updates are every 30 s → short polling is fine. Always quantify: "10M concurrent SSE connections × 4 KB buffer = 40 GB memory just for socket buffers."
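The sizing claim above is easy to reproduce. This sketch uses decimal units (1 KB = 1000 bytes) and an illustrative function name; the buffer size per connection is the figure quoted in the text, not a measured value.

```javascript
// Back-of-the-envelope memory for idle connections, in decimal gigabytes.
function connectionMemoryGB(connections, bufferKBPerConnection) {
  const bytes = connections * bufferKBPerConnection * 1000;
  return bytes / 1e9;
}

console.log(connectionMemoryGB(10_000_000, 4)); // 40
```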
API Design
Protocol choice is half the story. Production APIs need versioning that doesn't break clients, pagination that scales, rate limits clients can respect, idempotency for safe retries, and gateway patterns that centralize cross-cutting concerns.
Versioning strategies
| Strategy | Example | Pros | Cons |
|---|---|---|---|
| URL path | /v1/orders | Explicit, easy to route at gateway | URL proliferation; resource duplication |
| Header | Accept-Version: 2024-01-15 | Clean URLs; Stripe-style date versions | Harder to test in browser; cache complexity |
| Query param | ?api-version=2 | Simple | Easy to forget; not RESTful purist |
| Content type | application/vnd.myapi.v2+json | Hypermedia-friendly | Verbose; poor tooling support |
Versioning policy: "We add fields, never remove. Breaking changes get a new major version with 12-month sunset.
Deprecation headers (Sunset, Deprecation) on old endpoints." This is what interviewers want—not just "/v1".
Pagination — offset vs cursor vs keyset
Offset pagination
GET /users?offset=100&limit=20 — skip N rows. Simple but O(offset) on DB (deep pages scan millions of rows).
Inconsistent under concurrent inserts/deletes (duplicates or gaps between page fetches).
Cursor pagination
GET /users?cursor=eyJpZCI6MTIzfQ&limit=20 — opaque token encoding last-seen position.
Stable under inserts if cursor is tied to sort key. Twitter/GraphQL Relay connections use this pattern.
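The cursor in the example URL is base64url-encoded JSON. A minimal sketch of the encoding, assuming Node's global Buffer (the `base64url` encoding needs Node 15.7 or newer); the function names are illustrative:

```javascript
// Opaque cursor = base64url-encoded JSON of the last-seen sort position.
function encodeCursor(position) {
  return Buffer.from(JSON.stringify(position), "utf8").toString("base64url");
}

function decodeCursor(cursor) {
  return JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));
}

console.log(encodeCursor({ id: 123 })); // eyJpZCI6MTIzfQ
console.log(decodeCursor("eyJpZCI6MTIzfQ")); // { id: 123 }
```

Because clients can decode and edit such a token, a real API would likely validate or sign it before trusting the position inside.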
Keyset (seek) pagination
GET /users?after_id=123&limit=20 with ORDER BY id — DB uses index seek:
WHERE id > 123 LIMIT 20. Cost is an index seek plus limit rows (O(log n + limit)) regardless of page depth. Requires the sort column to be indexed.
| Method | Deep page cost | Consistency | Jump to page N |
|---|---|---|---|
| Offset | O(offset) — bad | Poor under writes | Yes |
| Cursor | O(limit) — good | Good | No |
| Keyset | O(limit) — good | Good (with tie-breaker column) | No |
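The consistency column can be demonstrated with in-memory stand-ins for the two queries. This is a simulation, not database code: the array plays the role of a table sorted by id, and the function names are illustrative.

```javascript
// In-memory stand-ins for the two queries; rows must be sorted by id ascending.
function offsetPage(rows, offset, limit) {
  return rows.slice(offset, offset + limit); // OFFSET n LIMIT m
}

function keysetPage(rows, afterId, limit) {
  // WHERE id > afterId ORDER BY id LIMIT m (a real DB would seek via the index)
  return rows.filter((r) => r.id > afterId).slice(0, limit);
}

let rows = [10, 20, 30, 40].map((id) => ({ id }));
const firstOffset = offsetPage(rows, 0, 2); // ids 10, 20
const firstKeyset = keysetPage(rows, 0, 2); // ids 10, 20

// A concurrent insert lands before the rows already returned.
rows = [{ id: 5 }, ...rows];

const secondOffset = offsetPage(rows, 2, 2); // ids 20, 30: 20 is a duplicate
const lastSeenId = firstKeyset[firstKeyset.length - 1].id;
const secondKeyset = keysetPage(rows, lastSeenId, 2); // ids 30, 40
console.log(secondOffset.map((r) => r.id), secondKeyset.map((r) => r.id));
```

When the sort column is not unique (for example a timestamp), the comparison needs a tie-breaker such as `(created_at, id)`, which is the caveat noted in the table.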
Rate limit headers
Clients need machine-readable quota state to back off gracefully. Common patterns:
- X-RateLimit-Limit: 1000 — quota per window
- X-RateLimit-Remaining: 742 — requests left
- X-RateLimit-Reset: 1717772400 — Unix timestamp when window resets
- Retry-After: 30 — seconds to wait on 429
- IETF draft — RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset (standardizing)
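On the client side, these headers translate into a wait time. A minimal sketch, assuming headers arrive as a plain object with lower-cased names and that Retry-After is given in seconds (it may also be an HTTP date, which this sketch does not handle); the function name is illustrative:

```javascript
// Decide how long to wait before the next request, in milliseconds.
// nowSec is the current Unix time in seconds.
function waitMs(status, headers, nowSec = Math.floor(Date.now() / 1000)) {
  const retryAfter = Number(headers["retry-after"]);
  if (status === 429 && Number.isFinite(retryAfter)) {
    return retryAfter * 1000; // server told us exactly how long to wait
  }
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const resetSec = Number(headers["x-ratelimit-reset"]);
  if (remaining === 0 && Number.isFinite(resetSec)) {
    return Math.max(0, resetSec - nowSec) * 1000; // wait for the window to reset
  }
  return 0; // quota left: send immediately
}

console.log(waitMs(429, { "retry-after": "30" })); // 30000
console.log(
  waitMs(200, { "x-ratelimit-remaining": "0", "x-ratelimit-reset": "1717772400" }, 1717772390)
); // 10000
```

Checking the remaining quota on successful responses lets a client slow down before it ever sees a 429.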
Idempotency keys
Network retries cause duplicate side effects (double charge, double order). Client sends
Idempotency-Key: uuid-v4 on POST. Server stores key → response mapping (Redis, 24 h TTL).
Duplicate key returns cached response without re-executing.
POST /v1/payments HTTP/1.1
Idempotency-Key: 7c9e6679-7425-40de-944b-e07fc1f90ae7
Content-Type: application/json

{"amount": 4999, "currency": "usd", "customer": "cus_123"}
Idempotency keys must be scoped per endpoint + tenant. Same key on different endpoints should not collide.
Store the response, not just "seen"—client retry mid-flight needs the final result or 409 Conflict.
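The rules above (scope the key, store the response, answer 409 while the first attempt is still running) can be sketched with an in-memory Map standing in for Redis. All names here are hypothetical, the handler is synchronous for brevity, and the 24 h TTL is omitted.

```javascript
// In-memory stand-in for the key -> response store.
const store = new Map();

// Scope keys by tenant and endpoint so the same UUID cannot collide across them.
function scopedKey(tenantId, endpoint, idempotencyKey) {
  return `${tenantId}:${endpoint}:${idempotencyKey}`;
}

// Runs handler at most once per scoped key and replays the stored response afterwards.
function handleIdempotent({ tenantId, endpoint, idempotencyKey }, handler) {
  const key = scopedKey(tenantId, endpoint, idempotencyKey);
  const existing = store.get(key);
  if (existing?.state === "done") return existing.response; // replay, no side effects
  if (existing?.state === "in-flight") {
    return { status: 409, body: { error: "request in progress" } };
  }
  store.set(key, { state: "in-flight" });
  try {
    const response = handler();
    store.set(key, { state: "done", response });
    return response;
  } catch (err) {
    store.delete(key); // let the client retry after a failure
    throw err;
  }
}

let charges = 0;
const req = {
  tenantId: "t1",
  endpoint: "POST /v1/payments",
  idempotencyKey: "7c9e6679-7425-40de-944b-e07fc1f90ae7",
};
const charge = () => ({ status: 201, body: { chargeNumber: ++charges } });
const first = handleIdempotent(req, charge);
const retry = handleIdempotent(req, charge);
console.log(charges, first === retry); // 1 true
```

With an asynchronous handler and a shared store, the "in-flight" marker would need to be written atomically (for example with a set-if-absent operation) so two concurrent retries cannot both run the handler.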
API Gateway
Single entry point for external clients. Responsibilities: TLS termination, auth (JWT/OAuth validation), rate limiting, request routing, response aggregation, protocol translation (REST → gRPC), WAF, analytics.
- Kong, AWS API Gateway, Apigee, Envoy Gateway — productized gateways
- BFF (Backend for Frontend) — one gateway per client type (mobile vs web) tailoring payloads
- Don't put business logic in gateway—auth, routing, cross-cutting only; logic belongs in services
GraphQL — N+1 problem and DataLoader
GraphQL lets clients request nested graphs: { user { friends { name } } }.
Naive resolvers issue 1 query for user + N queries for each friend's data = N+1 problem.
DataLoader batches and caches within a single request: collect all friend IDs during field resolution,
one WHERE id IN (...) query, map results back. Per-request cache prevents duplicate loads.
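const DataLoader = require("dataloader");
// Build one loader per request so the cache is never shared across users.
// Assumes db.query resolves to an array of row objects.
const createFriendLoader = (db) =>
  new DataLoader(async (ids) => {
    const rows = await db.query("SELECT * FROM users WHERE id = ANY($1)", [ids]);
    const map = new Map(rows.map((r) => [r.id, r]));
    // Results must line up with ids; use null for missing rows
    return ids.map((id) => map.get(id) ?? null);
  });
// Resolver: DataLoader batches concurrent calls in the same tick
// (ctx.friendLoader is assumed to be created per request via createFriendLoader)
friends: (user, _args, ctx) => ctx.friendLoader.loadMany(user.friendIds)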
GraphQL — flexible clients, one round-trip; complex caching (no HTTP cache per URL), query cost attacks, harder versioning. REST — simple caching, CDN-friendly; over/under-fetching. Use GraphQL when many client types need different field sets (Facebook, GitHub); REST for public APIs and microservice internals.
GitHub exposes REST v3 and GraphQL v4: GraphQL for flexible integrations, REST for simple CRUD. Netflix Falcor (a contemporary of GraphQL built on similar ideas) solved N+1 with batching. Shopify rate-limits GraphQL by query cost, not just request count.