Networking & Protocols

Every system design diagram eventually draws arrows between boxes. Those arrows are protocols—HTTP, gRPC, WebSockets—and the API contracts layered on top. This chapter covers how bytes move, what each protocol optimizes for, TLS overhead with real numbers, and the API design decisions (pagination, idempotency, rate limits) that survive production traffic.

HTTP & HTTPS

HTTP is the lingua franca of the web—stateless request/response over TCP (or QUIC). Understanding version differences, TLS costs, and REST semantics is table stakes for every system design interview and every production API.

HTTP models communication as a client sending a request (method, path, headers, optional body) and a server returning a response (status code, headers, body). It is inherently stateless: the server does not retain session context between requests unless you add cookies, tokens, or server-side sessions.

HTTP/1.1 — the baseline

HTTP/1.1 (first specified in 1997 as RFC 2068; later revised by RFC 2616 and RFC 7230+) made persistent connections the default (HTTP/1.0 needed an explicit Connection: keep-alive) and added chunked transfer encoding and a mandatory Host header that enables virtual hosting. Responses on a connection are still delivered one at a time and in order, so parallelism requires opening multiple TCP connections (browsers cap at ~6 per host).

  • Head-of-line (HOL) blocking — a large/slow response blocks subsequent responses on the same connection
  • Header redundancy — cookies and auth tokens resent verbatim on every request (~500–800 bytes typical)
  • Text protocol — human-readable but verbose; parsing cost is negligible vs network RTT

HTTP/2 — multiplexing over one connection

HTTP/2 (2015) frames messages into binary streams multiplexed over a single TCP connection. HPACK compresses headers (~30–50% reduction on repeat requests). Server push (largely deprecated in practice) allowed proactive resource delivery.

  • Stream multiplexing — hundreds of parallel requests without opening 6+ TCP connections
  • Still TCP HOL blocking — one lost packet stalls all streams on that connection (TCP retransmission)
  • TLS ALPN — negotiated during handshake; h2 indicates HTTP/2
  • Typical win — 15–30% latency reduction on asset-heavy pages; smaller win on API-only JSON payloads
🔬 Under the Hood

HTTP/2 streams share flow-control windows at both TCP and HTTP layers. A slow consumer on one stream can back-pressure others if window sizes are exhausted—rare in APIs but visible in large file downloads mixed with small API calls.

HTTP/3 & QUIC — UDP-based transport

HTTP/3 runs over QUIC (Quick UDP Internet Connections)—TLS 1.3 is mandatory and integrated into the handshake. Each stream is independent; packet loss on one stream does not block others (eliminates TCP-level HOL blocking).

  • 0-RTT resumption — repeat connections skip full handshake (~1 RTT saved); replay-attack risk for non-idempotent ops
  • Connection migration — connection ID survives IP changes (mobile WiFi → cellular handoff)
  • UDP firewall/NAT — some corporate networks block UDP 443; fallback to HTTP/2 required
  • Adoption — Cloudflare, Google, Facebook serve HTTP/3; ~30% of web traffic as of 2025
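
One way a server can contain the replay risk of 0-RTT is to accept early data only for safe methods and answer everything else with 425 Too Early (RFC 8470), which tells the client to retry once the handshake has completed. The sketch below assumes the TLS terminator forwards the Early-Data: 1 request header defined in that RFC; earlyDataDecision is a hypothetical helper name, not part of any framework.

```javascript
// Methods treated as safe to replay (assumption: these handlers have no side effects).
const REPLAY_SAFE_METHODS = new Set(["GET", "HEAD", "OPTIONS"]);

// Decide what to do with a request that may have arrived as TLS 1.3 / QUIC early data.
// `headers` is a plain object with lower-cased header names.
function earlyDataDecision(method, headers) {
  const isEarly = headers["early-data"] === "1";
  if (!isEarly) return { proceed: true, status: null }; // handshake finished: normal path
  if (REPLAY_SAFE_METHODS.has(method.toUpperCase())) {
    return { proceed: true, status: null }; // replaying a safe request is harmless
  }
  // 425 Too Early: the client should retry after the handshake completes.
  return { proceed: false, status: 425 };
}

console.log(earlyDataDecision("POST", { "early-data": "1" }));
console.log(earlyDataDecision("GET", { "early-data": "1" }));
```

Rejecting by method is deliberately coarse; a real service may also need to reject GET handlers that have side effects.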
flowchart LR
  C[Client]
  H1[HTTP/1.1\n6 TCP conns]
  H2[HTTP/2\n1 TCP + streams]
  H3[HTTP/3\nQUIC over UDP]
  C --> H1
  C --> H2
  C --> H3

TLS costs — what encryption actually costs

HTTPS adds TLS between TCP and HTTP. Costs split into handshake latency (RTT-bound) and per-record CPU (AES-GCM is cheap on modern hardware with AES-NI).

| Phase | TLS 1.2 | TLS 1.3 | Notes |
| --- | --- | --- | --- |
| Full handshake | 2 RTT | 1 RTT | First connection to a host; session tickets enable resumption |
| Resumed handshake | 1 RTT | 0-RTT (optional) | 0-RTT data can be replayed; accept it only for idempotent requests |
| CPU overhead | ~1–3% throughput | ~1–3% throughput | Negligible vs JSON serialization at typical API sizes |
| Latency added | +50–150 ms cross-region | +25–100 ms | Dominates small payload APIs; connection pooling amortizes |
📐 Estimation

API at 50 ms p99 internal processing + 80 ms cross-region RTT + 50 ms TLS handshake (cold) = 180 ms first request. With keep-alive, subsequent requests reuse the connection, skip the handshake, and drop to ~130 ms. Always mention connection pooling when discussing microservice-to-microservice latency.
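
The arithmetic above can be written as a small helper, which makes it easy to vary the inputs during an estimate. This is only a sketch: estimateLatencyMs and amortizedMs are hypothetical names, and the model ignores the TCP handshake and queueing.

```javascript
// First-request vs warm-request latency, using the figures from the estimate above.
function estimateLatencyMs({ processingMs, rttMs, tlsHandshakeMs }) {
  const warm = processingMs + rttMs;   // pooled connection: no handshake
  const cold = warm + tlsHandshakeMs;  // new connection pays the handshake once
  return { cold, warm };
}

// Average latency when one handshake is shared by `requests` calls on the same connection.
function amortizedMs({ processingMs, rttMs, tlsHandshakeMs }, requests) {
  return processingMs + rttMs + tlsHandshakeMs / requests;
}

const figures = { processingMs: 50, rttMs: 80, tlsHandshakeMs: 50 };
const { cold, warm } = estimateLatencyMs(figures);
console.log(cold, warm);                 // 180 130
console.log(amortizedMs(figures, 100));  // 130.5
```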

REST — architectural style, not a protocol

REST (Representational State Transfer) maps resources to URLs and uses HTTP methods semantically: GET (read, safe, idempotent), POST (create), PUT (replace, idempotent), PATCH (partial update), DELETE (remove, idempotent).

  • Statelessness — each request carries all context (auth, resource ID)
  • HATEOAS — hypermedia links in responses (rarely implemented fully; pragmatic REST stops at nouns + verbs)
  • Content negotiation — Accept: application/json vs application/protobuf

HTTP status codes — the vocabulary of outcomes

| Code | Meaning | When to use | Client action |
| --- | --- | --- | --- |
| 200 OK | Success with body | GET, PUT, PATCH success | Parse response |
| 201 Created | Resource created | POST success; include Location header | Follow Location or use returned ID |
| 204 No Content | Success, empty body | DELETE, PUT with no return payload | None |
| 400 Bad Request | Client sent invalid input | Validation failure, malformed JSON | Fix request; do not retry blindly |
| 401 Unauthorized | Missing/invalid auth | No token or expired token | Refresh credentials |
| 403 Forbidden | Authenticated but not allowed | RBAC denial | Do not retry |
| 404 Not Found | Resource does not exist | Unknown ID | Do not retry |
| 409 Conflict | State conflict | Duplicate create, version mismatch | Resolve conflict or fetch latest |
| 429 Too Many Requests | Rate limited | Quota exceeded | Back off using Retry-After |
| 500 Internal Server Error | Server bug/unhandled exception | Unexpected failure | Retry with exponential backoff |
| 502/503/504 | Gateway/upstream/timeout | LB or dependency down, overload | Retry; circuit breaker on client |
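
On the client side, the "Client action" column can be encoded as a single classification function so every caller retries consistently. A minimal sketch; clientAction and its return labels are illustrative names, and unlisted codes are treated conservatively as non-retryable.

```javascript
// Map an HTTP status code to the client action listed in the table above.
function clientAction(status) {
  if (status >= 200 && status < 300) return "success";
  if (status === 401) return "refresh-credentials";
  if (status === 409) return "resolve-conflict";
  if (status === 429) return "back-off-retry-after";
  if ([500, 502, 503, 504].includes(status)) return "retry-with-backoff";
  // 400, 403, 404 and anything unlisted: retrying the same request will not help.
  return "do-not-retry";
}

console.log([201, 404, 429, 503].map(clientAction));
```

Retrying a 5xx is only safe for idempotent requests or requests that carry an idempotency key.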
🎯 Interview Tip

When designing APIs, state which status codes you'll return for each failure mode. Interviewers probe: "User submits payment twice—what happens?" Answer: idempotency key + 409 or same 200 with original receipt.

📦 Real World

Stripe returns structured error objects with type, code, and param fields, not just status codes. Google APIs use google.rpc.Status protobuf with rich error details in gRPC and HTTP/JSON transcoding.

gRPC

gRPC is a high-performance RPC framework: Protocol Buffers for schema and serialization, HTTP/2 for transport, and first-class streaming. It is widely used for inter-service traffic at Google, Netflix, and Square, and is a common choice in Kubernetes-native stacks.

gRPC generates strongly typed client/server stubs from .proto files. Contracts are enforced at compile time; breaking changes require explicit schema evolution (field numbers, reserved tags).

Protocol Buffers — schema-first serialization

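Protobuf encodes each field on the wire as a tag (field number plus wire type) followed by the value; strings, bytes, and nested messages also carry a length prefix. Compared to JSON: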

  • Size — 3–10× smaller (no field names on wire; varint encoding)
  • Speed — 5–10× faster serialize/deserialize in benchmarks (language-dependent)
  • Schema evolution — add optional fields; never reuse field numbers; use reserved
  • Human readability — poor; use JSON transcoding for browser/debug clients
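syntax = "proto3";

service OrderService {
  // Unary: one request, one response
  rpc GetOrder(GetOrderRequest) returns (Order);
  // Server streaming: one request, a stream of responses
  rpc ListOrders(ListOrdersRequest) returns (stream Order);
  // Client streaming: a stream of requests, one response
  rpc SubmitOrder(stream OrderLine) returns (OrderSummary);
  // Bidirectional streaming: both sides stream independently
  rpc SyncOrders(stream OrderEvent) returns (stream OrderAck);
}

message Order {
  string order_id = 1;
  int64 created_at_ms = 2;
  repeated OrderLine lines = 3;
}

// The remaining messages (GetOrderRequest, ListOrdersRequest, OrderLine,
// OrderSummary, OrderEvent, OrderAck) are omitted here and must be defined
// before this file compiles.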
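
To make the size claim concrete, the sketch below hand-encodes a single integer field the way protobuf does (base-128 varints, tag = field number shifted left by 3, ORed with the wire type) and compares it with the equivalent JSON. It is an illustration of the wire format only, not a replacement for a protobuf library; encodeVarint and encodeVarintField are hypothetical helper names, and the value 150 is arbitrary.

```javascript
// Encode a non-negative integer as a protobuf base-128 varint:
// 7 payload bits per byte, high bit set on every byte except the last.
function encodeVarint(value) {
  let n = BigInt(value);
  const bytes = [];
  do {
    let byte = Number(n & 0x7fn);
    n >>= 7n;
    if (n > 0n) byte |= 0x80; // continuation bit: more bytes follow
    bytes.push(byte);
  } while (n > 0n);
  return bytes;
}

// Encode one varint-typed field: tag = (field_number << 3) | wire_type, wire type 0 = varint.
function encodeVarintField(fieldNumber, value) {
  return [...encodeVarint((fieldNumber << 3) | 0), ...encodeVarint(value)];
}

const wire = encodeVarintField(2, 150); // field 2 (created_at_ms) set to 150
const json = JSON.stringify({ created_at_ms: 150 });
console.log(wire.map((b) => b.toString(16).padStart(2, "0")).join(" ")); // 10 96 01
console.log(wire.length, json.length); // 3 21
```

The field name never appears on the wire, which is where most of the savings come from and also why field numbers must never be reused.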

HTTP/2 as transport

gRPC maps each RPC to an HTTP/2 stream. Method path: /package.Service/Method. The HTTP status is almost always 200; the RPC outcome travels in trailers as a gRPC status code (e.g. grpc-status: 0 for OK). Metadata (headers) carries auth tokens, tracing context (W3C traceparent), and deadlines (grpc-timeout).

Streaming types

| Pattern | Client → Server | Server → Client | Use case |
| --- | --- | --- | --- |
| Unary | 1 message | 1 message | Standard CRUD, request/response APIs |
| Server streaming | 1 message | N messages | Large result sets, live feeds, file download chunks |
| Client streaming | N messages | 1 message | Upload aggregation, batch metric ingestion |
| Bidirectional streaming | N messages | N messages | Chat, collaborative editing, real-time sync |

Performance numbers

  • Unary latency — comparable to REST+JSON when connection is warm; wins on payload size > 1 KB
  • Throughput — 10K–100K RPS per core typical for simple unary (vs 3K–15K for JSON REST)
  • Deadline propagation — the caller's context deadline travels with the request and cancels remaining downstream work when it expires; REST needs custom timeout headers
  • Load balancing — L7 LB must understand gRPC (long-lived HTTP/2 connections); use xDS, Linkerd, or gRPC-LB
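
Deadline propagation boils down to forwarding the remaining time budget rather than a fresh timeout. gRPC libraries normally do this for you from the call context; the sketch below only illustrates the idea by computing a grpc-timeout header value (digits plus a unit, where m means milliseconds and S means seconds). The function name grpcTimeoutHeader is hypothetical.

```javascript
// Convert an absolute deadline into a `grpc-timeout` value for the next hop.
// Returns null when the budget is already spent, so the caller can fail fast.
function grpcTimeoutHeader(deadlineEpochMs, nowEpochMs) {
  const remainingMs = Math.floor(deadlineEpochMs - nowEpochMs);
  if (remainingMs <= 0) return null; // deadline exceeded: do not call downstream
  // The wire format allows at most 8 digits, so switch to seconds for very large budgets.
  if (remainingMs > 99_999_999) {
    return `${Math.min(Math.floor(remainingMs / 1000), 99_999_999)}S`;
  }
  return `${remainingMs}m`;
}

// A request that arrived with a 1000 ms budget and has already spent 400 ms:
console.log(grpcTimeoutHeader(1000, 400)); // 600m
```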
⚖️ Trade-off

gRPC internal, REST external is the dominant pattern: browsers and third-party integrators get JSON/REST; service mesh traffic uses gRPC. GraphQL at the edge is an alternative when clients need flexible field selection.

Limitations

  • Browser support — requires gRPC-Web proxy (Envoy) for browsers; not native
  • Debugging — binary payloads need grpcurl or grpcui; harder than curl
  • CDN caching — POST-based RPCs don't cache at edge; REST GET does
  • Sticky connections — HTTP/2 connection reuse complicates L4 round-robin; need L7 or client-side LB
  • Schema rigidity — proto changes require coordinated rollout; JSON is more forgiving (dangerously so)
🏆 Senior Signal

"We'll use gRPC between services" is L4. "Order service exposes unary GetOrder and server-streaming ListOrders; Envoy sidecar handles mTLS and retries; protobuf schema versioned with buf breaking-change detection; public API remains REST with OpenAPI" is L6.

WebSockets

WebSockets upgrade an HTTP connection to a full-duplex, persistent channel—both client and server can push frames anytime. Essential for chat, gaming, collaborative docs, and live dashboards where server-initiated updates dominate.

Handshake and framing

Client sends HTTP upgrade request (Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Key). Server responds 101 Switching Protocols. After upgrade, communication uses lightweight binary/text frames—not HTTP.

sequenceDiagram
  participant C as Client
  participant S as Server
  C->>S: GET /ws HTTP/1.1 Upgrade: websocket
  S->>C: 101 Switching Protocols
  C->>S: WebSocket frames (bidirectional)
  S->>C: Push events anytime

Full-duplex semantics

  • Low overhead per message — 2–14 byte frame header vs full HTTP request per push
  • No request correlation — need application-level message IDs or channels
  • Back-pressure — TCP buffers can grow if consumer is slow; implement application flow control
  • Stateful connections — each socket tied to a server process; scaling is hard

Scaling WebSockets — the hard problem

Unlike stateless HTTP, a WebSocket connection lives on one server. When User A (on Server 1) messages User B (on Server 2), you need cross-server message routing.

Sticky sessions (session affinity)

L7 load balancer routes the same client IP/cookie to the same backend. Simple but fragile: server restart drops all connections; uneven load if some users are "heavy chatters."

Redis Pub/Sub (or Kafka) fan-out

Each server subscribes to channels (e.g. room:123). When a message arrives on Server 1, it publishes to Redis; all servers subscribed to that room receive it and push to their local connected clients.

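// Assumes ioredis-style clients. A Redis connection in subscriber mode cannot
// publish, so use two connections: `pub` for publishing, `sub` for subscribing.

// Server 1: user sends message
ws.on("message", (raw) => {
  const msg = JSON.parse(raw);
  pub.publish(`room:${msg.roomId}`, JSON.stringify(msg));
});

// All servers: subscribe when the first local user joins a room
sub.subscribe(`room:${roomId}`);

// All servers: relay to local sockets, deriving the room from the channel name
// (localConnections is assumed to be a Map keyed by room ID strings)
sub.on("message", (channel, payload) => {
  const room = channel.slice("room:".length);
  localConnections.get(room)?.forEach((ws) => ws.send(payload));
});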

Heartbeat and connection lifecycle

  • Ping/pong frames — RFC 6455 ping every 30–60 s detects dead peers
  • Idle timeouts — LBs (AWS ALB default 60 s) and CDNs kill silent connections; send application heartbeats
  • Graceful shutdown — on deploy, stop accepting new WS, send close frame, drain existing (see load-balancing chapter)
  • Reconnection — client exponential backoff + resume token to catch missed messages from buffer/Kafka
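
For the reconnection bullet, a common way to avoid a thundering herd after a deploy is exponential backoff with full jitter. A minimal sketch, with illustrative defaults (500 ms base, 30 s cap) that are assumptions rather than recommendations; the random source is injectable so the behavior can be tested.

```javascript
// Full-jitter exponential backoff: a random delay in [0, min(cap, base * 2^attempt)).
function reconnectDelayMs(attempt, { baseMs = 500, capMs = 30_000, random = Math.random } = {}) {
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}

// Delay ceilings double per attempt until they hit the cap.
for (let attempt = 0; attempt < 8; attempt++) {
  console.log(attempt, reconnectDelayMs(attempt));
}
```

Reset the attempt counter once a connection has stayed healthy for a while; otherwise a single later drop inherits the longest delay.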
⚠️ Pitfall

Redis Pub/Sub is fire-and-forget: no persistence. If no subscriber is connected when a message is published, it is lost. For chat history, persist to DB/Kafka first, then fan out. Discord uses Cassandra + Redis + custom routing.

📦 Real World

Slack migrated from HTTP long-polling to WebSockets with regional edge gateways. Figma uses a CRDT-inspired model over WebSockets for collaborative editing, with ordering and conflict resolution handled at the application layer.

Server-Sent Events (SSE)

SSE is a one-way server→client stream over plain HTTP. Simpler than WebSockets when you only need push notifications, live feeds, or progress updates—and it works through most proxies and HTTP/2 infrastructure.

Protocol mechanics

Client opens GET /events with Accept: text/event-stream. Server keeps the connection open and sends UTF-8 text events, each terminated by a blank line. Fields: data:, event:, id:, retry:. The browser EventSource API reconnects automatically and sends the last received id in the Last-Event-ID header so the server can resume.

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

id: 42
event: price-update
data: {"symbol":"AAPL","price":189.50}

id: 43
data: {"symbol":"GOOG","price":141.20}
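
The framing is simple enough to parse by hand, which helps when consuming SSE outside a browser. The sketch below is a simplified parser, not a full implementation of the specification: it assumes "\n" line endings and a fully buffered string, and ignores the retry: field. parseSse is a hypothetical name.

```javascript
// Parse a complete text/event-stream payload into a list of events.
function parseSse(text) {
  const events = [];
  for (const block of text.split(/\n\n+/)) {
    const ev = { event: "message", data: [], id: undefined }; // "message" is the default type
    for (const line of block.split("\n")) {
      if (line === "" || line.startsWith(":")) continue; // blank line or comment
      const colon = line.indexOf(":");
      const field = colon === -1 ? line : line.slice(0, colon);
      let value = colon === -1 ? "" : line.slice(colon + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "data") ev.data.push(value);
      else if (field === "event") ev.event = value;
      else if (field === "id") ev.id = value;
    }
    // Blocks without data are not dispatched; multiple data lines join with "\n".
    if (ev.data.length > 0) events.push({ ...ev, data: ev.data.join("\n") });
  }
  return events;
}

const stream =
  'id: 42\nevent: price-update\ndata: {"symbol":"AAPL","price":189.50}\n\n' +
  'id: 43\ndata: {"symbol":"GOOG","price":141.20}\n\n';
const events = parseSse(stream);
console.log(events.map((e) => [e.id, e.event, JSON.parse(e.data).symbol]));
```

The second event has no event: field, so it is delivered with the default type "message".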

Use cases where SSE wins

  • Stock tickers, sports scores — server pushes, client displays
  • LLM token streaming — ChatGPT-style text generation (one direction)
  • Build/deploy logs — CI pipeline output streaming to browser
  • Notification feeds — simpler than WS when no client→server stream needed

HTTP/2 connection limits

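Browsers limit concurrent connections per host (~6 for HTTP/1.1; HTTP/2 allows ~100 concurrent streams on one connection, though practical limits are lower). Each SSE subscription holds one connection (HTTP/1.1) or one stream (HTTP/2) for its whole lifetime. Over HTTP/1.1, 10 tabs × 5 streams = 50 long-lived connections against a budget of ~6, so later requests to that host stall.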

  • Multiplex SSE over one connection — single EventSource, demux by event type in client
  • HTTP/2 server push deprecated — don't rely on it; SSE remains the pattern
  • HTTP/3 — independent streams reduce HOL blocking for mixed API + SSE traffic
⚖️ Trade-off

SSE is unidirectional and text-only (UTF-8). WebSockets support binary (Protobuf, audio frames). Choose SSE when simplicity, auto-reconnect, and HTTP compatibility matter; WebSockets when you need binary or client push.

💡 Pro Tip

SSE works through HTTP/2 and most corporate proxies (it's just a long GET). WebSocket upgrades are sometimes blocked. For AI streaming APIs, SSE + fetch for the POST prompt is the 2024+ default pattern.

Long Polling vs Short Polling vs WebSocket vs SSE

"How do we get real-time updates?" is a classic interview question. The answer depends on directionality, latency requirements, infrastructure constraints, and scale—not a default of "use WebSockets."

Short polling

Client requests every N seconds: GET /messages?since=timestamp. Simple, stateless, works everywhere. Wasteful: empty responses burn QPS; worst-case latency = poll interval.

Long polling

Server holds request open until data arrives or timeout (30–60 s), then client immediately reconnects. Reduces empty responses vs short polling. Still one request per event batch; HTTP overhead per push.

| Criterion | Short polling | Long polling | SSE | WebSocket |
| --- | --- | --- | --- | --- |
| Direction | Client pull | Client pull (simulated push) | Server → client | Bidirectional |
| Latency | 0 – poll interval | ~instant after event | ~instant | ~instant |
| Connection overhead | High (repeated handshakes) | Medium | Low (one long connection) | Lowest per message |
| Server state | Stateless | Pending request map | Open stream per client | Stateful socket |
| Scaling difficulty | Easy (stateless) | Medium | Medium | Hard (sticky + pub/sub) |
| Proxy/LB friendly | Excellent | Good (watch timeouts) | Good | Moderate (upgrade required) |
| Binary data | Via HTTP body | Via HTTP body | No (text only) | Yes |
| Auto reconnect | N/A | Manual | Built-in (EventSource) | Manual |
| Best for | Low-frequency updates | Legacy compat, moderate realtime | Feeds, LLM streams, notifications | Chat, games, collaboration |
🎯 Interview Tip

For "design a notification system," start with requirements: "Do clients need to send data back over the same channel?" If no → SSE. If yes (typing indicators) → WebSocket. If updates are every 30 s → short polling is fine. Always quantify: "10M concurrent SSE connections × 4 KB buffer = 40 GB memory just for socket buffers."

API Design

Protocol choice is half the story. Production APIs need versioning that doesn't break clients, pagination that scales, rate limits clients can respect, idempotency for safe retries, and gateway patterns that centralize cross-cutting concerns.

Versioning strategies

| Strategy | Example | Pros | Cons |
| --- | --- | --- | --- |
| URL path | /v1/orders | Explicit, easy to route at gateway | URL proliferation; resource duplication |
| Header | Accept-Version: 2024-01-15 | Clean URLs; Stripe-style date versions | Harder to test in browser; cache complexity |
| Query param | ?api-version=2 | Simple | Easy to forget; not RESTful purist |
| Content type | application/vnd.myapi.v2+json | Hypermedia-friendly | Verbose; poor tooling support |
🏆 Senior Signal

Versioning policy: "We add fields, never remove. Breaking changes get a new major version with 12-month sunset. Deprecation headers (Sunset, Deprecation) on old endpoints." This is what interviewers want—not just "/v1".

Pagination — offset vs cursor vs keyset

Offset pagination

GET /users?offset=100&limit=20 — skip N rows. Simple but O(offset) on DB (deep pages scan millions of rows). Inconsistent under concurrent inserts/deletes (duplicates or gaps between page fetches).

Cursor pagination

GET /users?cursor=eyJpZCI6MTIzfQ&limit=20 — opaque token encoding last-seen position. Stable under inserts if cursor is tied to sort key. Twitter/GraphQL Relay connections use this pattern.

Keyset (seek) pagination

GET /users?after_id=123&limit=20 with ORDER BY id. The DB uses an index seek: WHERE id > 123 ORDER BY id LIMIT 20. Cost is one index seek plus limit rows, regardless of page depth. Requires the sort column to be indexed.

| Method | Deep page cost | Consistency | Jump to page N |
| --- | --- | --- | --- |
| Offset | O(offset) — bad | Poor under writes | Yes |
| Cursor | O(limit) — good | Good | No |
| Keyset | O(limit) — good | Good (with tie-breaker column) | No |
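
Cursor and keyset pagination combine naturally: the opaque cursor is just an encoded copy of the last-seen sort key. The sketch below uses an in-memory array as a stand-in for the indexed table, so the filter plays the role of WHERE id > $1 ORDER BY id LIMIT $2. encodeCursor, decodeCursor, and pageAfter are hypothetical names, and it relies on Node's Buffer for base64url.

```javascript
// Opaque cursor: base64url-encoded JSON of the last-seen sort key.
const encodeCursor = (pos) => Buffer.from(JSON.stringify(pos)).toString("base64url");
const decodeCursor = (cursor) => JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));

// Keyset page over rows already sorted by id.
function pageAfter(rows, cursor, limit) {
  const afterId = cursor ? decodeCursor(cursor).id : -Infinity;
  const items = rows.filter((r) => r.id > afterId).slice(0, limit);
  const last = items[items.length - 1];
  // A short page means there is nothing further to fetch, so no cursor is returned.
  const nextCursor = items.length === limit ? encodeCursor({ id: last.id }) : null;
  return { items, nextCursor };
}

const users = [121, 122, 123, 124, 125].map((id) => ({ id }));
console.log(encodeCursor({ id: 123 })); // eyJpZCI6MTIzfQ
console.log(pageAfter(users, encodeCursor({ id: 123 }), 20)); // ids 124 and 125, nextCursor null
```

Keeping the cursor opaque lets the server change the sort key later (for example adding a tie-breaker column) without breaking clients.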

Rate limit headers

Clients need machine-readable quota state to back off gracefully. Common patterns:

  • X-RateLimit-Limit: 1000 — quota per window
  • X-RateLimit-Remaining: 742 — requests left
  • X-RateLimit-Reset: 1717772400 — Unix timestamp when window resets
  • Retry-After: 30 — seconds to wait on 429
  • IETF draft — RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset (standardizing)
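
A client can turn these headers into a wait time before its next request. A minimal sketch under a few assumptions: headers arrive as a plain object with lower-cased names, Retry-After is in its delta-seconds form (the HTTP-date form is ignored), and X-RateLimit-Reset is a Unix timestamp in seconds. waitMsFromHeaders is a hypothetical name.

```javascript
// Decide how long to wait before the next request, in milliseconds.
function waitMsFromHeaders(status, headers, nowSec) {
  if (status === 429 && headers["retry-after"] !== undefined) {
    const seconds = Number(headers["retry-after"]); // delta-seconds form only
    if (Number.isFinite(seconds)) return Math.max(0, seconds * 1000);
  }
  const remaining = Number(headers["x-ratelimit-remaining"]);
  const reset = Number(headers["x-ratelimit-reset"]);
  if (remaining === 0 && Number.isFinite(reset)) {
    return Math.max(0, (reset - nowSec) * 1000); // quota spent: wait for the window reset
  }
  return 0; // quota left, or no usable headers: send immediately
}

console.log(waitMsFromHeaders(429, { "retry-after": "30" }, 1717772370)); // 30000
```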

Idempotency keys

Network retries cause duplicate side effects (double charge, double order). Client sends Idempotency-Key: uuid-v4 on POST. Server stores key → response mapping (Redis, 24 h TTL). Duplicate key returns cached response without re-executing.

POST /v1/payments HTTP/1.1
Idempotency-Key: 7c9e6679-7425-40de-944b-e07fc1f90ae7
Content-Type: application/json

{"amount": 4999, "currency": "usd", "customer": "cus_123"}
⚠️ Pitfall

Idempotency keys must be scoped per endpoint + tenant. Same key on different endpoints should not collide. Store the response, not just "seen"—client retry mid-flight needs the final result or 409 Conflict.
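
The pitfall above implies a small state machine per key: new, in-flight, done. The sketch below models it with an in-memory Map standing in for Redis; IdempotencyStore and its method names are hypothetical, TTL handling is omitted, and a real deployment would need an atomic set-if-absent operation so two concurrent first attempts cannot both win.

```javascript
class IdempotencyStore {
  constructor() {
    this.entries = new Map();
  }

  // Scope the key by tenant and endpoint so identical keys never collide across them.
  static scopedKey(tenantId, endpoint, key) {
    return `${tenantId}:${endpoint}:${key}`;
  }

  // Returns { action: "execute" } for a new key, { action: "replay", response } for a
  // finished one, or { action: "conflict" } while the first attempt is still running.
  begin(scopedKey) {
    const entry = this.entries.get(scopedKey);
    if (!entry) {
      this.entries.set(scopedKey, { state: "in-flight" });
      return { action: "execute" };
    }
    if (entry.state === "done") return { action: "replay", response: entry.response };
    return { action: "conflict" }; // map to 409 Conflict
  }

  // Store the full response, not just a "seen" flag, so retries get the original result.
  complete(scopedKey, response) {
    this.entries.set(scopedKey, { state: "done", response });
  }
}

const store = new IdempotencyStore();
const k = IdempotencyStore.scopedKey("tenant-1", "POST /v1/payments", "7c9e6679");
console.log(store.begin(k).action); // execute
console.log(store.begin(k).action); // conflict (first attempt still in flight)
store.complete(k, { status: 201, body: { id: "pay_1" } });
console.log(store.begin(k).action); // replay
```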

API Gateway

Single entry point for external clients. Responsibilities: TLS termination, auth (JWT/OAuth validation), rate limiting, request routing, response aggregation, protocol translation (REST → gRPC), WAF, analytics.

  • Kong, AWS API Gateway, Apigee, Envoy Gateway — productized gateways
  • BFF (Backend for Frontend) — one gateway per client type (mobile vs web) tailoring payloads
  • Don't put business logic in gateway—auth, routing, cross-cutting only; logic belongs in services

GraphQL — N+1 problem and DataLoader

GraphQL lets clients request nested graphs: { user { friends { name } } }. Naive resolvers issue 1 query for user + N queries for each friend's data = N+1 problem.

DataLoader batches and caches within a single request: collect all friend IDs during field resolution, one WHERE id IN (...) query, map results back. Per-request cache prevents duplicate loads.

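const DataLoader = require("dataloader");

// Batch function: receives every ID requested in the current tick and must
// return results in the same order. Create one loader per request.
// Assumes a node-postgres-style client whose query() resolves to { rows }.
const friendLoader = new DataLoader(async (ids) => {
  const { rows } = await db.query("SELECT * FROM users WHERE id = ANY($1)", [ids]);
  const map = new Map(rows.map((r) => [r.id, r]));
  return ids.map((id) => map.get(id) ?? null); // null for IDs with no matching row
});

// Resolver: DataLoader batches concurrent calls made in the same tick
friends: (user) => friendLoader.loadMany(user.friendIds)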
⚖️ Trade-off

GraphQL — flexible clients, one round-trip; complex caching (no HTTP cache per URL), query cost attacks, harder versioning. REST — simple caching, CDN-friendly; over/under-fetching. Use GraphQL when many client types need different field sets (Facebook, GitHub); REST for public APIs and microservice internals.

📦 Real World

GitHub exposes REST v3 and GraphQL v4: GraphQL for flexible integrations, REST for simple CRUD. Netflix Falcor (a contemporary of GraphQL with similar goals) tackled N+1 with batching. Shopify rate-limits GraphQL by query cost, not just request count.