Networking & Protocols

HTTP & HTTPS

HTTP is the lingua franca of the web—stateless request/response over TCP (or QUIC). Understanding version differences, TLS costs, and REST semantics is table stakes for every system design interview and every production API.

HTTP models communication as a client sending a request (method, path, headers, optional body) and a server returning a response (status code, headers, body). It is inherently stateless: the server does not retain session context between requests unless you add cookies, tokens, or server-side sessions.

HTTP/1.1 — the baseline

HTTP/1.1 (first specified in RFC 2068 in 1997; current spec RFC 9110/9112) made persistent connections the default (Connection: keep-alive) and introduced chunked transfer encoding and the mandatory Host header, which enables name-based virtual hosting. Requests on one connection are still served one at a time, in order (pipelining exists but is effectively unused), so clients open multiple parallel TCP connections instead (browsers cap at ~6 per host).

Head-of-line (HOL) blocking — a large/slow response blocks subsequent responses on the same connection
Header redundancy — cookies and auth tokens resent verbatim on every request (~500–800 bytes typical)
Text protocol — human-readable but verbose; parsing cost is negligible vs network RTT

HTTP/2 — multiplexing over one connection

HTTP/2 (2015) frames messages into binary streams multiplexed over a single TCP connection. HPACK compresses headers (~30–50% reduction on repeat requests). Server push (largely deprecated in practice) allowed proactive resource delivery.

Stream multiplexing — hundreds of parallel requests without opening 6+ TCP connections
Still TCP HOL blocking — one lost packet stalls all streams on that connection (TCP retransmission)
TLS ALPN — negotiated during handshake; h2 indicates HTTP/2
Typical win — 15–30% latency reduction on asset-heavy pages; smaller win on API-only JSON payloads

🔬 Under the Hood

HTTP/2 adds its own flow control, per stream and per connection, on top of TCP's window. A slow consumer on one stream can back-pressure the others once the shared connection-level window is exhausted: rare in APIs but visible when large file downloads are mixed with small API calls on the same connection.

HTTP/3 & QUIC — UDP-based transport

HTTP/3 runs over QUIC (originally "Quick UDP Internet Connections"; the IETF standard treats QUIC as a name, not an acronym). TLS 1.3 is mandatory and integrated into the transport handshake. Each stream is independent, so packet loss on one stream does not block others (eliminating TCP-level HOL blocking).

0-RTT resumption — repeat connections skip full handshake (~1 RTT saved); replay-attack risk for non-idempotent ops
Connection migration — connection ID survives IP changes (mobile WiFi → cellular handoff)
UDP firewall/NAT — some corporate networks block UDP 443; fallback to HTTP/2 required
Adoption — Cloudflare, Google, Facebook serve HTTP/3; ~30% of web traffic as of 2025

flowchart LR
  C[Client]
  H1[HTTP/1.1\n6 TCP conns]
  H2[HTTP/2\n1 TCP + streams]
  H3[HTTP/3\nQUIC over UDP]
  C --> H1
  C --> H2
  C --> H3

TLS costs — what encryption actually costs

HTTPS adds TLS between TCP and HTTP. Costs split into handshake latency (RTT-bound) and per-record CPU (AES-GCM is cheap on modern hardware with AES-NI).

Phase	TLS 1.2	TLS 1.3	Notes
Full handshake	2 RTT	1 RTT	First connection to a host; session tickets enable resumption
Resumed handshake	1 RTT	1 RTT, or 0-RTT with early data	Early data can be replayed; accept only idempotent requests in it
CPU overhead	~1–3% throughput	~1–3% throughput	Negligible vs JSON serialization at typical API sizes
Latency added	+50–150 ms cross-region	+25–100 ms	Dominates small payload APIs; connection pooling amortizes

📐 Estimation

API at 50 ms p99 internal processing with an 80 ms cross-region RTT. A cold first request pays a TCP handshake (1 RTT) + a TLS 1.3 handshake (1 RTT) + the request itself (1 RTT): 50 + 3 × 80 = 290 ms. With keep-alive and connection pooling, subsequent requests pay only the request round trip: 50 + 80 = 130 ms. Always mention connection pooling when discussing microservice-to-microservice latency.
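To make this arithmetic reusable, here is a rough latency model. It is a sketch under stated assumptions, not a measurement: each handshake is charged in whole round trips (TCP 1 RTT, TLS 1.2 2 RTT, TLS 1.3 1 RTT, per the table above), bandwidth and server queueing are ignored, and `requestLatencyMs` is a hypothetical helper.

```javascript
// Back-of-the-envelope request latency model (all values in ms).
// Assumption: each handshake costs whole round trips; bandwidth is not a factor.
const HANDSHAKE_RTTS = { tcp: 1, tls12: 2, tls13: 1 };

function requestLatencyMs({ processingMs, rttMs, tls = "tls13", warm = false }) {
  // A warm (pooled, keep-alive) connection skips both handshakes
  const handshakeRtts = warm ? 0 : HANDSHAKE_RTTS.tcp + HANDSHAKE_RTTS[tls];
  const requestRtts = 1; // request out, response back
  return (handshakeRtts + requestRtts) * rttMs + processingMs;
}
```

With `processingMs: 50` and `rttMs: 80`, the model gives 290 ms for a cold TLS 1.3 connection, 370 ms for cold TLS 1.2, and 130 ms on a pooled connection; the gap is entirely handshake round trips, which is why pooling matters more than cipher choice.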

REST — architectural style, not a protocol

REST (Representational State Transfer) maps resources to URLs and uses HTTP methods semantically: GET (read, safe, idempotent), POST (create), PUT (replace, idempotent), PATCH (partial update), DELETE (remove, idempotent).

Statelessness — each request carries all context (auth, resource ID)
HATEOAS — hypermedia links in responses (rarely implemented fully; pragmatic REST stops at nouns + verbs)
Content negotiation — Accept: application/json vs application/protobuf

HTTP status codes — the vocabulary of outcomes

Code	Meaning	When to use	Client action
`200 OK`	Success with body	GET, PUT, PATCH success	Parse response
`201 Created`	Resource created	POST success; include `Location` header	Follow Location or use returned ID
`204 No Content`	Success, empty body	DELETE, PUT with no return payload	None
`400 Bad Request`	Client sent invalid input	Validation failure, malformed JSON	Fix request; do not retry blindly
`401 Unauthorized`	Missing/invalid auth	No token or expired token	Refresh credentials
`403 Forbidden`	Authenticated but not allowed	RBAC denial	Do not retry
`404 Not Found`	Resource does not exist	Unknown ID	Do not retry
`409 Conflict`	State conflict	Duplicate create, version mismatch	Resolve conflict or fetch latest
`429 Too Many Requests`	Rate limited	Quota exceeded	Backoff using `Retry-After`
`500 Internal Server Error`	Server bug/unhandled exception	Unexpected failure	Retry with exponential backoff only if idempotent (or sent with an idempotency key)
`502/503/504`	Gateway/upstream/timeout	LB or dependency down, overload	Retry; circuit breaker on client
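The "Client action" column translates into a client retry policy. The sketch below is one reasonable reading of the table, not a standard: it resends 429 and 5xx responses, but only resends a failed POST when the request carried an idempotency key, since a 5xx can arrive after the write already happened. `shouldRetry` and `backoffMs` are hypothetical names, and the full-jitter strategy is an assumption.

```javascript
// Idempotent methods per RFC 9110 (TRACE omitted here)
const IDEMPOTENT_METHODS = new Set(["GET", "HEAD", "OPTIONS", "PUT", "DELETE"]);

function shouldRetry(status, method, hasIdempotencyKey = false) {
  if (status === 429) return true; // rate limited: wait, then resend
  if (status >= 500 && status <= 599) {
    // A 5xx may arrive after the server already applied the write, so only
    // resend when a duplicate is harmless (idempotent method or idempotency key)
    return IDEMPOTENT_METHODS.has(method) || hasIdempotencyKey;
  }
  // 2xx: done. Other 4xx: fix the request or credentials first (a 401 may be
  // resent once after refreshing the token, which is outside this sketch)
  return false;
}

// Exponential backoff with "full jitter"; Retry-After (seconds) wins when present
function backoffMs(attempt, { baseMs = 100, capMs = 10_000, retryAfterSec } = {}) {
  if (retryAfterSec !== undefined) return retryAfterSec * 1000;
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.floor(Math.random() * ceiling);
}
```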

🎯 Interview Tip

When designing APIs, state which status codes you'll return for each failure mode. Interviewers probe: "User submits payment twice—what happens?" Answer: idempotency key + 409 or same 200 with original receipt.

📦 Real World

Stripe returns structured error objects with type, code, and param fields, not just status codes. Google APIs use the google.rpc.Status protobuf with rich error details in both gRPC and HTTP/JSON transcoding.

gRPC

gRPC is a high-performance RPC framework: Protocol Buffers for schema and serialization, HTTP/2 for transport, and first-class streaming. It is widely used for inter-service traffic (Google, Netflix, Square) and common in Kubernetes-native stacks.

gRPC generates strongly-typed client/server stubs from .proto files. In statically typed languages, contracts are enforced at compile time; breaking changes require explicit schema evolution (field numbers, reserved tags).

Protocol Buffers — schema-first serialization

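Protobuf encodes each field as a key followed by a value: the key packs the field number and wire type into a varint, and only length-delimited types (strings, bytes, nested messages) carry a length prefix. Compared to JSON: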

Size — 3–10× smaller (no field names on wire; varint encoding)
Speed — 5–10× faster serialize/deserialize in benchmarks (language-dependent)
Schema evolution — add optional fields; never reuse field numbers; use reserved
Human readability — poor; use JSON transcoding for browser/debug clients
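To make the size claim concrete, this sketch hand-encodes two fields of the `Order` message defined below into protobuf wire format: a varint field and a length-delimited string. It covers only non-negative integers and strings and is for illustration; real services should use code generated by protoc or a library such as protobufjs. The helper names are hypothetical.

```javascript
// Wire types from the protobuf encoding spec
const WIRE_VARINT = 0;
const WIRE_LEN = 2; // length-delimited: strings, bytes, nested messages

// Base-128 varint: 7 payload bits per byte, high bit set on every byte but the last
function encodeVarint(n) {
  let value = BigInt(n);
  if (value < 0n) throw new RangeError("sketch handles non-negative values only");
  const bytes = [];
  do {
    let byte = Number(value & 0x7fn);
    value >>= 7n;
    if (value > 0n) byte |= 0x80;
    bytes.push(byte);
  } while (value > 0n);
  return bytes;
}

// Every field starts with a key: (field_number << 3) | wire_type, as a varint
const encodeKey = (fieldNumber, wireType) => encodeVarint((fieldNumber << 3) | wireType);

const encodeVarintField = (fieldNumber, n) => [...encodeKey(fieldNumber, WIRE_VARINT), ...encodeVarint(n)];

function encodeStringField(fieldNumber, str) {
  const utf8 = [...Buffer.from(str, "utf8")];
  return [...encodeKey(fieldNumber, WIRE_LEN), ...encodeVarint(utf8.length), ...utf8];
}

// Order { order_id: "A1" (field 1), created_at_ms: 150 (field 2) }
const orderBytes = [...encodeStringField(1, "A1"), ...encodeVarintField(2, 150)];
const orderJson = JSON.stringify({ order_id: "A1", created_at_ms: 150 });
```

`orderBytes` is 7 bytes (`0a 02 41 31 10 96 01`) versus 37 for `orderJson`: field names never travel, and 150 fits in two varint bytes.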

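syntax = "proto3";

package orders.v1;  // hypothetical package; RPC path becomes /orders.v1.OrderService/GetOrder

service OrderService {
  rpc GetOrder(GetOrderRequest) returns (Order);                // unary
  rpc ListOrders(ListOrdersRequest) returns (stream Order);     // server streaming
  rpc SubmitOrder(stream OrderLine) returns (OrderSummary);     // client streaming
  rpc SyncOrders(stream OrderEvent) returns (stream OrderAck);  // bidirectional
}

message Order {
  string order_id = 1;
  int64 created_at_ms = 2;
  repeated OrderLine lines = 3;
}

// Minimal placeholder messages (illustrative fields) so the file compiles
message OrderLine { string sku = 1; int32 quantity = 2; }
message GetOrderRequest { string order_id = 1; }
message ListOrdersRequest { int32 page_size = 1; string page_token = 2; }
message OrderSummary { int32 line_count = 1; }
message OrderEvent { string order_id = 1; string type = 2; }
message OrderAck { string order_id = 1; }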

HTTP/2 as transport

gRPC maps each RPC to an HTTP/2 stream. The request path is /package.Service/Method. The HTTP status is normally 200; the RPC outcome travels in trailers (grpc-status: 0 means OK, with an optional grpc-message). Metadata (headers) carry auth tokens, tracing context (W3C traceparent), and deadlines (grpc-timeout).

Streaming types

Pattern	Client → Server	Server → Client	Use case
Unary	1 message	1 message	Standard CRUD, request/response APIs
Server streaming	1 message	N messages	Large result sets, live feeds, file download chunks
Client streaming	N messages	1 message	Upload aggregation, batch metric ingestion
Bidirectional streaming	N messages	N messages	Chat, collaborative editing, real-time sync

Performance numbers

Unary latency — comparable to REST+JSON when connection is warm; wins on payload size > 1 KB
Throughput — 10K–100K RPS per core typical for simple unary (vs 3K–15K for JSON REST)
Deadline propagation — the caller's deadline flows to downstream calls and cancels their work once it expires; REST needs custom timeout headers
Load balancing — L7 LB must understand gRPC (long-lived HTTP/2 connections); use xDS, linkerd, or gRPC-LB
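Deadline propagation can be sketched as budget arithmetic. The helper below assumes the service tracks its caller's deadline as an absolute epoch time in milliseconds and forwards what is left, minus a small safety margin, as a relative `grpc-timeout` value; per the gRPC-over-HTTP/2 protocol description, that header is an integer of at most 8 digits plus a unit, with `m` meaning milliseconds. gRPC libraries normally do this for you when you pass the incoming context along; `downstreamTimeout` and the 5 ms margin are illustrative assumptions.

```javascript
// Hypothetical helper: compute the grpc-timeout for a downstream call from the
// caller's absolute deadline, keeping a margin for this service's own work.
function downstreamTimeout(deadlineEpochMs, nowMs = Date.now(), marginMs = 5) {
  const remainingMs = Math.floor(deadlineEpochMs - nowMs - marginMs);
  if (remainingMs <= 0) {
    return null; // budget spent: fail fast (DEADLINE_EXCEEDED) instead of starting doomed work
  }
  // At most 8 ASCII digits; unit "m" = milliseconds
  return `${Math.min(remainingMs, 99_999_999)}m`;
}
```

If the service has spent 300 ms of a 1 s budget, the downstream call gets `295m`; once the budget is gone, the service should return DEADLINE_EXCEEDED rather than fan out further.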

⚖️ Trade-off

gRPC internal, REST external is the dominant pattern: browsers and third-party integrators get JSON/REST; service mesh traffic uses gRPC. GraphQL at the edge is an alternative when clients need flexible field selection.

Limitations

Browser support — requires gRPC-Web proxy (Envoy) for browsers; not native
Debugging — binary payloads need grpcurl or grpcui; harder than curl
CDN caching — POST-based RPCs don't cache at edge; REST GET does
Sticky connections — HTTP/2 connection reuse complicates L4 round-robin; need L7 or client-side LB
Schema rigidity — proto changes require coordinated rollout; JSON is more forgiving (dangerously so)

🏆 Senior Signal

"We'll use gRPC between services" is L4. "Order service exposes unary GetOrder and server-streaming ListOrders; Envoy sidecar handles mTLS and retries; protobuf schema versioned with buf breaking-change detection; public API remains REST with OpenAPI" is L6.

WebSockets

WebSockets upgrade an HTTP connection to a full-duplex, persistent channel—both client and server can push frames anytime. Essential for chat, gaming, collaborative docs, and live dashboards where server-initiated updates dominate.

Handshake and framing

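Client sends an HTTP upgrade request (Upgrade: websocket, Connection: Upgrade, Sec-WebSocket-Key). The server responds 101 Switching Protocols and proves it understood the request by returning Sec-WebSocket-Accept, a hash derived from the key. After the upgrade, communication uses lightweight binary/text frames, not HTTP.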

sequenceDiagram
  participant C as Client
  participant S as Server
  C->>S: GET /ws HTTP/1.1 Upgrade: websocket
  S->>C: 101 Switching Protocols
  C->>S: WebSocket frames (bidirectional)
  S->>C: Push events anytime
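The handshake's only cryptographic step is small enough to show in full. Per RFC 6455, the server appends a fixed GUID to the client's `Sec-WebSocket-Key`, SHA-1 hashes the result, and returns it base64-encoded in `Sec-WebSocket-Accept`. This sketch uses Node's built-in `crypto` module; `secWebSocketAccept` and `upgradeResponse` are illustrative names, and a production server would also validate `Sec-WebSocket-Version` and the Origin header.

```javascript
const { createHash } = require("node:crypto");

// Fixed GUID defined by RFC 6455
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

function secWebSocketAccept(secWebSocketKey) {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// Raw 101 response; headers end with an empty line (CRLF CRLF)
function upgradeResponse(secWebSocketKey) {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${secWebSocketAccept(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}
```

The key `dGhlIHNhbXBsZSBub25jZQ==` from the RFC's own example yields `s3pPLMBiTxaQ9kYGzzhZRbK+xOo=`.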

Full-duplex semantics

Low overhead per message — 2–14 byte frame header vs full HTTP request per push
No request correlation — need application-level message IDs or channels
Back-pressure — TCP buffers can grow if consumer is slow; implement application flow control
Stateful connections — each socket tied to a server process; scaling is hard
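The 2–14 byte figure comes from the frame layout in RFC 6455: two fixed bytes (FIN bit, opcode, mask bit, 7-bit length), an optional 2- or 8-byte extended length, and a 4-byte masking key that only client-to-server frames carry. This sketch builds unmasked server-to-client frames; `encodeServerFrame` is a hypothetical name, and it ignores fragmentation and control frames.

```javascript
// Build a single unfragmented, unmasked frame (server -> client).
// opcode 0x1 = text, 0x2 = binary; payload is a Buffer.
function encodeServerFrame(payload, opcode = 0x1) {
  const len = payload.length;
  let header;
  if (len < 126) {
    header = Buffer.from([0x80 | opcode, len]); // FIN=1, 7-bit length: 2-byte header
  } else if (len < 65536) {
    header = Buffer.alloc(4);
    header[0] = 0x80 | opcode;
    header[1] = 126; // 16-bit extended length follows
    header.writeUInt16BE(len, 2);
  } else {
    header = Buffer.alloc(10);
    header[0] = 0x80 | opcode;
    header[1] = 127; // 64-bit extended length follows
    header.writeBigUInt64BE(BigInt(len), 2);
  }
  return Buffer.concat([header, payload]);
}
```

A 5-byte "Hello" text frame is 7 bytes on the wire (`81 05` plus the payload), matching the RFC's unmasked example.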

Scaling WebSockets — the hard problem

Unlike stateless HTTP, a WebSocket connection lives on one server. When User A (on Server 1) messages User B (on Server 2), you need cross-server message routing.

Sticky sessions (session affinity)

L7 load balancer routes the same client IP/cookie to the same backend. Simple but fragile: server restart drops all connections; uneven load if some users are "heavy chatters."

Redis Pub/Sub (or Kafka) fan-out

Each server subscribes to channels (e.g. room:123). When a message arrives on Server 1, it publishes to Redis; all servers subscribed to that room receive it and push to their local connected clients.

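// Assumes ioredis clients: a connection in subscriber mode cannot publish,
// so each server holds two, `pub` and `sub`.
const localConnections = new Map(); // roomId (string) -> Set of sockets on this server

// Any server: a local user sends a message
ws.on("message", (raw) => {
  const msg = JSON.parse(raw.toString());
  pub.publish(`room:${msg.roomId}`, JSON.stringify(msg));
});

// Subscribe the first time a local socket joins a room
function joinRoom(roomId, ws) {
  if (!localConnections.has(roomId)) {
    localConnections.set(roomId, new Set());
    sub.subscribe(`room:${roomId}`);
  }
  localConnections.get(roomId).add(ws);
}

// All servers: relay published messages to local sockets in that room
sub.on("message", (channel, payload) => {
  const roomId = channel.slice("room:".length);
  localConnections.get(roomId)?.forEach((ws) => ws.send(payload));
});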

Heartbeat and connection lifecycle

Ping/pong frames — RFC 6455 defines ping/pong control frames; pinging every 30–60 s (an application choice) detects dead peers
Idle timeouts — LBs (AWS ALB default 60 s) and CDNs kill silent connections; send application heartbeats
Graceful shutdown — on deploy, stop accepting new WS, send close frame, drain existing (see load-balancing chapter)
Reconnection — client exponential backoff + resume token to catch missed messages from buffer/Kafka
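Dead-peer detection boils down to remembering when each connection last answered. This library-agnostic sketch tracks last-seen times and, on each sweep, reports connections that missed the window; the caller is assumed to ping the survivors and forcibly close the rest through whatever socket library it uses. `HeartbeatTracker` and the 60 s default are illustrative.

```javascript
class HeartbeatTracker {
  constructor(timeoutMs = 60_000) {
    this.timeoutMs = timeoutMs;
    this.lastSeen = new Map(); // connection id -> time of last pong (or any message)
  }

  touch(id, nowMs) {
    this.lastSeen.set(id, nowMs);
  }

  remove(id) {
    this.lastSeen.delete(id); // call on normal close
  }

  // Run on a timer (e.g. every 30 s); returns ids to terminate
  sweep(nowMs) {
    const dead = [];
    for (const [id, seen] of this.lastSeen) {
      if (nowMs - seen > this.timeoutMs) dead.push(id);
    }
    dead.forEach((id) => this.lastSeen.delete(id)); // delete after iterating
    return dead;
  }
}
```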

⚠️ Pitfall

Redis Pub/Sub is fire-and-forget with no persistence: if no subscriber is connected when a message is published, it is lost. For chat history, persist to DB/Kafka first, then fan out. Discord uses Cassandra (since migrated to ScyllaDB) + Redis + custom routing.
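Persisting first also makes reconnection with a resume token straightforward: if every message gets a per-room sequence number before fan-out, a reconnecting client only needs to send the last number it saw. The sketch below uses an in-memory array as a stand-in for the DB table or Kafka partition, and `publish` is a hypothetical callback for the Pub/Sub fan-out.

```javascript
// In-memory stand-in for a durable per-room log
class RoomLog {
  constructor(publish) {
    this.publish = publish;    // hypothetical fan-out, e.g. Redis PUBLISH
    this.messages = new Map(); // roomId -> array of { seq, body }
  }

  append(roomId, body) {
    const log = this.messages.get(roomId) ?? [];
    const entry = { seq: log.length + 1, body };
    log.push(entry); // 1. persist first...
    this.messages.set(roomId, log);
    this.publish(roomId, entry); // 2. ...then fan out; a lost publish is recoverable
    return entry.seq;
  }

  // Called on reconnect with the client's last seen sequence number
  since(roomId, lastSeq) {
    return (this.messages.get(roomId) ?? []).filter((m) => m.seq > lastSeq);
  }
}
```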

📦 Real World

Slack migrated from HTTP long-polling to WebSockets with regional edge gateways. Figma runs multiplayer editing over WebSockets with a CRDT-inspired, server-authoritative model: ordering and conflict resolution happen at the application layer.

Server-Sent Events (SSE)

SSE is a one-way server→client stream over plain HTTP. Simpler than WebSockets when you only need push notifications, live feeds, or progress updates—and it works through most proxies and HTTP/2 infrastructure.

Protocol mechanics

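Client opens GET /events with Accept: text/event-stream. Server keeps the connection open and writes UTF-8 text events, each terminated by a blank line (two newlines). Fields: data:, event:, id:, retry:. The browser EventSource API reconnects automatically and sends the last received id in the Last-Event-ID header so the server can resume.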

HTTP/1.1 200 OK
Content-Type: text/event-stream
Cache-Control: no-cache
Connection: keep-alive

id: 42
event: price-update
data: {"symbol":"AAPL","price":189.50}

id: 43
data: {"symbol":"GOOG","price":141.20}
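The wire format above is simple enough to produce and consume by hand. This sketch serializes events in that layout and parses a fully buffered stream back into events, following the event-stream rules of the HTML specification only loosely: it assumes `\n` line endings, skips comment lines and lines without a colon, and ignores `retry:` when parsing. Both function names are hypothetical.

```javascript
// Serialize one event; a blank line terminates it
function formatSseEvent({ id, event, data, retry }) {
  let out = "";
  if (id !== undefined) out += `id: ${id}\n`;
  if (event !== undefined) out += `event: ${event}\n`;
  if (retry !== undefined) out += `retry: ${retry}\n`;
  // Multi-line payloads become several data: lines; clients rejoin them with "\n"
  for (const line of String(data).split("\n")) out += `data: ${line}\n`;
  return out + "\n";
}

// Parse a complete buffered stream (no incremental input, "\n" line endings only)
function parseSseStream(text) {
  const events = [];
  for (const block of text.split("\n\n")) {
    const ev = { event: "message", data: [] }; // "message" is the default event type
    for (const line of block.split("\n")) {
      const idx = line.indexOf(":");
      if (idx <= 0) continue; // comment (leading ":") or no colon: skipped in this sketch
      const field = line.slice(0, idx);
      let value = line.slice(idx + 1);
      if (value.startsWith(" ")) value = value.slice(1); // strip one leading space
      if (field === "data") ev.data.push(value);
      else if (field === "id" || field === "event") ev[field] = value;
    }
    // A block with no data lines dispatches nothing
    if (ev.data.length > 0) events.push({ ...ev, data: ev.data.join("\n") });
  }
  return events;
}
```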

Use cases where SSE wins

Stock tickers, sports scores — server pushes, client displays
LLM token streaming — ChatGPT-style text generation (one direction)
Build/deploy logs — CI pipeline output streaming to browser
Notification feeds — simpler than WS when no client→server stream needed

HTTP/2 connection limits

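Over HTTP/1.1, browsers allow ~6 concurrent connections per host, shared across all tabs, and each SSE stream holds one open: 10 tabs × 5 streams = 50 wanted connections against a cap of 6, so later streams and ordinary requests stall. Over HTTP/2, each SSE is one stream on a shared connection, bounded by the server's max-concurrent-streams setting (commonly 100), though practical limits are lower.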

Multiplex SSE over one connection — single EventSource, demux by event type in client
HTTP/2 server push deprecated — don't rely on it; SSE remains the pattern
HTTP/3 — independent streams reduce HOL blocking for mixed API + SSE traffic

⚖️ Trade-off

SSE is unidirectional and text-only (UTF-8). WebSockets support binary (Protobuf, audio frames). Choose SSE when simplicity, auto-reconnect, and HTTP compatibility matter; WebSockets when you need binary or client push.

💡 Pro Tip

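SSE works through HTTP/2 and most corporate proxies (it's just a long GET, provided proxies don't buffer the response). WebSocket upgrades are sometimes blocked. Many LLM APIs accept the prompt as a POST and stream the reply in SSE format; because EventSource can only issue GET requests, browser clients read these streams with fetch and parse the events themselves.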

Long Polling vs Short Polling vs WebSocket vs SSE

"How do we get real-time updates?" is a classic interview question. The answer depends on directionality, latency requirements, infrastructure constraints, and scale—not a default of "use WebSockets."

Short polling

Client requests every N seconds: GET /messages?since=timestamp. Simple, stateless, works everywhere. Wasteful: empty responses burn QPS; worst-case latency = poll interval.

Long polling

Server holds request open until data arrives or timeout (30–60 s), then client immediately reconnects. Reduces empty responses vs short polling. Still one request per event batch; HTTP overhead per push.
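Server-side, long polling needs a map of pending requests. This sketch parks requests that have nothing to return, answers all of them when an event is published, and answers 204 on timeout so the client reconnects. The in-memory log and the `respond(status, events)` callback are hypothetical stand-ins for durable storage and the HTTP response object.

```javascript
class LongPollHub {
  constructor(timeoutMs = 30_000) {
    this.timeoutMs = timeoutMs;
    this.log = [];            // event with seq N lives at index N - 1
    this.pending = new Set(); // parked requests: { respond, timer }
  }

  // Handles GET /messages?since=<last seq the client has seen>
  wait(sinceSeq, respond) {
    const missed = this.log.slice(sinceSeq);
    if (missed.length > 0) return respond(200, missed); // data ready: answer now
    const waiter = { respond };
    waiter.timer = setTimeout(() => {
      this.pending.delete(waiter);
      respond(204, []); // nothing arrived; the client immediately polls again
    }, this.timeoutMs);
    this.pending.add(waiter);
  }

  publish(event) {
    this.log.push(event);
    for (const waiter of this.pending) {
      clearTimeout(waiter.timer);
      waiter.respond(200, [event]); // every parked request completes at once
    }
    this.pending.clear();
    return this.log.length; // seq of the new event
  }
}
```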

Criterion	Short polling	Long polling	SSE	WebSocket
Direction	Client pull	Client pull (simulated push)	Server → client	Bidirectional
Latency	0 – poll interval	~instant after event	~instant	~instant
Connection overhead	High (repeated handshakes)	Medium	Low (one long connection)	Lowest per message
Server state	Stateless	Pending request map	Open stream per client	Stateful socket
Scaling difficulty	Easy (stateless)	Medium	Medium	Hard (sticky + pub/sub)
Proxy/LB friendly	Excellent	Good (watch timeouts)	Good	Moderate (upgrade required)
Binary data	Via HTTP body	Via HTTP body	No (text only)	Yes
Auto reconnect	N/A	Manual	Built-in (EventSource)	Manual
Best for	Low-frequency updates	Legacy compat, moderate realtime	Feeds, LLM streams, notifications	Chat, games, collaboration

🎯 Interview Tip

For "design a notification system," start with requirements: "Do clients need to send data back over the same channel?" If no → SSE. If yes (typing indicators) → WebSocket. If updates are every 30 s → short polling is fine. Always quantify: "10M concurrent SSE connections × 4 KB buffer = 40 GB memory just for socket buffers."

API Design

Protocol choice is half the story. Production APIs need versioning that doesn't break clients, pagination that scales, rate limits clients can respect, idempotency for safe retries, and gateway patterns that centralize cross-cutting concerns.

Versioning strategies

Strategy	Example	Pros	Cons
URL path	`/v1/orders`	Explicit, easy to route at gateway	URL proliferation; resource duplication
Header	`Accept-Version: 2024-01-15`	Clean URLs; Stripe-style date versions	Harder to test in browser; cache complexity
Query param	`?api-version=2`	Simple	Easy to forget; frowned on by REST purists
Content type	`application/vnd.myapi.v2+json`	Hypermedia-friendly	Verbose; poor tooling support
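With date-based versions (the `Accept-Version: 2024-01-15` row), the server maps the requested date to the newest release on or before it, so a client pinned to a date never sees later breaking changes. A minimal sketch; the `RELEASES` list and the default-to-latest fallback are assumptions (Stripe, for instance, falls back to the version pinned on the account instead).

```javascript
// Hypothetical release dates, ascending; ISO-8601 dates sort correctly as strings
const RELEASES = ["2023-08-16", "2024-01-15", "2024-06-20"];

function resolveVersion(requested) {
  if (requested === undefined) return RELEASES[RELEASES.length - 1]; // fallback policy is an assumption
  if (!/^\d{4}-\d{2}-\d{2}$/.test(requested)) {
    throw new RangeError(`malformed version: ${requested}`); // surface as 400 Bad Request
  }
  let match = null;
  for (const release of RELEASES) {
    if (release <= requested) match = release; // newest release not after the requested date
  }
  if (match === null) throw new RangeError(`no API version on or before ${requested}`);
  return match;
}
```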

🏆 Senior Signal

Versioning policy: "We add fields, never remove. Breaking changes get a new major version with 12-month sunset. Deprecation headers (Sunset, Deprecation) on old endpoints." This is what interviewers want—not just "/v1".

Pagination — offset vs cursor vs keyset

Offset pagination

GET /users?offset=100&limit=20 — skip N rows. Simple but O(offset) on DB (deep pages scan millions of rows). Inconsistent under concurrent inserts/deletes (duplicates or gaps between page fetches).

Cursor pagination

GET /users?cursor=eyJpZCI6MTIzfQ&limit=20 — opaque token encoding last-seen position. Stable under inserts if cursor is tied to sort key. Twitter/GraphQL Relay connections use this pattern.

Keyset (seek) pagination

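GET /users?after_id=123&limit=20 with ORDER BY id: the DB uses an index seek, WHERE id > 123 ORDER BY id LIMIT 20. Cost is O(log n + limit) per page regardless of depth. Requires an index on the sort column(s), plus a unique tie-breaker column when the sort key is not unique.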

Method	Deep page cost	Consistency	Jump to page N
Offset	O(offset) — bad	Poor under writes	Yes
Cursor	O(limit) — good	Good	No
Keyset	O(limit) — good	Good (with tie-breaker column)	No
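Cursor and keyset pagination combine naturally: the opaque cursor is just the keyset position, encoded. The cursor in the example URL, `eyJpZCI6MTIzfQ`, is unpadded base64url for `{"id":123}`. The sketch below paginates an in-memory array sorted by `(created_at, id)`, with `id` as the tie-breaker, and mirrors the SQL a database would run; the column names and helpers are hypothetical.

```javascript
// Opaque cursor = base64url(JSON of the last row's sort key)
const encodeCursor = (row) =>
  Buffer.from(JSON.stringify({ created_at: row.created_at, id: row.id })).toString("base64url");
const decodeCursor = (cursor) => JSON.parse(Buffer.from(cursor, "base64url").toString("utf8"));

// Row-value comparison, equivalent to the SQL seek predicate:
//   WHERE (created_at, id) > ($1, $2) ORDER BY created_at, id LIMIT $3
const after = (row, key) =>
  row.created_at > key.created_at || (row.created_at === key.created_at && row.id > key.id);

// `rows` must already be sorted by (created_at, id), as the index would be
function fetchPage(rows, limit, cursor) {
  const key = cursor ? decodeCursor(cursor) : null;
  const page = [];
  for (const row of rows) {
    if (key && !after(row, key)) continue; // an index seek would jump here in O(log n)
    page.push(row);
    if (page.length === limit) break;
  }
  // Only hand out a cursor when the page is full (there may be more rows)
  const nextCursor = page.length === limit ? encodeCursor(page[page.length - 1]) : null;
  return { items: page, nextCursor };
}
```

Because the cursor pins `(created_at, id)` rather than a row count, rows inserted ahead of it do not shift later pages.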

Rate limit headers

Clients need machine-readable quota state to back off gracefully. Common patterns:

X-RateLimit-Limit: 1000 — quota per window
X-RateLimit-Remaining: 742 — requests left
X-RateLimit-Reset: 1717772400 — Unix timestamp when window resets
Retry-After: 30 — seconds to wait on 429
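IETF draft — standardized RateLimit headers (earlier drafts: RateLimit-Limit, RateLimit-Remaining, RateLimit-Reset; later drafts consolidate them into RateLimit and RateLimit-Policy)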
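A fixed-window counter is the simplest way to produce these headers. This sketch keeps counts in a `Map` as a stand-in for the shared store (e.g. Redis INCR with an expiry) that a multi-node deployment would need; the class name, the per-client key, and counting rejected requests against the quota are assumptions, and old windows are never evicted here.

```javascript
class FixedWindowLimiter {
  constructor(limit = 1000, windowSec = 3600) {
    this.limit = limit;
    this.windowSec = windowSec;
    this.counts = new Map(); // `${clientId}:${windowStart}` -> requests seen
  }

  // nowSec: current Unix time in whole seconds
  check(clientId, nowSec) {
    const windowStart = Math.floor(nowSec / this.windowSec) * this.windowSec;
    const resetAt = windowStart + this.windowSec;
    const key = `${clientId}:${windowStart}`;
    const used = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, used);

    const headers = {
      "X-RateLimit-Limit": String(this.limit),
      "X-RateLimit-Remaining": String(Math.max(0, this.limit - used)),
      "X-RateLimit-Reset": String(resetAt), // Unix timestamp when the window resets
    };
    if (used > this.limit) {
      headers["Retry-After"] = String(resetAt - nowSec); // seconds until reset
      return { status: 429, headers };
    }
    return { status: 200, headers };
  }
}
```

A fixed window lets a client burst up to twice the limit across a window boundary; sliding-window or token-bucket limiters smooth that out at the cost of more state.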

Idempotency keys

Network retries cause duplicate side effects (double charge, double order). Client sends Idempotency-Key: uuid-v4 on POST. Server stores key → response mapping (Redis, 24 h TTL). Duplicate key returns cached response without re-executing.

POST /v1/payments HTTP/1.1
Idempotency-Key: 7c9e6679-7425-40de-944b-e07fc1f90ae7
Content-Type: application/json

{"amount": 4999, "currency": "usd", "customer": "cus_123"}

⚠️ Pitfall

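Idempotency keys must be scoped per endpoint + tenant, so the same key on different endpoints or tenants cannot collide. Store the response, not just "seen": a retry that arrives after completion should get the stored final result, and one that arrives while the original is still in flight should get 409 Conflict.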
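The rules above (scope the key, store the result) fit in a few lines. In this in-memory sketch, a repeated key replays the stored response and a key whose first attempt is still running gets 409 Conflict. A real service would use an atomic store such as Redis `SET key value NX EX 86400` or a database unique constraint; the class, the key format, and the choice to forget a key when the handler throws are assumptions.

```javascript
class IdempotencyStore {
  constructor() {
    this.entries = new Map(); // scoped key -> { state: "in_flight" } | { state: "done", response }
  }

  // Scope by tenant and endpoint so the same client key cannot collide across them
  static scopedKey(tenantId, method, path, idempotencyKey) {
    return `${tenantId}|${method} ${path}|${idempotencyKey}`;
  }

  // `handler` is synchronous here for brevity; with an async handler the
  // "in_flight" marker is what a concurrent retry would observe.
  run(scopedKey, handler) {
    const existing = this.entries.get(scopedKey);
    if (existing?.state === "done") return existing.response; // replay, don't re-execute
    if (existing?.state === "in_flight") {
      return { status: 409, body: { error: "original request still in progress" } };
    }
    this.entries.set(scopedKey, { state: "in_flight" });
    try {
      const response = handler();
      this.entries.set(scopedKey, { state: "done", response });
      return response;
    } catch (err) {
      this.entries.delete(scopedKey); // assumption: failed attempts may be retried
      throw err;
    }
  }
}
```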

API Gateway

Single entry point for external clients. Responsibilities: TLS termination, auth (JWT/OAuth validation), rate limiting, request routing, response aggregation, protocol translation (REST → gRPC), WAF, analytics.

Kong, AWS API Gateway, Apigee, Envoy Gateway — productized gateways
BFF (Backend for Frontend) — one gateway per client type (mobile vs web) tailoring payloads
Don't put business logic in gateway—auth, routing, cross-cutting only; logic belongs in services

GraphQL — N+1 problem and DataLoader

GraphQL lets clients request nested graphs: { user { friends { name } } }. Naive resolvers issue 1 query for user + N queries for each friend's data = N+1 problem.

DataLoader batches and caches within a single request: collect all friend IDs during field resolution, one WHERE id IN (...) query, map results back. Per-request cache prevents duplicate loads.

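const DataLoader = require("dataloader");

// One loader per request (e.g. ctx.userLoader = createUserLoader(db)) so the cache never leaks across users
const createUserLoader = (db) =>
  new DataLoader(async (ids) => {
    // node-postgres returns { rows }; = ANY($1) is Postgres's array form of IN (...)
    const { rows } = await db.query("SELECT * FROM users WHERE id = ANY($1)", [ids]);
    const byId = new Map(rows.map((r) => [r.id, r]));
    return ids.map((id) => byId.get(id) ?? null); // one result per key, in key order
  });

// Resolver: DataLoader batches loads issued in the same tick into one query
friends: (user, _args, ctx) => ctx.userLoader.loadMany(user.friendIds)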

⚖️ Trade-off

GraphQL — flexible clients, one round-trip; complex caching (no HTTP cache per URL), query cost attacks, harder versioning. REST — simple caching, CDN-friendly; over/under-fetching. Use GraphQL when many client types need different field sets (Facebook, GitHub); REST for public APIs and microservice internals.

📦 Real World

GitHub exposes REST v3 and GraphQL v4: GraphQL for flexible integrations, REST for simple CRUD. Netflix's Falcor, a contemporary alternative to GraphQL, also tackled N+1 with batching. Shopify rate-limits GraphQL by calculated query cost, not just request count.