Caching Strategies
Caching is the highest-leverage optimization in system design—turning millisecond database round-trips into microsecond memory lookups. But caches introduce consistency, invalidation, and stampede problems that kill naive implementations at scale. This chapter covers placement, strategies, eviction, invalidation, and CDN edge caching with production numbers you can cite in interviews.
Why cache
Every layer of the stack has a different latency budget. Caching moves hot data closer to the consumer, trading memory for speed and reduced load on expensive backends.
The latency gap
A single PostgreSQL query over a network costs 1–10 ms. L1 cache access is ~1 ns. Even in-memory Redis over TCP is ~0.1–1 ms, roughly an order of magnitude faster than a disk-backed database query. At 35K read QPS, serving from cache instead of the DB can mean the difference between 35 database connections and 3,500.
| Layer | Typical latency | Throughput | Use when |
|---|---|---|---|
| L1 / L2 CPU cache | 1–40 ns | Billions ops/s | Hot in-process variables, computed fields |
| Application heap (Caffeine, Guava) | ~100 ns – 1 µs | Millions ops/s per JVM | Per-node hot keys, config, session fragments |
| Redis / Memcached | ~0.1–1 ms | ~100K ops/sec per node | Shared cache across app fleet, rate limits, sessions |
| CDN edge | ~10–50 ms (vs 200+ ms origin) | Millions RPS globally | Static assets, cacheable API responses, media |
| PostgreSQL (indexed) | 1–10 ms | ~5K–20K reads/s per node | Source of truth, complex queries, writes |
Redis single node: ~100K ops/sec (simple GET/SET). At 35K read QPS with 80% cache hit rate, effective DB load drops to 7K QPS. One Redis primary + replica handles this with headroom; plan 2× for failover.
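The arithmetic above can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the 5 ms query latency used for the connection estimate is an assumption, not a figure from this chapter.

```python
def db_qps(read_qps: float, hit_rate: float) -> float:
    """Reads per second that miss the cache and reach the database."""
    if not 0.0 <= hit_rate <= 1.0:
        raise ValueError("hit_rate must be between 0 and 1")
    return read_qps * (1.0 - hit_rate)


def in_flight(qps: float, latency_s: float) -> float:
    """Little's law: average concurrent requests = arrival rate x latency."""
    return qps * latency_s


# 35K read QPS at an 80% hit rate leaves about 7K QPS for the database.
load = db_qps(35_000, 0.80)

# Assuming each query takes 5 ms (an illustrative value), that load keeps
# roughly 35 database connections busy on average.
busy_connections = in_flight(load, 0.005)
```

The same helper makes the sensitivity to hit rate obvious: moving from 80% to 95% cuts database load by a further 4×.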
What caching buys you
- Lower latency — p99 drops when hot paths skip the database
- Higher throughput — memory bandwidth exceeds disk IOPS by orders of magnitude
- Cost reduction — fewer DB replicas, smaller instance sizes, less cross-AZ traffic
- Resilience — stale cache can serve traffic during DB degradation (with explicit trade-offs)
Caches are not free. You pay in consistency complexity, invalidation logic, memory cost, and operational surface (Redis failover, hot keys, monitoring). Cache only what you can afford to be slightly stale—or where staleness is acceptable by product requirements.
When asked "how would you scale reads?", say: "Cache-aside Redis with 5-min TTL, ~100K ops/sec per node. At 35K QPS and 80% hit rate, DB sees 7K QPS. Mutex on cache miss to prevent stampede." Quantify every claim.
Cache placement
Where you put the cache determines hit rate, invalidation complexity, and consistency guarantees. Most production systems use multiple layers simultaneously.
flowchart TB
Client[Client / Browser]
CDN[CDN Edge]
RP[Reverse Proxy / API Gateway]
App[Application Cache\nCaffeine / Guava]
Dist[Distributed Cache\nRedis Cluster]
DB[(Database)]
Client --> CDN
CDN --> RP
RP --> App
App --> Dist
Dist --> DB
App -.->|L1 miss| Dist
Dist -.->|L2 miss| DB
| Layer | Examples | Pros | Cons |
|---|---|---|---|
| Client | Browser HTTP cache, Service Worker, mobile disk cache | Zero server load; instant for repeat visits | No server control after response sent; hard to invalidate |
| CDN | CloudFront, Fastly, Akamai | Geographic proximity; absorbs global traffic spikes | Eventual consistency; purge latency; cost at scale |
| Reverse proxy | Nginx proxy_cache, Varnish, Envoy | Transparent to app; shields origin from identical requests | Cache key design critical; per-PoP not shared across regions |
| Application | Caffeine, Guava Cache, in-process dict | Fastest access (~µs); no network hop | Not shared across instances; invalidation per node |
| Distributed cache | Redis, Memcached, Hazelcast | Shared across fleet; sub-ms latency; rich data structures | Network hop; hot key risk; memory cost; ops overhead |
| Database | MySQL buffer pool, Postgres shared_buffers, query cache (deprecated) | Automatic; no app code changes | Least control; evicted under memory pressure; query-specific |
Facebook uses a multi-tier cache: CDN for static assets, Memcached for social graph hot data, and MySQL as source of truth. Netflix caches metadata in EVCache (memcached wrapper) with regional clusters; video bytes live on Open Connect CDN appliances.
L6 candidates describe which layer owns invalidation for each data type. "User profile: Redis L2 + 60s app L1, invalidated via Kafka CDC on write. Product images: CDN with 1-year immutable cache key (content hash in URL)."
Caching strategies
Five patterns govern how reads and writes interact with the cache. Your choice determines consistency, write latency, and failure behavior.
Cache-aside (lazy loading)
Application manages the cache. On read: check cache → on miss, read DB → populate cache → return. On write: update DB → delete or update cache entry. Most common pattern in production.
sequenceDiagram
participant App
participant Cache
participant DB
App->>Cache: GET key
alt cache hit
Cache-->>App: value
else cache miss
Cache-->>App: null
App->>DB: SELECT
DB-->>App: row
App->>Cache: SET key, TTL
App-->>App: return value
end
| Strategy | Read path | Write path | Pros | Cons |
|---|---|---|---|---|
| Cache-aside | App reads cache first; loads DB on miss | App writes DB; invalidates cache | Simple; cache only hot data; resilient to cache failure | Stale data window; stampede on miss; app owns logic |
| Read-through | Cache loads from DB on miss (cache library handles) | App writes DB; cache unaware or invalidated | Centralized load logic; cleaner app code | Cache provider must support; same stampede risk |
| Write-through | Standard cache read | Write goes to cache + DB synchronously | Cache always consistent on write completion | Higher write latency; writes cold data into cache |
| Write-behind | Standard cache read | Write to cache; async flush to DB | Fast writes; absorbs write bursts | Data loss risk on crash; complexity; ordering issues |
| Refresh-ahead | Proactively refresh before TTL expires | Typically paired with cache-aside | Reduces miss latency for predictable hot keys | Wasted refreshes; needs access pattern prediction |
// Cache-aside pattern (pseudo-code)
public User getUser(long id) {
    String key = "user:" + id;
    User cached = redis.get(key);
    if (cached != null) return cached;
    User user = db.findById(id);
    if (user != null) {
        redis.setex(key, 300, user); // 5-min TTL
    }
    return user;
}

public void updateUser(User user) {
    db.update(user);
    redis.del("user:" + user.getId()); // invalidate, not update
}
Updating the cache on write (instead of invalidating) invites race conditions: if write A commits to the DB before write B but A's cache SET runs after B's, the cache is left holding A's older value while the DB holds B's. Prefer delete-on-write for cache-aside unless you have versioning.
Default to cache-aside + TTL + delete-on-write for 90% of interview problems. Mention write-through only for low-write, high-read data (product catalog). Write-behind for analytics counters or fire-and-forget telemetry.
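To make the write-behind trade-off concrete, here is a minimal in-memory sketch. The class name, the dict standing in for the database, and the explicit `flush()` call are all assumptions for illustration; a real deployment would flush from a timer or queue consumer and handle failures.

```python
from collections import OrderedDict


class WriteBehindCache:
    """Toy write-behind cache: writes land in memory and reach the store later."""

    def __init__(self, store: dict):
        self.store = store          # stand-in for the database (hypothetical)
        self.cache = {}
        self.dirty = OrderedDict()  # key -> latest unflushed value

    def put(self, key, value):
        self.cache[key] = value
        self.dirty[key] = value     # repeated writes to one key coalesce

    def get(self, key):
        if key in self.cache:
            return self.cache[key]
        value = self.store.get(key)
        if value is not None:
            self.cache[key] = value
        return value

    def flush(self) -> int:
        """Persist pending writes and return how many keys were written."""
        flushed = 0
        while self.dirty:
            key, value = self.dirty.popitem(last=False)
            self.store[key] = value
            flushed += 1
        return flushed
```

Anything still in `dirty` when the process dies is lost, which is the data-loss risk listed in the table. The coalescing in `put` is also why the pattern suits counters: many increments become one database write.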
Eviction policies
Finite memory means something must leave when the cache is full. Eviction policy determines which keys survive—and whether your hit rate holds under pressure.
| Policy | Evicts | Best for | Weakness |
|---|---|---|---|
| LRU (Least Recently Used) | Key not accessed longest | General-purpose; temporal locality (sessions, feeds) | One-time scan floods out hot data; no frequency awareness |
| LFU (Least Frequently Used) | Key with lowest access count | Stable hot set (top products, config) | New keys evicted before warming up; counter aging needed |
| FIFO (First In, First Out) | Oldest inserted key | Simple streaming buffers; predictable order | Ignores access patterns entirely; poor hit rate usually |
| TTL (Time To Live) | Key past expiration timestamp | Time-sensitive data; automatic staleness bound | Not a capacity policy alone—combine with LRU |
| ARC (Adaptive Replacement Cache) | Balances recency + frequency adaptively | Mixed workloads; filesystem caches (ZFS ARC) | More memory overhead for metadata; complex to implement |
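As a reference point for the LRU row, the following is a minimal sketch of an exact LRU cache built on `collections.OrderedDict`. The class name and capacity are illustrative; production caches such as Caffeine or Redis use more elaborate policies.

```python
from collections import OrderedDict


class LRUCache:
    """Exact LRU: the least recently accessed key is evicted first."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)         # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        if key in self._data:
            self._data.move_to_end(key)
        self._data[key] = value
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)  # evict the least recently used key
```

The scan weakness in the table follows directly from this structure: a one-time pass over many cold keys calls `put` repeatedly and pushes every hot key out the front.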
Redis eviction
Redis uses maxmemory + maxmemory-policy. Redis itself defaults to noeviction, so set the policy explicitly. Common production choice: allkeys-lru, or allkeys-lfu (Redis 4.0+), for cache-only deployments. Use volatile-lru when mixing cache keys (with TTL) and persistent keys (no TTL).
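Redis LRU is approximate: it samples a handful of random keys (5 by default, tunable via maxmemory-samples) and evicts the least recently used among them. This avoids the memory and bookkeeping cost of tracking exact recency order across all keys. LFU uses a logarithmic counter with decay so old popularity fades.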
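The sampling idea can be sketched in a few lines. This is a simplified illustration, not Redis's implementation: the class name is hypothetical, a logical clock stands in for Redis's per-object access timestamp, and copying the key list before sampling is O(n) here purely for brevity.

```python
import random


class SampledLRU:
    """Approximate LRU: evict the oldest key among a small random sample."""

    def __init__(self, capacity: int, samples: int = 5, rng=None):
        self.capacity = capacity
        self.samples = samples
        self.rng = rng or random.Random()
        self.values = {}
        self.last_used = {}  # key -> logical clock value at last access
        self.clock = 0

    def _touch(self, key):
        self.clock += 1
        self.last_used[key] = self.clock

    def get(self, key):
        if key not in self.values:
            return None
        self._touch(key)
        return self.values[key]

    def put(self, key, value):
        if key not in self.values and len(self.values) >= self.capacity:
            self._evict_one()
        self.values[key] = value
        self._touch(key)

    def _evict_one(self):
        k = min(self.samples, len(self.values))
        candidates = self.rng.sample(list(self.values), k)
        # Among the sampled keys only, pick the least recently used.
        victim = min(candidates, key=self.last_used.__getitem__)
        del self.values[victim]
        del self.last_used[victim]
        return victim
```

With a sample size equal to the number of keys this degenerates to exact LRU; with a small sample it occasionally evicts a key that is not the globally oldest, which is the accuracy given up for lower overhead.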
Cache invalidation
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Invalidation strategy is where most caching bugs originate.
Invalidation strategies
| Strategy | Mechanism | Consistency | Complexity |
|---|---|---|---|
| TTL expiration | Key auto-expires after N seconds | Eventual; bounded staleness = TTL | Low—set and forget |
| Event-driven (CDC) | Debezium/Kafka streams DB changes → invalidates cache | Near-real-time; seconds of lag | Medium—pipeline to maintain |
| Versioning / ETags | Cache key includes version: user:42:v7 | Strong on read if version checked | Medium—version source of truth needed |
| Pub/Sub broadcast | Redis PUBLISH on write; all app nodes evict local L1 | Seconds; depends on delivery | Medium—missed messages = stale L1 |
flowchart LR
App[App Server]
DB[(PostgreSQL)]
CDC[Debezium / CDC]
Kafka[Kafka Topic]
Inv[Invalidation Worker]
Redis[(Redis)]
App -->|UPDATE| DB
DB --> CDC
CDC --> Kafka
Kafka --> Inv
Inv -->|DEL user:42| Redis
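The invalidation worker at the end of this pipeline can be quite small. The sketch below assumes a simplified event shape (`table`, `before`, `after`); real Debezium envelopes carry more fields and nest the table name under a source block. `FakeCache` is a hypothetical stand-in for a Redis client, and the Kafka consumer loop is omitted.

```python
def cache_key_for(event: dict):
    """Map a change event to the cache key it affects, or None if untracked."""
    table = event.get("table")
    # Deletes carry the row in "before"; inserts and updates in "after".
    row = event.get("after") or event.get("before") or {}
    if table == "users" and "id" in row:
        return f"user:{row['id']}"
    return None


def handle_change(event: dict, cache) -> bool:
    """Delete the affected key; return True if an invalidation was issued."""
    key = cache_key_for(event)
    if key is None:
        return False
    cache.delete(key)  # deletes are idempotent, so redelivered events are safe
    return True


class FakeCache:
    """In-memory stand-in exposing only the delete() call the worker needs."""

    def __init__(self):
        self.data = {}

    def delete(self, key):
        self.data.pop(key, None)
```

Deleting rather than rewriting the value keeps the worker idempotent, which matters because Kafka consumers commonly see at-least-once delivery.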
Thundering herd problem
When a popular cache key expires (or is invalidated), thousands of concurrent requests miss simultaneously and hammer the database—the "thundering herd" or "dog-piling" effect. A key with 10K RPS and sudden invalidation can send 10K simultaneous DB queries.
TTL-only invalidation on hot keys is dangerous. A viral post's cache expiring at the same instant can DDoS your own database. Always pair TTL with stampede protection for keys above ~1K RPS.
Cache stampede solutions
Three proven techniques prevent coordinated cache misses from overwhelming your origin. Production systems often combine all three.
| Technique | How it works | Pros | Cons |
|---|---|---|---|
| Mutex / single-flight | Only one request rebuilds on miss; others wait or retry cache | Simple; guarantees ≤1 DB query per miss event | Wait latency for losers; lock timeout tuning; Redis SETNX pitfalls |
| Stale-while-revalidate (SWR) | Serve stale value immediately; async background refresh | Zero user-facing latency spike; smooth traffic | Users see stale data during refresh; needs soft + hard TTL |
| Probabilistic early expiration (XFetch) | Each request refreshes early with a probability that rises as the hard TTL approaches | No locks; spreads refresh load over time | Math tuning (beta factor); occasional unnecessary early refresh |
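The probabilistic early expiration row can be sketched as a single decision function. This is a minimal version under stated assumptions: `delta` is how long the last recompute took, `beta` is the tuning factor (1.0 as a neutral default), and the `now` and `rand` parameters exist only so the behavior can be tested deterministically.

```python
import math
import random
import time


def should_refresh_early(expiry, delta, beta=1.0, now=None, rand=random.random):
    """Return True if this request should recompute the value before expiry.

    expiry: absolute time (seconds) at which the cached value hard-expires
    delta:  seconds the last recompute took
    beta:   values above 1 favor earlier refreshes, below 1 later ones
    """
    now = time.time() if now is None else now
    # random.random() is in [0, 1); flip it to (0, 1] so log() is defined.
    u = 1.0 - rand()
    # -delta * beta * log(u) is a non-negative random head start that is
    # usually small, so refreshes cluster just before the expiry time.
    return now - delta * beta * math.log(u) >= expiry
```

Because each request draws its own random head start, refreshes for a hot key tend to be triggered by one early request rather than by every request at the instant of expiry.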
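# Mutex / single-flight on cache miss (Redis, redis-py client)
import time
import uuid

import redis

r = redis.Redis(decode_responses=True)

def get_with_mutex(key, loader_fn, ttl=300):
    val = r.get(key)
    if val is not None:  # an empty string is still a valid cached value
        return val
    lock_key = f"lock:{key}"
    token = str(uuid.uuid4())  # identifies this lock holder
    if r.set(lock_key, token, nx=True, ex=10):  # 10s lock; expires if holder crashes
        try:
            val = loader_fn()
            r.setex(key, ttl, val)
            return val
        finally:
            # Release only our own lock; it may have expired and been re-acquired.
            # (GET + DEL is not atomic; a Lua script closes that gap.)
            if r.get(lock_key) == token:
                r.delete(lock_key)
    else:
        time.sleep(0.05)  # brief wait for the lock holder to populate the cache
        val = r.get(key)
        return val if val is not None else loader_fn()  # fallback to the source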
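Cloudflare and Fastly implement SWR natively via the stale-while-revalidate Cache-Control directive. Redis has no built-in equivalent; Facebook's memcached deployment uses leases ("lease get") for the same purpose: the mutex concept, with the option of serving a stale value to clients that lose the race.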
When the interviewer asks "what if cache expires during peak?", answer with layered defense: (1) mutex so one thread rebuilds, (2) SWR so users never wait, (3) jittered TTL so keys don't expire together. Mention TTL + random(0, 60s) to prevent synchronized expiry.
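The jitter suggestion amounts to one line of code. A minimal sketch, with a hypothetical helper name and the 60-second jitter window taken from the text:

```python
import random


def jittered_ttl(base_ttl_s: int, max_jitter_s: int = 60, rng=random) -> int:
    """Return base TTL plus a uniform random offset in [0, max_jitter_s] seconds."""
    return base_ttl_s + rng.randint(0, max_jitter_s)
```

Keys written in the same burst (for example during cache warming) then expire spread over a minute instead of in the same second.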
Distributed cache
A single Redis node handles ~100K ops/sec. Beyond that you shard, replicate, and layer caches to protect both the cache tier and the database.
Redis Cluster
Redis Cluster shards data across 16,384 hash slots using CRC16 mod 16384. Each master owns a slot range; replicas provide failover. Client libraries (Lettuce, Jedis) handle MOVED/ASK redirects. Minimum production topology: 3 masters + 3 replicas across AZs.
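The slot calculation is easy to reproduce client-side. The sketch below relies on Python's `binascii.crc_hqx`, which computes the CRC-CCITT (XMODEM) variant that Redis Cluster specifies, and applies the hash-tag rule: if the key contains a non-empty `{...}` section, only that section is hashed. The function name is illustrative.

```python
import binascii


def key_hash_slot(key: str) -> int:
    """Compute the Redis Cluster hash slot (0..16383) for a key."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        # Only a non-empty tag between the first "{" and the next "}" counts.
        if end != -1 and end > start + 1:
            data = data[start + 1:end]
    return binascii.crc_hqx(data, 0) % 16384


# Keys sharing a hash tag land in the same slot, so multi-key commands work.
same_slot = key_hash_slot("{user:42}:profile") == key_hash_slot("{user:42}:orders")
```

Hash tags are what make multi-key operations possible in a cluster, and they are also a common cause of uneven slot load when one tag becomes very popular.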
flowchart TB
subgraph nodes [App Fleet]
A1[App 1\nCaffeine L1]
A2[App 2\nCaffeine L1]
A3[App N\nCaffeine L1]
end
subgraph redis [Redis Cluster]
M1[Master 1\nslots 0-5460]
M2[Master 2\nslots 5461-10922]
M3[Master 3\nslots 10923-16383]
end
DB[(PostgreSQL)]
A1 & A2 & A3 -->|L1 miss ~1ms| redis
redis -->|miss ~5ms| DB
L1 / L2 / DB hierarchy
- L1 (in-process) — Caffeine with 10–60s TTL; absorbs 90%+ of repeated reads on same node
- L2 (Redis) — Shared across fleet; 1–5 min TTL; source for L1 population
- DB — Source of truth; only hit on L2 miss (~1–5% of reads with good hit rates)
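The hierarchy above can be sketched as a read path that falls through each tier in turn. This is a simplified illustration: a plain dict stands in for Redis, `load_from_db` is a hypothetical loader callback, and the L2 TTL is omitted because the dict stand-in has no expiry.

```python
import time


class TwoTierCache:
    """Read path: in-process L1 with a short TTL, shared L2, then the database."""

    def __init__(self, l2, load_from_db, l1_ttl_s=30.0, clock=time.monotonic):
        self.l1 = {}                 # key -> (value, expires_at)
        self.l2 = l2                 # dict-like stand-in for Redis
        self.load_from_db = load_from_db
        self.l1_ttl_s = l1_ttl_s
        self.clock = clock

    def get(self, key):
        now = self.clock()
        hit = self.l1.get(key)
        if hit is not None and hit[1] > now:
            return hit[0]            # L1 hit: no network hop
        value = self.l2.get(key)
        if value is None:
            value = self.load_from_db(key)  # L2 miss: go to source of truth
            if value is None:
                return None
            self.l2[key] = value     # a real Redis call would also set a TTL
        self.l1[key] = (value, now + self.l1_ttl_s)
        return value
```

The short L1 TTL bounds how long a node can serve a value that has already changed in L2, which is why the text pairs a 10–60s L1 with a longer-lived L2.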
Cache warming
Cold start after deploy or failover causes mass cache miss. Warm caches by: pre-loading hot keys from DB on startup, running shadow traffic before cutover, or maintaining a "warm key list" refreshed by a background job. Netflix pre-warms EVCache from Cassandra on region bootstrap.
Hot key problem
A single key (celebrity tweet, flash sale SKU) can exceed one Redis node's ~100K ops/sec limit. Solutions: local replication (app holds in-memory copy), key splitting (product:42:shard-{0..7} with random read), read replicas for the hot slot, or multilayer CDN for read-heavy public data.
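Key splitting can be sketched as follows. The helper names are hypothetical and a dict stands in for the Redis client; the shard count of 8 follows the example in the text.

```python
import random


def shard_keys(base_key: str, shards: int = 8):
    """All physical keys that hold a copy of one logical hot key."""
    return [f"{base_key}:shard-{i}" for i in range(shards)]


def write_all_shards(cache, base_key: str, value, shards: int = 8):
    """Writes fan out to every copy so any shard can serve a read."""
    for key in shard_keys(base_key, shards):
        cache[key] = value


def read_random_shard(cache, base_key: str, shards: int = 8, rng=random):
    """Reads pick one copy at random, spreading load across the copies."""
    return cache.get(f"{base_key}:shard-{rng.randrange(shards)}")
```

Because the shard suffix is part of the key and no hash tag is used, the copies will generally hash to different cluster slots. The cost is write amplification and a short window where copies disagree during an update.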
| Hot key symptom | Detection | Mitigation |
|---|---|---|
| Single Redis CPU pegged at 100% | redis-cli --hotkeys, slowlog, client-side key metrics | Local L1 cache, key sharding, dedicated read replica |
| Uneven slot distribution | Cluster node memory/ops imbalance | Hash tag redesign; move to request coalescing |
| Network saturation on one node | Bandwidth metrics per Redis instance | CDN offload; compress values; edge caching |
Twitter ran into hot key issues during World Cup tweets—a single timeline key exceeded single-node Memcached capacity. Solution: application-side key splitting + local in-process cache. Redis documents hot key patterns in their latency troubleshooting guide.
CDN caching
CDNs cache at the edge—closest to users geographically. Essential for static assets, video, and cacheable API responses at global scale.
Cache-Control headers
| Directive | Meaning | Example use |
|---|---|---|
| max-age=31536000, immutable | Cache 1 year; content never changes at this URL | JS/CSS with content hash in filename |
| max-age=300, s-maxage=600 | Browser 5 min; CDN 10 min | Semi-dynamic API responses |
| no-store | Never cache anywhere | PII, auth tokens, personalized data |
| stale-while-revalidate=60 | Serve stale up to 60s while refreshing | Stampede protection at edge |
| Vary: Accept-Encoding | Separate cache entries per encoding | gzip vs brotli responses |
CDN invalidation
Purge by URL, path prefix, or cache tag. CloudFront invalidation typically takes tens of seconds to a few minutes and is billed per path beyond a monthly free allowance. Prefer versioned URLs (/assets/app.v42.js) over purge for deploys. Use cache tags (Fastly Surrogate-Key) for granular group invalidation without listing every URL.
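Versioned URLs are usually generated at build time from the file contents. A minimal sketch, assuming a hypothetical helper name, SHA-256 as the hash, and an 8-character digest prefix:

```python
import hashlib
from pathlib import PurePosixPath

# Safe only because the URL changes whenever the content changes.
IMMUTABLE_CACHE_CONTROL = "public, max-age=31536000, immutable"


def hashed_asset_path(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a content hash before the file extension of an asset path."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


def asset_headers() -> dict:
    """Response headers for content-addressed assets."""
    return {"Cache-Control": IMMUTABLE_CACHE_CONTROL}
```

A deploy that changes `app.js` produces a new URL, so old and new versions coexist at the edge and no purge call is needed; only the HTML that references the asset has to be served with a short TTL.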
Edge functions
CloudFront Functions, Lambda@Edge, Fastly Compute@Edge run logic at PoPs—A/B routing, auth token validation, header rewriting, bot filtering, and personalized cache key construction. Latency: <1 ms for CloudFront Functions, ~10–50 ms for Lambda@Edge (includes cold start risk).
sequenceDiagram
participant User
participant Edge as CDN Edge PoP
participant Origin as Origin Server
User->>Edge: GET /api/products
alt cache hit
Edge-->>User: 200 cached response
else cache miss
Edge->>Origin: GET /api/products
Origin-->>Edge: 200 + Cache-Control
Edge->>Edge: store with TTL
Edge-->>User: 200 response
end
CDN caching API responses saves origin load but complicates personalization and auth. Cache only anonymous, shared responses—or use edge functions to vary cache key by cookie/API key segment. Never CDN-cache authenticated user-specific data without careful Vary design.
Full-stack caching answer: "Static assets on CDN with immutable hash URLs. API reads through Redis cache-aside (80% hit, 5-min TTL, mutex on miss). User-specific data: no CDN, optional 30s app L1. Deploy invalidation via versioned asset URLs, not purge API."