Caching Strategies

Caching is the highest-leverage optimization in system design—turning millisecond database round-trips into microsecond memory lookups. But caches introduce consistency, invalidation, and stampede problems that kill naive implementations at scale. This chapter covers placement, strategies, eviction, invalidation, and CDN edge caching with production numbers you can cite in interviews.


Why cache

Every layer of the stack has a different latency budget. Caching moves hot data closer to the consumer— trading memory for speed and reduced load on expensive backends.

The latency gap

A single PostgreSQL query over a network costs 1–10 ms. L1 cache access is ~1 ns. Even in-memory Redis over TCP is ~0.1–1 ms, roughly an order of magnitude faster than a networked database query. By Little's law (in-flight requests = throughput × latency), 35K read QPS at 1 ms per request keeps about 35 requests in flight, while the same load at 100 ms per query on a slow or contended database needs about 3,500 connections.

| Layer | Typical latency | Throughput | Use when |
|---|---|---|---|
| L1 / L2 CPU cache | 1–40 ns | Billions ops/s | Hot in-process variables, computed fields |
| Application heap (Caffeine, Guava) | ~100 ns – 1 µs | Millions ops/s per JVM | Per-node hot keys, config, session fragments |
| Redis / Memcached | ~0.1–1 ms | ~100K ops/s per node | Shared cache across app fleet, rate limits, sessions |
| CDN edge | ~10–50 ms (vs 200+ ms origin) | Millions RPS globally | Static assets, cacheable API responses, media |
| PostgreSQL (indexed) | 1–10 ms | ~5K–20K reads/s per node | Source of truth, complex queries, writes |

📐 Estimation

Redis single node: ~100K ops/sec (simple GET/SET). At 35K read QPS with 80% cache hit rate, effective DB load drops to 7K QPS. One Redis primary + replica handles this with headroom; plan 2× for failover.
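
The estimate above is plain arithmetic. A minimal sketch, assuming the ~100K ops/sec per-node figure quoted in this chapter; the function names and the 2× headroom default are illustrative, not from any library:

```python
import math


def db_qps(read_qps: float, hit_rate: float) -> float:
    """Reads that miss the cache and fall through to the database."""
    return read_qps * (1.0 - hit_rate)


def redis_nodes_needed(read_qps: float, per_node_ops: float = 100_000,
                       headroom: float = 2.0) -> int:
    """Nodes needed to serve all reads, scaled by a failover headroom factor."""
    return max(1, math.ceil(read_qps * headroom / per_node_ops))


print(round(db_qps(35_000, 0.80)))   # 7000
print(redis_nodes_needed(35_000))    # 1
```

Raising the hit rate from 80% to 95% in the same function cuts database load from 7K to 1.75K QPS, which is why hit rate is usually the first number to measure.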

What caching buys you

  • Lower latency — p99 drops when hot paths skip the database
  • Higher throughput — memory bandwidth exceeds disk IOPS by orders of magnitude
  • Cost reduction — fewer DB replicas, smaller instance sizes, less cross-AZ traffic
  • Resilience — stale cache can serve traffic during DB degradation (with explicit trade-offs)
⚖️ Trade-off

Caches are not free. You pay in consistency complexity, invalidation logic, memory cost, and operational surface (Redis failover, hot keys, monitoring). Cache only data whose staleness the product requirements explicitly tolerate.

🎯 Interview Tip

When asked "how would you scale reads?", say: "Cache-aside Redis with 5-min TTL, ~100K ops/sec per node. At 35K QPS and 80% hit rate, DB sees 7K QPS. Mutex on cache miss to prevent stampede." Quantify every claim.

Cache placement

Where you put the cache determines hit rate, invalidation complexity, and consistency guarantees. Most production systems use multiple layers simultaneously.

flowchart TB
  Client[Client / Browser]
  CDN[CDN Edge]
  RP[Reverse Proxy / API Gateway]
  App[Application Cache\nCaffeine / Guava]
  Dist[Distributed Cache\nRedis Cluster]
  DB[(Database)]

  Client --> CDN
  CDN --> RP
  RP --> App
  App --> Dist
  Dist --> DB
  App -.->|L1 miss| Dist
  Dist -.->|L2 miss| DB
| Layer | Examples | Pros | Cons |
|---|---|---|---|
| Client | Browser HTTP cache, Service Worker, mobile disk cache | Zero server load; instant for repeat visits | No server control after response sent; hard to invalidate |
| CDN | CloudFront, Fastly, Akamai | Geographic proximity; absorbs global traffic spikes | Eventual consistency; purge latency; cost at scale |
| Reverse proxy | Nginx proxy_cache, Varnish, Envoy | Transparent to app; shields origin from identical requests | Cache key design critical; per-PoP, not shared across regions |
| Application | Caffeine, Guava Cache, in-process dict | Fastest access (~µs); no network hop | Not shared across instances; invalidation per node |
| Distributed cache | Redis, Memcached, Hazelcast | Shared across fleet; sub-ms latency; rich data structures | Network hop; hot key risk; memory cost; ops overhead |
| Database | MySQL buffer pool, Postgres shared_buffers, query cache (deprecated) | Automatic; no app code changes | Least control; evicted under memory pressure; query-specific |

📦 Real World

Facebook uses a multi-tier cache: CDN for static assets, Memcached for social graph hot data, and MySQL as source of truth. Netflix caches metadata in EVCache (memcached wrapper) with regional clusters; video bytes live on Open Connect CDN appliances.

🏆 Senior Signal

L6 candidates describe which layer owns invalidation for each data type. "User profile: Redis L2 + 60s app L1, invalidated via Kafka CDC on write. Product images: CDN with 1-year immutable cache key (content hash in URL)."

Caching strategies

Five patterns govern how reads and writes interact with the cache. Your choice determines consistency, write latency, and failure behavior.

Cache-aside (lazy loading)

Application manages the cache. On read: check cache → on miss, read DB → populate cache → return. On write: update DB → delete or update cache entry. Most common pattern in production.

sequenceDiagram
  participant App
  participant Cache
  participant DB
  App->>Cache: GET key
  alt cache hit
    Cache-->>App: value
  else cache miss
    Cache-->>App: null
    App->>DB: SELECT
    DB-->>App: row
    App->>Cache: SET key, TTL
    App-->>App: return value
  end
| Strategy | Read path | Write path | Pros | Cons |
|---|---|---|---|---|
| Cache-aside | App reads cache first; loads DB on miss | App writes DB; invalidates cache | Simple; cache only hot data; resilient to cache failure | Stale data window; stampede on miss; app owns logic |
| Read-through | Cache loads from DB on miss (cache library handles) | App writes DB; cache unaware or invalidated | Centralized load logic; cleaner app code | Cache provider must support; same stampede risk |
| Write-through | Standard cache read | Write goes to cache + DB synchronously | Cache always consistent on write completion | Higher write latency; writes cold data into cache |
| Write-behind | Standard cache read | Write to cache; async flush to DB | Fast writes; absorbs write bursts | Data loss risk on crash; complexity; ordering issues |
| Refresh-ahead | Proactively refresh before TTL expires | Typically paired with cache-aside | Reduces miss latency for predictable hot keys | Wasted refreshes; needs access pattern prediction |

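// Cache-aside sketch (Java 16+). Cache and UserDao are placeholder interfaces, not a real client API.
public class UserService {
    public record User(long id, String name) {}
    public interface Cache {
        User get(String key);
        void setex(String key, int ttlSeconds, User value);
        void del(String key);
    }
    public interface UserDao {
        User findById(long id);
        void update(User user);
    }
    private final Cache redis;
    private final UserDao db;

    public UserService(Cache redis, UserDao db) { this.redis = redis; this.db = db; }

    public User getUser(long id) {
        String key = "user:" + id;
        User cached = redis.get(key);
        if (cached != null) return cached;   // cache hit

        User user = db.findById(id);         // cache miss: read the source of truth
        if (user != null) {
            redis.setex(key, 300, user);     // populate with a 5-min TTL
        }
        return user;
    }

    public void updateUser(User user) {
        db.update(user);                     // write the database first
        redis.del("user:" + user.id());      // then invalidate, not update
    }
}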
⚠️ Pitfall

Updating the cache on write (instead of invalidating it) causes race conditions. With two concurrent writes A and B, the database can end with B's value (B commits last) while the cache ends with A's value (A's cache SET runs last), leaving the cache stale until the TTL expires. Prefer delete-on-write for cache-aside unless you have versioning.
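
The race is easier to see when one bad interleaving is replayed deterministically. The sketch below uses plain dicts as stand-ins for the database and the cache; the function names are illustrative only:

```python
def run_interleaving(on_write):
    """Replay one fixed interleaving of two writers against dict-based DB and cache."""
    db, cache = {}, {}
    db["user:42"] = "A"                 # writer A commits to the DB
    db["user:42"] = "B"                 # writer B commits after A, so B wins in the DB
    on_write(cache, "user:42", "B")     # writer B touches the cache first
    on_write(cache, "user:42", "A")     # writer A touches the cache last
    return db["user:42"], cache.get("user:42")


def update_cache(cache, key, value):
    cache[key] = value                  # update-on-write


def delete_cache(cache, key, value):
    cache.pop(key, None)                # delete-on-write


print(run_interleaving(update_cache))   # ('B', 'A'): cache is stale
print(run_interleaving(delete_cache))   # ('B', None): next read reloads 'B'
```

With delete-on-write the worst outcome of this interleaving is an extra cache miss, not a wrong value.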

💡 Pro Tip

Default to cache-aside + TTL + delete-on-write for 90% of interview problems. Mention write-through only for low-write, high-read data (product catalog). Write-behind for analytics counters or fire-and-forget telemetry.
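
To make the write-through and write-behind rows of the strategy table concrete, here is a minimal in-memory sketch. The dict named store stands in for the database, and both class names are hypothetical; a real write-behind cache would also need retries and a durability story for the pending queue.

```python
from collections import deque


class WriteThroughCache:
    """Every write reaches the backing store and the cache before put() returns."""

    def __init__(self, store: dict):
        self.store = store
        self.cache = {}

    def put(self, key, value):
        self.store[key] = value      # synchronous write to the source of truth
        self.cache[key] = value      # cache is consistent once put() returns


class WriteBehindCache:
    """Writes land in the cache immediately and reach the store on flush()."""

    def __init__(self, store: dict):
        self.store = store
        self.cache = {}
        self.pending = deque()       # ordered queue of unflushed writes

    def put(self, key, value):
        self.cache[key] = value
        self.pending.append((key, value))

    def flush(self):
        """Drain queued writes in order; a crash before this call loses them."""
        while self.pending:
            key, value = self.pending.popleft()
            self.store[key] = value
```

The gap between put() and flush() is exactly the data-loss window listed under write-behind's cons.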

Eviction policies

Finite memory means something must leave when the cache is full. Eviction policy determines which keys survive—and whether your hit rate holds under pressure.

| Policy | Evicts | Best for | Weakness |
|---|---|---|---|
| LRU (Least Recently Used) | Key not accessed longest | General-purpose; temporal locality (sessions, feeds) | One-time scan floods out hot data; no frequency awareness |
| LFU (Least Frequently Used) | Key with lowest access count | Stable hot set (top products, config) | New keys evicted before warming up; counter aging needed |
| FIFO (First In, First Out) | Oldest inserted key | Simple streaming buffers; predictable order | Ignores access patterns entirely; usually poor hit rate |
| TTL (Time To Live) | Key past expiration timestamp | Time-sensitive data; automatic staleness bound | Not a capacity policy on its own; combine with LRU |
| ARC (Adaptive Replacement Cache) | Balances recency + frequency adaptively | Mixed workloads; storage caches (ZFS ARC) | More memory overhead for metadata; complex to implement |
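
As a reference point for the LRU row, a minimal exact LRU cache can be built on the standard library's OrderedDict. This is a teaching sketch, not how any particular cache server implements eviction:

```python
from collections import OrderedDict


class LRUCache:
    """Fixed-capacity cache that evicts the least recently used key."""

    def __init__(self, capacity: int):
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self.capacity = capacity
        self._data = OrderedDict()           # oldest access first, newest last

    def get(self, key, default=None):
        if key not in self._data:
            return default
        self._data.move_to_end(key)          # mark as most recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the least recently used key


cache = LRUCache(2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")           # "a" is now the most recently used
cache.put("c", 3)        # evicts "b"
print(cache.get("b"))    # None
```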

Redis eviction

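Redis uses maxmemory + maxmemory-policy. The built-in default policy is noeviction, which rejects writes once memory is full, so set a policy explicitly. Common production choice for cache-only deployments: allkeys-lru or allkeys-lfu (Redis 4.0+). Use volatile-lru when mixing cache keys (with TTL) and persistent keys (no TTL).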

🔬 Under the Hood

Redis LRU is approximate: it samples a few random keys (5 by default, tunable via maxmemory-samples) and evicts the least recently used among them. This avoids the memory cost of tracking exact access order across every key. LFU uses a logarithmic counter with decay so old popularity fades.
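
The sampling idea can be illustrated in a few lines. This is a simplified sketch of sampled eviction in general, not Redis's actual implementation, which differs in detail; the last_access dict and the function name are assumptions made for the example:

```python
import random


def evict_sampled_lru(last_access: dict, sample_size: int = 5, rng=random):
    """Remove and return the least recently used key among a random sample.

    last_access maps each key to its last access timestamp.
    """
    if not last_access:
        raise KeyError("cache is empty")
    k = min(sample_size, len(last_access))
    candidates = rng.sample(list(last_access), k)
    victim = min(candidates, key=last_access.__getitem__)   # oldest in the sample
    del last_access[victim]
    return victim
```

Larger samples approach exact LRU at the cost of more work per eviction, which is the trade-off the sample-size setting exposes.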

Cache invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Invalidation strategy is where most caching bugs originate.

Invalidation strategies

| Strategy | Mechanism | Consistency | Complexity |
|---|---|---|---|
| TTL expiration | Key auto-expires after N seconds | Eventual; bounded staleness = TTL | Low: set and forget |
| Event-driven (CDC) | Debezium/Kafka streams DB changes → invalidates cache | Near-real-time; seconds of lag | Medium: pipeline to maintain |
| Versioning / ETags | Cache key includes version: user:42:v7 | Strong on read if version checked | Medium: version source of truth needed |
| Pub/Sub broadcast | Redis PUBLISH on write; all app nodes evict local L1 | Seconds; depends on delivery | Medium: missed messages = stale L1 |

flowchart LR
  App[App Server]
  DB[(PostgreSQL)]
  CDC[Debezium / CDC]
  Kafka[Kafka Topic]
  Inv[Invalidation Worker]
  Redis[(Redis)]

  App -->|UPDATE| DB
  DB --> CDC
  CDC --> Kafka
  Kafka --> Inv
  Inv -->|DEL user:42| Redis
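
The versioning strategy from the table (keys such as user:42:v7) can be sketched with two dicts. Here the versions dict plays the role of the version source of truth, and the class and method names are hypothetical:

```python
class VersionedCache:
    """Cache entries keyed by entity version, so a version bump orphans old entries."""

    def __init__(self):
        self.versions = {}   # entity id -> current version (assumed source of truth)
        self.cache = {}      # e.g. "user:42:v7" -> value

    def key_for(self, entity_id: str) -> str:
        return f"{entity_id}:v{self.versions.get(entity_id, 0)}"

    def get(self, entity_id: str, loader):
        key = self.key_for(entity_id)
        if key not in self.cache:
            self.cache[key] = loader(entity_id)   # miss under the current version
        return self.cache[key]

    def invalidate(self, entity_id: str) -> None:
        """Bump the version; old keys are never read again and age out via TTL or eviction."""
        self.versions[entity_id] = self.versions.get(entity_id, 0) + 1
```

No delete is ever issued: readers simply stop asking for the old key, which is why this approach depends on a capacity or TTL policy to reclaim orphaned entries.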

Thundering herd problem

When a popular cache key expires (or is invalidated), thousands of concurrent requests miss simultaneously and hammer the database—the "thundering herd" or "dog-piling" effect. A key with 10K RPS and sudden invalidation can send 10K simultaneous DB queries.

⚠️ Pitfall

TTL-only invalidation on hot keys is dangerous. When a viral post's cache entry expires at peak traffic, every in-flight request misses at once, which can DDoS your own database. Always pair TTL with stampede protection for keys above ~1K RPS.

Cache stampede solutions

Three proven techniques prevent coordinated cache misses from overwhelming your origin. Production systems often combine all three.

| Technique | How it works | Pros | Cons |
|---|---|---|---|
| Mutex / single-flight | Only one request rebuilds on miss; others wait or retry cache | Simple; guarantees ≤1 DB query per miss event | Wait latency for losers; lock timeout tuning; Redis SETNX pitfalls |
| Stale-while-revalidate (SWR) | Serve stale value immediately; async background refresh | Zero user-facing latency spike; smooth traffic | Users see stale data during refresh; needs soft + hard TTL |
| Probabilistic early expiration (XFetch) | Each request refreshes early with a probability that rises as the hard TTL approaches | No locks; spreads refresh load over time | Math tuning (beta factor); occasional redundant early refresh |

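# Mutex / single-flight on cache miss (Redis, redis-py client)
import time
import uuid

import redis

r = redis.Redis()  # assumes a reachable Redis; adjust host/port as needed

# Compare-and-delete: release the lock only if this caller still owns it.
RELEASE = "if redis.call('get', KEYS[1]) == ARGV[1] then return redis.call('del', KEYS[1]) end return 0"

def get_with_mutex(key, loader_fn, ttl=300, lock_ttl=10, retries=20):
    val = r.get(key)
    if val is not None:  # an empty value is still a hit
        return val

    lock_key = f"lock:{key}"
    token = uuid.uuid4().hex  # identifies this lock holder
    if r.set(lock_key, token, nx=True, ex=lock_ttl):
        try:
            val = loader_fn()
            r.setex(key, ttl, val)
            return val
        finally:
            r.eval(RELEASE, 1, lock_key, token)

    for _ in range(retries):  # losers poll the cache instead of the DB
        time.sleep(0.05)
        val = r.get(key)
        if val is not None:
            return val
    return loader_fn()  # last resort if the lock holder never populated the key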
💡 Pro Tip

Cloudflare and Fastly implement SWR natively via the stale-while-revalidate Cache-Control directive. For in-memory caches, Facebook's memcached deployment uses leases ("lease get"): the same mutex concept, with the option to serve a stale value while the lease holder rebuilds.
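
For an application-level cache, SWR needs a soft TTL (start refreshing) and a hard TTL (stop serving). The following is a minimal in-process sketch under those assumptions; the SWRCache class, its parameters, and the thread-per-refresh approach are illustrative choices, and a production version would more likely use a bounded worker pool:

```python
import threading
import time


class SWRCache:
    """In-process stale-while-revalidate cache with a soft and a hard TTL."""

    def __init__(self, loader, soft_ttl, hard_ttl, clock=time.monotonic):
        self._loader = loader          # hypothetical function: key -> fresh value
        self._soft_ttl = soft_ttl      # past this age, serve stale and refresh
        self._hard_ttl = hard_ttl      # past this age, never serve the entry
        self._clock = clock
        self._entries = {}             # key -> (value, stored_at)
        self._refreshing = set()       # keys with a background refresh in flight
        self._threads = []
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            entry = self._entries.get(key)
            if entry is not None:
                value, stored_at = entry
                age = self._clock() - stored_at
                if age < self._soft_ttl:
                    return value                      # fresh hit
                if age < self._hard_ttl:
                    if key not in self._refreshing:   # at most one refresh per key
                        self._refreshing.add(key)
                        thread = threading.Thread(
                            target=self._refresh, args=(key,), daemon=True)
                        self._threads.append(thread)
                        thread.start()
                    return value                      # stale hit, caller does not wait
        return self._load(key)                        # miss or past the hard TTL

    def _load(self, key):
        value = self._loader(key)
        with self._lock:
            self._entries[key] = (value, self._clock())
        return value

    def _refresh(self, key):
        try:
            self._load(key)
        finally:
            with self._lock:
                self._refreshing.discard(key)

    def wait_for_refreshes(self):
        """Block until background refreshes finish (mainly useful in tests)."""
        with self._lock:
            threads, self._threads = self._threads, []
        for thread in threads:
            thread.join()
```

Between the soft and hard TTL, callers always get an immediate answer and at most one loader call runs per key, which combines SWR with single-flight behaviour.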

🎯 Interview Tip

When interviewer asks "what if cache expires during peak?", answer with layered defense: (1) mutex so one thread rebuilds, (2) SWR so users never wait, (3) jittered TTL so keys don't expire together. Mention TTL + random(0, 60s) to prevent synchronized expiry.
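
Jittered TTLs and probabilistic early expiration are both a few lines of arithmetic. The sketch below assumes the usual XFetch formulation, in which a request refreshes when now − recompute_time × beta × ln(u) ≥ expiry for a uniform random u; the function and parameter names are illustrative:

```python
import math
import random
import time


def jittered_ttl(base_ttl: int, max_jitter: int = 60, rng=random) -> int:
    """Base TTL plus a uniform random offset so keys do not expire together."""
    return base_ttl + rng.randint(0, max_jitter)


def should_refresh_early(expires_at: float, recompute_seconds: float,
                         beta: float = 1.0, now=None, rng=random) -> bool:
    """XFetch-style check: True means this request should rebuild the value now.

    The chance of returning True rises as `now` approaches `expires_at`
    and scales with how long the value takes to recompute.
    """
    now = time.time() if now is None else now
    u = 1.0 - rng.random()               # uniform in (0, 1], avoids log(0)
    return now - recompute_seconds * beta * math.log(u) >= expires_at
```

A beta above 1.0 makes early refreshes more likely, and a beta below 1.0 makes them rarer; once the key has actually expired, the check is always true.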

Distributed cache

A single Redis node handles ~100K ops/sec. Beyond that you shard, replicate, and layer caches to protect both the cache tier and the database.

Redis Cluster

Redis Cluster shards data across 16,384 hash slots using CRC16 mod 16384. Each master owns a slot range; replicas provide failover. Client libraries (Lettuce, Jedis) handle MOVED/ASK redirects. Minimum production topology: 3 masters + 3 replicas across AZs.
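
The slot calculation can be reproduced in a few lines. This sketch implements the CRC16 variant (XMODEM: polynomial 0x1021, initial value 0) and the hash-tag rule as described in the Redis Cluster specification; the function names are my own:

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, polynomial 0x1021, initial value 0, no reflection."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_slot(key: str) -> int:
    """Map a key to one of 16,384 slots, honoring a non-empty {hash tag}."""
    data = key.encode("utf-8")
    start = data.find(b"{")
    if start != -1:
        end = data.find(b"}", start + 1)
        if end > start + 1:                  # only a non-empty tag counts
            data = data[start + 1:end]
    return crc16_xmodem(data) % 16384


print(hash_slot("123456789"))                                              # 12739
print(hash_slot("{user:42}:profile") == hash_slot("{user:42}:settings"))   # True
```

Hash tags force related keys into one slot so multi-key operations work, but they also concentrate load, which matters for the hot key discussion later in this chapter.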

flowchart TB
  subgraph nodes [App Fleet]
    A1[App 1\nCaffeine L1]
    A2[App 2\nCaffeine L1]
    A3[App N\nCaffeine L1]
  end
  subgraph redis [Redis Cluster]
    M1[Master 1\nslots 0-5460]
    M2[Master 2\nslots 5461-10922]
    M3[Master 3\nslots 10923-16383]
  end
  DB[(PostgreSQL)]

  A1 & A2 & A3 -->|L1 miss ~1ms| redis
  redis -->|L2 miss ~5ms| DB

L1 / L2 / DB hierarchy

  • L1 (in-process) — Caffeine with 10–60s TTL; absorbs 90%+ of repeated reads on same node
  • L2 (Redis) — Shared across fleet; 1–5 min TTL; source for L1 population
  • DB — Source of truth; only hit on L2 miss (~1–5% of reads with good hit rates)
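
The read path through this hierarchy can be sketched as follows. Plain dicts stand in for Caffeine (L1) and Redis (L2), the db_loader callable stands in for the database, and L2 TTL handling is left out to keep the example short; all names are hypothetical:

```python
import time


class TwoTierCache:
    """Read path L1 (in-process) -> L2 (shared) -> DB, populating on the way back."""

    def __init__(self, l2: dict, db_loader, l1_ttl: float = 30.0,
                 clock=time.monotonic):
        self.l1 = {}                  # key -> (value, expires_at); per-node cache
        self.l2 = l2                  # shared across nodes; stands in for Redis
        self.db_loader = db_loader    # hypothetical function: key -> value or None
        self.l1_ttl = l1_ttl
        self.clock = clock

    def get(self, key):
        entry = self.l1.get(key)
        if entry is not None and entry[1] > self.clock():
            return entry[0]                       # L1 hit
        value = self.l2.get(key)
        if value is None:                         # L2 miss: go to the database
            value = self.db_loader(key)
            if value is not None:
                self.l2[key] = value
        if value is not None:                     # refill L1 with a short TTL
            self.l1[key] = (value, self.clock() + self.l1_ttl)
        return value
```

Two instances sharing the same L2 dict behave like two app nodes: the second node's first read is an L2 hit, so the database sees only one query for the key.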

Cache warming

Cold start after a deploy or failover causes mass cache misses. Warm caches by pre-loading hot keys from the DB on startup, running shadow traffic before cutover, or maintaining a "warm key list" refreshed by a background job. Netflix pre-warms EVCache from Cassandra on region bootstrap.

Hot key problem

A single key (celebrity tweet, flash sale SKU) can exceed one Redis node's ~100K ops/sec limit. Solutions: local replication (app holds in-memory copy), key splitting (product:42:shard-{0..7} with random read), read replicas for the hot slot, or multilayer CDN for read-heavy public data.
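
Key splitting is the least obvious of these, so here is a minimal sketch. A dict stands in for the cache client, the shard count of 8 mirrors the product:42:shard-{0..7} example above, and the function names are illustrative:

```python
import random


def shard_keys(key: str, shards: int = 8) -> list[str]:
    """All shard copies of a hot key, e.g. product:42:shard-0 .. shard-7."""
    return [f"{key}:shard-{i}" for i in range(shards)]


def write_hot_key(cache: dict, key: str, value, shards: int = 8) -> None:
    """Fan the value out to every shard copy."""
    for shard_key in shard_keys(key, shards):
        cache[shard_key] = value


def read_hot_key(cache: dict, key: str, shards: int = 8, rng=random):
    """Read one randomly chosen copy, spreading load across the shard keys."""
    return cache.get(f"{key}:shard-{rng.randrange(shards)}")
```

Because each shard key hashes independently, the copies tend to land on different cluster nodes. The cost is write amplification and a short window in which copies can disagree during an update.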

| Hot key symptom | Detection | Mitigation |
|---|---|---|
| Single Redis CPU pegged at 100% | redis-cli --hotkeys, slowlog, client-side key metrics | Local L1 cache, key sharding, dedicated read replica |
| Uneven slot distribution | Cluster node memory/ops imbalance | Hash tag redesign; move to request coalescing |
| Network saturation on one node | Bandwidth metrics per Redis instance | CDN offload; compress values; edge caching |

📦 Real World

Twitter ran into hot key issues during World Cup tweets—a single timeline key exceeded single-node Memcached capacity. Solution: application-side key splitting + local in-process cache. Redis documents hot key patterns in their latency troubleshooting guide.

CDN caching

CDNs cache at the edge—closest to users geographically. Essential for static assets, video, and cacheable API responses at global scale.

Cache-Control headers

| Directive | Meaning | Example use |
|---|---|---|
| max-age=31536000, immutable | Cache 1 year; content never changes at this URL | JS/CSS with content hash in filename |
| max-age=300, s-maxage=600 | Browser 5 min; CDN 10 min | Semi-dynamic API responses |
| no-store | Never cache anywhere | PII, auth tokens, personalized data |
| stale-while-revalidate=60 | Serve stale up to 60s while refreshing | Stampede protection at edge |
| Vary: Accept-Encoding (separate response header) | Separate cache entries per encoding | gzip vs brotli responses |
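
A small helper makes it harder to emit inconsistent header values. This is a sketch of one possible origin-side helper; the function name and its parameters are assumptions, and only the directive strings themselves come from the table above:

```python
def cache_control(max_age=None, s_maxage=None, immutable=False,
                  stale_while_revalidate=None, no_store=False) -> str:
    """Build a Cache-Control header value from the directives in the table."""
    if no_store:
        return "no-store"                 # overrides every other directive
    parts = []
    if max_age is not None:
        parts.append(f"max-age={max_age}")
    if s_maxage is not None:
        parts.append(f"s-maxage={s_maxage}")
    if stale_while_revalidate is not None:
        parts.append(f"stale-while-revalidate={stale_while_revalidate}")
    if immutable:
        parts.append("immutable")
    return ", ".join(parts)


print(cache_control(max_age=31536000, immutable=True))   # max-age=31536000, immutable
print(cache_control(max_age=300, s_maxage=600))          # max-age=300, s-maxage=600
```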

CDN invalidation

Purge by URL, path prefix, or cache tag. CloudFront invalidation takes 10–60 seconds and costs per path. Prefer versioned URLs (/assets/app.v42.js) over purge for deploys. Use cache tags (Fastly Surrogate-Key) for granular group invalidation without listing every URL.
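
Versioned URLs are often derived from a content hash at build time, so a changed file gets a new URL and the old one can stay cached indefinitely. A minimal sketch, assuming SHA-256 and an 8-character digest prefix; both are arbitrary choices for the example, and the function name is hypothetical:

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_path(path: str, content: bytes, digest_chars: int = 8) -> str:
    """Insert a content-hash prefix before the file extension of an asset path."""
    digest = hashlib.sha256(content).hexdigest()[:digest_chars]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


old = hashed_asset_path("/assets/app.js", b"console.log('v1')")
new = hashed_asset_path("/assets/app.js", b"console.log('v2')")
print(old != new)   # True: changed content yields a new URL, so no purge is needed
```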

Edge functions

CloudFront Functions, Lambda@Edge, Fastly Compute@Edge run logic at PoPs—A/B routing, auth token validation, header rewriting, bot filtering, and personalized cache key construction. Latency: <1 ms for CloudFront Functions, ~10–50 ms for Lambda@Edge (includes cold start risk).

sequenceDiagram
  participant User
  participant Edge as CDN Edge PoP
  participant Origin as Origin Server

  User->>Edge: GET /api/products
  alt cache hit
    Edge-->>User: 200 cached response
  else cache miss
    Edge->>Origin: GET /api/products
    Origin-->>Edge: 200 + Cache-Control
    Edge->>Edge: store with TTL
    Edge-->>User: 200 response
  end
⚖️ Trade-off

CDN caching API responses saves origin load but complicates personalization and auth. Cache only anonymous, shared responses—or use edge functions to vary cache key by cookie/API key segment. Never CDN-cache authenticated user-specific data without careful Vary design.

🏆 Senior Signal

Full-stack caching answer: "Static assets on CDN with immutable hash URLs. API reads through Redis cache-aside (80% hit, 5-min TTL, mutex on miss). User-specific data: no CDN, optional 30s app L1. Deploy invalidation via versioned asset URLs, not purge API."