Caching Strategies

Why cache

Every layer of the stack has a different latency budget. Caching moves hot data closer to the consumer, trading memory for speed and reduced load on expensive backends.

The latency gap

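A single PostgreSQL query over a network costs 1–10 ms. L1 CPU cache access is ~1 ns. In-memory Redis over TCP is ~0.1–1 ms, roughly an order of magnitude faster than a networked, disk-backed database. The gap compounds under load: by Little's law (in-flight requests = throughput × latency), 35K read QPS at 1 ms per read keeps about 35 requests in flight, while the same load at 100 ms per read (a slow or contended query) keeps about 3,500, each holding a database connection.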

Layer	Typical latency	Throughput	Use when
L1 / L2 CPU cache	1–40 ns	Billions ops/s	Hot in-process variables, computed fields
Application heap (Caffeine, Guava)	~100 ns – 1 µs	Millions ops/s per JVM	Per-node hot keys, config, session fragments
Redis / Memcached	~0.1–1 ms	~100K ops/sec per node	Shared cache across app fleet, rate limits, sessions
CDN edge	~10–50 ms (vs 200+ ms origin)	Millions RPS globally	Static assets, cacheable API responses, media
PostgreSQL (indexed)	1–10 ms	~5K–20K reads/s per node	Source of truth, complex queries, writes

📐 Estimation

Redis single node: ~100K ops/sec (simple GET/SET). At 35K read QPS with 80% cache hit rate, effective DB load drops to 7K QPS. One Redis primary + replica handles this with headroom; plan 2× for failover.
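
The arithmetic above can be checked with a few lines of Python. This is a back-of-the-envelope sketch: the function names are illustrative, and the 100K ops/sec per-node figure and the 2× headroom multiplier are the planning assumptions stated in this section, not measured values.

```python
import math


def db_qps_after_cache(read_qps: float, hit_rate: float) -> float:
    """Reads that still reach the database once the cache absorbs the hits."""
    return read_qps * (1.0 - hit_rate)


def redis_primaries_needed(read_qps: float,
                           per_node_ops: float = 100_000,
                           headroom: float = 2.0) -> int:
    """Primaries needed to serve read_qps, with a headroom multiplier for failover."""
    return math.ceil(read_qps * headroom / per_node_ops)


def in_flight_requests(qps: float, latency_seconds: float) -> float:
    """Little's law: average concurrency = throughput x latency."""
    return qps * latency_seconds


# Numbers from the estimation above: 35K read QPS, 80% hit rate.
print(round(db_qps_after_cache(35_000, 0.80)))   # residual DB read QPS
print(redis_primaries_needed(35_000))            # primaries at 2x headroom
```

Rounding the residual QPS avoids floating-point noise from `1.0 - 0.80`.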

What caching buys you

Lower latency — p99 drops when hot paths skip the database
Higher throughput — memory bandwidth exceeds disk IOPS by orders of magnitude
Cost reduction — fewer DB replicas, smaller instance sizes, less cross-AZ traffic
Resilience — stale cache can serve traffic during DB degradation (with explicit trade-offs)

⚖️ Trade-off

Caches are not free. You pay in consistency complexity, invalidation logic, memory cost, and operational surface (Redis failover, hot keys, monitoring). Cache only data whose staleness window is explicitly acceptable under your product requirements.

🎯 Interview Tip

When asked "how would you scale reads?", say: "Cache-aside Redis with 5-min TTL, ~100K ops/sec per node. At 35K QPS and 80% hit rate, DB sees 7K QPS. Mutex on cache miss to prevent stampede." Quantify every claim.

Cache placement

Where you put the cache determines hit rate, invalidation complexity, and consistency guarantees. Most production systems use multiple layers simultaneously.

flowchart TB
  Client[Client / Browser]
  CDN[CDN Edge]
  RP[Reverse Proxy / API Gateway]
  App[Application Cache\nCaffeine / Guava]
  Dist[Distributed Cache\nRedis Cluster]
  DB[(Database)]

  Client --> CDN
  CDN --> RP
  RP --> App
  App --> Dist
  Dist --> DB
  App -.->|L1 miss| Dist
  Dist -.->|L2 miss| DB

Layer	Examples	Pros	Cons
Client	Browser HTTP cache, Service Worker, mobile disk cache	Zero server load; instant for repeat visits	No server control after response sent; hard to invalidate
CDN	CloudFront, Fastly, Akamai	Geographic proximity; absorbs global traffic spikes	Eventual consistency; purge latency; cost at scale
Reverse proxy	Nginx proxy_cache, Varnish, Envoy	Transparent to app; shields origin from identical requests	Cache key design critical; per-PoP not shared across regions
Application	Caffeine, Guava Cache, in-process dict	Fastest access (~µs); no network hop	Not shared across instances; invalidation per node
Distributed cache	Redis, Memcached, Hazelcast	Shared across fleet; sub-ms latency; rich data structures	Network hop; hot key risk; memory cost; ops overhead
Database	MySQL buffer pool, Postgres shared_buffers, query cache (deprecated)	Automatic; no app code changes	Least control; evicted under memory pressure; query-specific

📦 Real World

Facebook uses a multi-tier cache: CDN for static assets, Memcached for social graph hot data, and MySQL as source of truth. Netflix caches metadata in EVCache (memcached wrapper) with regional clusters; video bytes live on Open Connect CDN appliances.

🏆 Senior Signal

L6 candidates describe which layer owns invalidation for each data type. "User profile: Redis L2 + 60s app L1, invalidated via Kafka CDC on write. Product images: CDN with 1-year immutable cache key (content hash in URL)."

Caching strategies

Five patterns govern how reads and writes interact with the cache. Your choice determines consistency, write latency, and failure behavior.

Cache-aside (lazy loading)

Application manages the cache. On read: check cache → on miss, read DB → populate cache → return. On write: update DB → delete or update cache entry. Most common pattern in production.

sequenceDiagram
  participant App
  participant Cache
  participant DB
  App->>Cache: GET key
  alt cache hit
    Cache-->>App: value
  else cache miss
    Cache-->>App: null
    App->>DB: SELECT
    DB-->>App: row
    App->>Cache: SET key, TTL
    App-->>App: return value
  end

Strategy	Read path	Write path	Pros	Cons
Cache-aside	App reads cache first; loads DB on miss	App writes DB; invalidates cache	Simple; cache only hot data; resilient to cache failure	Stale data window; stampede on miss; app owns logic
Read-through	Cache loads from DB on miss (cache library handles)	App writes DB; cache unaware or invalidated	Centralized load logic; cleaner app code	Cache provider must support; same stampede risk
Write-through	Standard cache read	Write goes to cache + DB synchronously	Cache always consistent on write completion	Higher write latency; writes cold data into cache
Write-behind	Standard cache read	Write to cache; async flush to DB	Fast writes; absorbs write bursts	Data loss risk on crash; complexity; ordering issues
Refresh-ahead	Proactively refresh before TTL expires	Typically paired with cache-aside	Reduces miss latency for predictable hot keys	Wasted refreshes; needs access pattern prediction

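// Cache-aside pattern (sketch; assumes a Jedis client `redis`, a Jackson
// ObjectMapper `mapper`, and a repository `db` as fields of the enclosing class)
public User getUser(long id) throws JsonProcessingException {
    String key = "user:" + id;
    String cached = redis.get(key);            // null on a cache miss
    if (cached != null) return mapper.readValue(cached, User.class);

    User user = db.findById(id);               // miss: read the source of truth
    if (user != null) {
        redis.setex(key, 300, mapper.writeValueAsString(user)); // 5-min TTL
    }
    return user;
}

public void updateUser(User user) {
    db.update(user);                           // write the source of truth first
    redis.del("user:" + user.getId());         // invalidate, not update
}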

⚠️ Pitfall

Updating the cache on write (instead of invalidating) invites a race between concurrent writers. If write A commits to the DB first and write B commits second, but B's cache SET lands before A's, the cache ends up holding A's older value while the DB holds B's. Prefer delete on write for cache-aside unless you have versioning.

💡 Pro Tip

Default to cache-aside + TTL + delete-on-write for 90% of interview problems. Mention write-through only for low-write, high-read data (product catalog). Write-behind for analytics counters or fire-and-forget telemetry.

Eviction policies

Finite memory means something must leave when the cache is full. Eviction policy determines which keys survive—and whether your hit rate holds under pressure.

Policy	Evicts	Best for	Weakness
LRU (Least Recently Used)	Key not accessed longest	General-purpose; temporal locality (sessions, feeds)	One-time scan floods out hot data; no frequency awareness
LFU (Least Frequently Used)	Key with lowest access count	Stable hot set (top products, config)	New keys evicted before warming up; counter aging needed
FIFO (First In, First Out)	Oldest inserted key	Simple streaming buffers; predictable order	Ignores access patterns entirely; poor hit rate usually
TTL (Time To Live)	Key past expiration timestamp	Time-sensitive data; automatic staleness bound	Not a capacity policy alone—combine with LRU
ARC (Adaptive Replacement Cache)	Balances recency + frequency adaptively	Mixed workloads; scan-resistant buffer caches (ZFS ARC)	More memory overhead for metadata; complex to implement

Redis eviction

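Redis uses maxmemory + maxmemory-policy. The built-in default policy is noeviction (writes fail once maxmemory is reached), so set one explicitly. Common production choice: allkeys-lru or allkeys-lfu (Redis 4.0+) for cache-only deployments. Use volatile-lru when mixing cache keys (with TTL) and persistent keys (no TTL).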

🔬 Under the Hood

Redis LRU is approximate: it samples a few random keys (5 by default, tunable via maxmemory-samples) and evicts the least recently used among them. This avoids the memory and bookkeeping cost of exact LRU ordering. LFU uses a logarithmic counter with decay so old popularity fades.
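
To make the sampling idea concrete, here is a toy in-process cache that evicts by sampling. It is a simplified sketch of the technique described above, not Redis's actual implementation; the class name, the logical clock, and the `seed` parameter are illustrative assumptions.

```python
import random


class SampledLRUCache:
    """Toy cache: on overflow, sample a few keys and evict the one
    that was accessed longest ago (approximate LRU)."""

    def __init__(self, capacity: int, samples: int = 5, seed=None):
        self.capacity = capacity
        self.samples = samples
        self._data = {}        # key -> value
        self._last_used = {}   # key -> logical timestamp of last access
        self._clock = 0
        self._rng = random.Random(seed)

    def _touch(self, key):
        self._clock += 1
        self._last_used[key] = self._clock

    def get(self, key):
        if key not in self._data:
            return None
        self._touch(key)
        return self._data[key]

    def put(self, key, value):
        # Only evict when inserting a new key into a full cache.
        if key not in self._data and len(self._data) >= self.capacity:
            self._evict_one()
        self._data[key] = value
        self._touch(key)

    def _evict_one(self):
        k = min(self.samples, len(self._data))
        candidates = self._rng.sample(list(self._data), k)
        victim = min(candidates, key=self._last_used.__getitem__)
        del self._data[victim]
        del self._last_used[victim]

    def __len__(self):
        return len(self._data)
```

With `samples` equal to the capacity the behaviour degenerates to exact LRU; smaller sample sizes trade eviction accuracy for less work per eviction.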

Cache invalidation

"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton. Invalidation strategy is where most caching bugs originate.

Invalidation strategies

Strategy	Mechanism	Consistency	Complexity
TTL expiration	Key auto-expires after N seconds	Eventual; bounded staleness = TTL	Low—set and forget
Event-driven (CDC)	Debezium/Kafka streams DB changes → invalidates cache	Near-real-time; seconds of lag	Medium—pipeline to maintain
Versioning / ETags	Cache key includes version: `user:42:v7`	Strong on read if version checked	Medium—version source of truth needed
Pub/Sub broadcast	Redis PUBLISH on write; all app nodes evict local L1	Seconds; depends on delivery	Medium—missed messages = stale L1

flowchart LR
  App[App Server]
  DB[(PostgreSQL)]
  CDC[Debezium / CDC]
  Kafka[Kafka Topic]
  Inv[Invalidation Worker]
  Redis[(Redis)]

  App -->|UPDATE| DB
  DB --> CDC
  CDC --> Kafka
  Kafka --> Inv
  Inv -->|DEL user:42| Redis

Thundering herd problem

When a popular cache key expires (or is invalidated), concurrent requests miss simultaneously and hammer the database: the "thundering herd" or "dog-piling" effect. Every request that arrives before the first rebuild completes goes to the DB, so a key serving 10K RPS with a 100 ms rebuild produces roughly 1,000 duplicate queries, and with a 1 s rebuild about 10K.

⚠️ Pitfall

TTL-only invalidation on hot keys is dangerous. A viral post's cache expiring at the same instant can DDoS your own database. Always pair TTL with stampede protection for keys above ~1K RPS.

Cache stampede solutions

Three proven techniques prevent coordinated cache misses from overwhelming your origin. Production systems often combine all three.

Technique	How it works	Pros	Cons
Mutex / single-flight	Only one request rebuilds on miss; others wait or retry cache	Simple; guarantees ≤1 DB query per miss event	Wait latency for losers; lock timeout tuning; Redis SETNX pitfalls
Stale-while-revalidate (SWR)	Serve stale value immediately; async background refresh	Zero user-facing latency spike; smooth traffic	Users see stale data during refresh; needs soft + hard TTL
Probabilistic early expiration (XFetch)	Each request refreshes early with a probability that rises as hard TTL approaches	No locks; spreads refresh load over time	Math tuning (beta factor); occasional unnecessary early refresh

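# Mutex / single-flight on cache miss (Redis; assumes the redis-py client)
import time
import uuid
import redis

r = redis.Redis()  # assumes a reachable Redis instance
# Compare-and-delete so a slow winner cannot release someone else's lock
release = r.register_script(
    "if redis.call('get', KEYS[1]) == ARGV[1] then "
    "return redis.call('del', KEYS[1]) else return 0 end")

def get_with_mutex(key, loader_fn, ttl=300, retries=20):
    val = r.get(key)
    if val is not None:  # empty strings are still hits
        return val

    lock_key = f"lock:{key}"
    token = str(uuid.uuid4())
    if r.set(lock_key, token, nx=True, ex=10):  # 10s lock; winner rebuilds
        try:
            val = loader_fn()
            r.setex(key, ttl, val)
            return val
        finally:
            release(keys=[lock_key], args=[token])
    for _ in range(retries):  # losers poll the cache instead of the DB
        time.sleep(0.05)
        val = r.get(key)
        if val is not None:
            return val
    return loader_fn()  # last resort if the winner never populated the key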
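
The probabilistic early expiration row in the table can be sketched as a single decision function. This follows the commonly cited XFetch rule (refresh when `now - delta * beta * ln(u) >= expiry`); the function name and the injectable `now` and `rng` parameters are illustrative assumptions added so the behaviour is easy to test.

```python
import math
import random
import time


def should_refresh_early(expiry: float, delta: float, beta: float = 1.0,
                         now=None, rng=random.random) -> bool:
    """Return True when this request should recompute the value before it expires.

    expiry: absolute expiry time of the cached value, in seconds
    delta:  how long the last recomputation took, in seconds
    beta:   > 1 favours earlier refreshes, < 1 favours later ones
    """
    now = time.time() if now is None else now
    u = 1.0 - rng()  # uniform in (0, 1]; avoids log(0)
    # -log(u) is an exponential random variable, so the chance of refreshing
    # grows as `now` approaches `expiry` and is certain once it has passed.
    return now - delta * beta * math.log(u) >= expiry
```

A caller would store `delta` and `expiry` next to the cached value and, on each read, recompute when this function returns True. Slow-to-build keys (large `delta`) start refreshing earlier, which is the point of the technique.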

💡 Pro Tip

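Cloudflare and Fastly support SWR natively via the stale-while-revalidate Cache-Control directive. Redis has no built-in equivalent, so the application implements it. Facebook's memcached deployment uses leases ("lease get"): the same mutex concept, with the option to serve a stale value while the lease holder refreshes.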

🎯 Interview Tip

When interviewer asks "what if cache expires during peak?", answer with layered defense: (1) mutex so one thread rebuilds, (2) SWR so users never wait, (3) jittered TTL so keys don't expire together. Mention TTL + random(0, 60s) to prevent synchronized expiry.
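
The jitter mentioned in the tip is a one-liner. A minimal sketch, assuming TTLs in whole seconds; the function name and the injectable `rng` are illustrative.

```python
import random


def jittered_ttl(base_ttl: int, max_jitter: int = 60, rng=random) -> int:
    """Return base_ttl plus a uniform random offset in [0, max_jitter] seconds,
    so keys written at the same moment do not all expire together."""
    return base_ttl + rng.randint(0, max_jitter)


# Example: a 5-minute TTL spread across 300..360 seconds.
ttl = jittered_ttl(300)
```

Pass the result wherever a fixed TTL would otherwise go, for example as the expiry argument of a cache SET.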

Distributed cache

A single Redis node handles ~100K ops/sec. Beyond that you shard, replicate, and layer caches to protect both the cache tier and the database.

Redis Cluster

Redis Cluster shards data across 16,384 hash slots using CRC16 mod 16384. Each master owns a slot range; replicas provide failover. Client libraries (Lettuce, Jedis) handle MOVED/ASK redirects. Minimum production topology: 3 masters + 3 replicas across AZs.
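
The slot calculation is small enough to reproduce. The sketch below implements CRC16 (XMODEM variant, polynomial 0x1021) and the hash-tag rule, where only the text inside the first non-empty `{...}` is hashed. Function names are illustrative; in practice the client library or `CLUSTER KEYSLOT` does this for you.

```python
def crc16_xmodem(data: bytes) -> int:
    """Bitwise CRC16, polynomial 0x1021, initial value 0 (XMODEM variant)."""
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def key_slot(key: str) -> int:
    """Map a key to one of the 16,384 hash slots, honouring hash tags."""
    raw = key.encode("utf-8")
    start = raw.find(b"{")
    if start != -1:
        end = raw.find(b"}", start + 1)
        if end > start + 1:          # tag must be non-empty
            raw = raw[start + 1:end]
    return crc16_xmodem(raw) % 16384


# Keys sharing a hash tag land in the same slot, so multi-key
# operations on them stay on one node.
same_slot = key_slot("{user:42}:profile") == key_slot("{user:42}:orders")
```

Hash tags are the tool for keeping related keys together; overusing one tag concentrates load on a single slot.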

flowchart TB
  subgraph nodes [App Fleet]
    A1[App 1\nCaffeine L1]
    A2[App 2\nCaffeine L1]
    A3[App N\nCaffeine L1]
  end
  subgraph redis [Redis Cluster]
    M1[Master 1\nslots 0-5460]
    M2[Master 2\nslots 5461-10922]
    M3[Master 3\nslots 10923-16383]
  end
  DB[(PostgreSQL)]

  A1 & A2 & A3 -->|L1 miss ~1ms| redis
  redis -->|L2 miss ~5ms| DB

L1 / L2 / DB hierarchy

L1 (in-process) — Caffeine with 10–60s TTL; absorbs 90%+ of repeated reads on same node
L2 (Redis) — Shared across fleet; 1–5 min TTL; source for L1 population
DB — Source of truth; only hit on L2 miss (~1–5% of reads with good hit rates)
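
The read path through this hierarchy can be sketched as follows. This is a simplified in-memory model: `TTLDict` stands in for both Caffeine (L1) and Redis (L2), and `load_from_db` is a hypothetical loader callback, so all names here are assumptions for illustration.

```python
import time


class TTLDict:
    """Minimal cache with per-entry expiry (stand-in for a real L1 or L2)."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items = {}   # key -> (value, expires_at)

    def get(self, key):
        hit = self._items.get(key)
        if hit is None:
            return None
        value, expires_at = hit
        if self.clock() >= expires_at:
            del self._items[key]   # lazily drop expired entries
            return None
        return value

    def set(self, key, value):
        self._items[key] = (value, self.clock() + self.ttl)


class TieredCache:
    """Read path: L1 -> L2 -> DB, populating each tier on the way back."""

    def __init__(self, l1, l2, load_from_db):
        self.l1 = l1
        self.l2 = l2
        self.load_from_db = load_from_db

    def get(self, key):
        value = self.l1.get(key)
        if value is not None:
            return value                      # L1 hit: no network hop
        value = self.l2.get(key)
        if value is None:
            value = self.load_from_db(key)    # L2 miss: source of truth
            if value is None:
                return None                   # missing rows are not cached here
            self.l2.set(key, value)
        self.l1.set(key, value)               # L2 is the source for L1 population
        return value
```

Keeping the L1 TTL much shorter than the L2 TTL bounds how long a node can serve a value that has already been invalidated in the shared tier.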

Cache warming

Cold start after deploy or failover causes mass cache misses. Warm caches by: pre-loading hot keys from DB on startup, running shadow traffic before cutover, or maintaining a "warm key list" refreshed by a background job. Netflix has described warming new EVCache replicas with data copied from existing replicas before they take traffic.

Hot key problem

A single key (celebrity tweet, flash sale SKU) can exceed one Redis node's ~100K ops/sec limit. Solutions: local replication (app holds in-memory copy), key splitting (product:42:shard-{0..7} with random read), read replicas for the hot slot, or multilayer CDN for read-heavy public data.
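
Key splitting from the list above can be sketched in a few lines. The helper names are illustrative, and the cache is modelled as any dict-like object; with a real Redis client the writes would be SET calls, ideally pipelined.

```python
import random


def shard_keys(base_key: str, shards: int = 8) -> list:
    """All physical keys that hold a copy of the hot value."""
    return [f"{base_key}:shard-{i}" for i in range(shards)]


def read_key(base_key: str, shards: int = 8, rng=random) -> str:
    """Pick one copy at random so reads spread across the shards."""
    return f"{base_key}:shard-{rng.randrange(shards)}"


def write_all(cache, base_key: str, value, shards: int = 8) -> None:
    """Writes must fan out to every copy, or readers will see mixed versions."""
    for key in shard_keys(base_key, shards):
        cache[key] = value


# Usage with a plain dict standing in for the cache:
cache = {}
write_all(cache, "product:42", "sku-payload")
value = cache[read_key("product:42")]
```

The trade-off is write amplification: every update costs `shards` writes, so this fits keys that are read far more often than they change. Because the suffix is outside any hash tag, the copies hash to different slots and so tend to spread across cluster nodes.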

Hot key symptom	Detection	Mitigation
Single Redis CPU pegged at 100%	`redis-cli --hotkeys` (requires an LFU maxmemory-policy), slowlog, client-side key metrics	Local L1 cache, key sharding, dedicated read replica
Uneven slot distribution	Cluster node memory/ops imbalance	Hash tag redesign; move to request coalescing
Network saturation on one node	Bandwidth metrics per Redis instance	CDN offload; compress values; edge caching

📦 Real World

Twitter ran into hot key issues during World Cup tweets—a single timeline key exceeded single-node Memcached capacity. Solution: application-side key splitting + local in-process cache. Redis documents hot key patterns in their latency troubleshooting guide.

CDN caching

CDNs cache at the edge—closest to users geographically. Essential for static assets, video, and cacheable API responses at global scale.

Cache-Control headers

Header / directive	Meaning	Example use
`max-age=31536000, immutable`	Cache 1 year; content never changes at this URL	JS/CSS with content hash in filename
`max-age=300, s-maxage=600`	Browser 5 min; CDN 10 min	Semi-dynamic API responses
`no-store`	Never cache anywhere	PII, auth tokens, personalized data
`stale-while-revalidate=60`	Serve stale up to 60s while refreshing	Stampede protection at edge
`Vary: Accept-Encoding`	Separate cache entries per encoding	gzip vs brotli responses
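
One way to apply the table is to map each response category to a fixed header set at the origin. A minimal sketch: the category names (`hashed_asset`, `shared_api`, `private`) and the function are hypothetical, and the header values are the examples from the table combined.

```python
def cache_headers(kind: str) -> dict:
    """Return response headers for a category of content."""
    policies = {
        # Filename contains a content hash, so the URL never changes meaning.
        "hashed_asset": {
            "Cache-Control": "public, max-age=31536000, immutable",
        },
        # Shared, anonymous API response: short browser TTL, longer CDN TTL,
        # and permission to serve stale for 60s while the edge refreshes.
        "shared_api": {
            "Cache-Control": "public, max-age=300, s-maxage=600, "
                             "stale-while-revalidate=60",
            "Vary": "Accept-Encoding",
        },
        # PII, auth tokens, personalized data: never stored by any cache.
        "private": {
            "Cache-Control": "no-store",
        },
    }
    if kind not in policies:
        raise ValueError(f"unknown content kind: {kind!r}")
    return dict(policies[kind])   # copy so callers cannot mutate the table
```

Centralizing the policy like this makes it harder for a single endpoint to ship personalized data with a cacheable header by accident.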

CDN invalidation

Purge by URL, path prefix, or cache tag. CloudFront invalidations typically take from tens of seconds to a few minutes to propagate and are billed per path beyond a free monthly allowance. Prefer versioned URLs (/assets/app.v42.js) over purge for deploys. Use cache tags (Fastly Surrogate-Key) for granular group invalidation without listing every URL.
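
Versioned URLs are usually produced by the build step. The sketch below derives the version from the file contents rather than a counter, which is one common variant; the function name and the 8-character digest length are illustrative choices.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_asset_path(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension.

    The URL changes only when the bytes change, so the asset can be
    served with a one-year immutable Cache-Control and never purged.
    """
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))


# Example: the same path gets a new URL when its contents change.
old_url = hashed_asset_path("/assets/app.js", b"console.log('v1');")
new_url = hashed_asset_path("/assets/app.js", b"console.log('v2');")
```

The HTML that references these assets still needs a short TTL or a purge, since it is the entry point that tells clients which hashed URL to fetch.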

Edge functions

CloudFront Functions, Lambda@Edge, Fastly Compute@Edge run logic at PoPs—A/B routing, auth token validation, header rewriting, bot filtering, and personalized cache key construction. Latency: <1 ms for CloudFront Functions, ~10–50 ms for Lambda@Edge (includes cold start risk).

sequenceDiagram
  participant User
  participant Edge as CDN Edge PoP
  participant Origin as Origin Server

  User->>Edge: GET /api/products
  alt cache hit
    Edge-->>User: 200 cached response
  else cache miss
    Edge->>Origin: GET /api/products
    Origin-->>Edge: 200 + Cache-Control
    Edge->>Edge: store with TTL
    Edge-->>User: 200 response
  end

⚖️ Trade-off

CDN caching API responses saves origin load but complicates personalization and auth. Cache only anonymous, shared responses—or use edge functions to vary cache key by cookie/API key segment. Never CDN-cache authenticated user-specific data without careful Vary design.

🏆 Senior Signal

Full-stack caching answer: "Static assets on CDN with immutable hash URLs. API reads through Redis cache-aside (80% hit, 5-min TTL, mutex on miss). User-specific data: no CDN, optional 30s app L1. Deploy invalidation via versioned asset URLs, not purge API."