
Caching

Caching stores frequently accessed data in a faster storage layer to reduce latency, lower database load, and increase throughput. It's one of the most effective techniques for scaling systems.

Why Caching?

```
Without Cache:                        With Cache:

┌────────┐      ┌────────┐            ┌────────┐      ┌───────┐      ┌────────┐
│ Client │─────▶│   DB   │            │ Client │─────▶│ Cache │─────▶│   DB   │
└────────┘      └────────┘            └────────┘      └───────┘      └────────┘

Every request hits DB                 Cache hit: ~1ms
Latency: 10-100ms                     Cache miss: falls through to DB
```
| Benefit | Description |
|---|---|
| Reduced Latency | Memory access (~1ms) vs disk/network (~10-100ms) |
| Lower DB Load | Fewer queries hit the database |
| Higher Throughput | Serve more requests with the same resources |
| Cost Savings | Reduce expensive database scaling |

Caching Strategies

1. Cache-Aside (Lazy Loading)

Application manages the cache explicitly. Most common pattern.

```
Read Flow:

┌────────┐  1. Check   ┌───────┐
│  App   │────────────▶│ Cache │
└────────┘             └───────┘
     ▲                     │
     │ 3. Return (hit)     │ 2. Miss
     │                     ▼
┌────────┐  4. Fetch   ┌───────┐
│  App   │◀────────────│  DB   │
└────────┘             └───────┘
     │
     │ 5. Populate cache
     ▼
┌───────┐
│ Cache │
└───────┘
```
```python
def get_user(user_id):
    # 1. Check cache
    user = cache.get(f"user:{user_id}")
    if user:
        return user  # Cache hit

    # 2. Cache miss - fetch from DB
    user = db.query("SELECT * FROM users WHERE id = ?", user_id)

    # 3. Populate cache
    cache.set(f"user:{user_id}", user, ttl=3600)
    return user
```
| Pros | Cons |
|---|---|
| Only requested data is cached | Cache miss penalty (extra round trip) |
| Resilient to cache failures | Data can become stale |
| Simple to implement | Application manages cache logic |

Best for: Read-heavy workloads, data that’s expensive to compute


2. Read-Through

The cache sits between the application and the database; on a miss, the cache itself fetches the data from the database.

```
┌────────┐         ┌───────┐         ┌────────┐
│  App   │────────▶│ Cache │────────▶│   DB   │
└────────┘         └───────┘         └────────┘
            Automatically fetches on cache miss
```
| Pros | Cons |
|---|---|
| Simpler application code | Cache library must support DB access |
| Consistent read pattern | Less flexibility |

Best for: When using cache providers with built-in read-through support
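A minimal sketch of the pattern: the cache object owns a `loader` callback (a stand-in for whatever DB fetch the cache provider is configured with), so the application never talks to the database directly.

```python
import time

class ReadThroughCache:
    """Read-through sketch: the cache itself calls the loader on a miss."""
    def __init__(self, loader, ttl=300):
        self.loader = loader       # stand-in for the provider's configured DB fetch
        self.ttl = ttl
        self._store = {}           # key -> (value, expires_at)

    def get(self, key):
        entry = self._store.get(key)
        if entry is not None and entry[1] > time.time():
            return entry[0]                          # hit
        value = self.loader(key)                     # miss: the cache fetches from the DB
        self._store[key] = (value, time.time() + self.ttl)
        return value

# The application only ever talks to the cache:
cache = ReadThroughCache(loader=lambda key: {"id": key, "name": f"user-{key}"})
user = cache.get("123")
```

The `loader` lambda here is purely illustrative; a real read-through provider wires this up in configuration rather than application code.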


3. Write-Through

Every write goes to both cache and database synchronously.

```
Write Flow:

┌────────┐  1. Write   ┌───────┐  2. Write   ┌────────┐
│  App   │────────────▶│ Cache │────────────▶│   DB   │
└────────┘             └───────┘             └────────┘
                Both updated synchronously
```
```python
def update_user(user_id, data):
    # Write to DB first
    db.update("UPDATE users SET ... WHERE id = ?", data, user_id)
    # Then update cache
    cache.set(f"user:{user_id}", data, ttl=3600)
```
| Pros | Cons |
|---|---|
| Cache always consistent with DB | Higher write latency (2 writes) |
| No stale data | Every write hits both stores |
| Simple mental model | Writes for data that is never read waste cache space |

Best for: Data where consistency is critical, read-after-write scenarios


4. Write-Behind (Write-Back)

Writes go to the cache immediately and are asynchronously persisted to the database later.

```
Write Flow:

┌────────┐  1. Write   ┌───────┐
│  App   │────────────▶│ Cache │  (returns immediately)
└────────┘             └───────┘
                           │
                           │ 2. Async write (batched/delayed)
                           ▼
                       ┌────────┐
                       │   DB   │
                       └────────┘
```
| Pros | Cons |
|---|---|
| Lowest write latency | Risk of data loss if cache fails |
| Batch writes to DB | Complex to implement correctly |
| Absorbs write spikes | Eventual consistency |

Best for: High write throughput, analytics, metrics, non-critical data
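The flow above can be sketched in a few lines; this is a simplified in-memory model where a plain dict stands in for the database and `flush()` stands in for the background worker that a real implementation would run.

```python
from collections import deque

class WriteBehindCache:
    """Sketch: writes land in the cache and a queue; flush() batches them to the 'DB'."""
    def __init__(self, db, batch_size=100):
        self.db = db               # a plain dict standing in for the database
        self.cache = {}
        self.queue = deque()
        self.batch_size = batch_size

    def write(self, key, value):
        self.cache[key] = value    # fast path: cache only, returns immediately
        self.queue.append((key, value))

    def flush(self):
        """In a real system this runs on a background worker, batched/delayed."""
        batch = []
        while self.queue and len(batch) < self.batch_size:
            batch.append(self.queue.popleft())
        for key, value in batch:   # ideally one bulk write per batch
            self.db[key] = value
        return len(batch)

db = {}
wb = WriteBehindCache(db)
wb.write("user:1", {"name": "Ada"})   # db is still empty at this point
flushed = wb.flush()                  # now persisted
```

The durability gap between `write()` and `flush()` is exactly the data-loss window listed in the cons above.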


5. Write-Around

Writes go directly to DB, bypassing cache. Cache populated only on reads.

```
Write:  App ─────────────────────────▶ DB

Read:   App ───▶ Cache (miss) ───▶ DB
                 (populate on read)
```
| Pros | Cons |
|---|---|
| Avoids cache pollution from writes | Higher read latency after writes |
| Good for write-once data | Cache may have stale data |

Best for: Data written once but rarely read, log data
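The same flow as two helper functions, with dicts standing in for the cache and the database (names are illustrative):

```python
cache = {}
db = {}

def write_around(key, value):
    db[key] = value           # write goes straight to the DB
    cache.pop(key, None)      # drop any stale cached copy rather than updating it

def read(key):
    if key in cache:
        return cache[key]     # hit
    value = db.get(key)       # miss: fall through to the DB
    if value is not None:
        cache[key] = value    # the cache is populated only on reads
    return value

write_around("log:1", "first line")
```

After the write, the cache is still cold; the first `read` pays the DB round trip and populates it.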


Strategy Comparison

| Strategy | Read Latency | Write Latency | Consistency | Complexity |
|---|---|---|---|---|
| Cache-Aside | Low (hit) / High (miss) | N/A | Eventual | Low |
| Read-Through | Low (hit) / Medium (miss) | N/A | Eventual | Medium |
| Write-Through | Low | High | Strong | Medium |
| Write-Behind | Low | Very Low | Eventual | High |
| Write-Around | High (after write) | Low | Eventual | Low |

Cache Eviction Policies

When the cache is full, which items should be removed?

LRU (Least Recently Used)

Evicts items that haven’t been accessed for the longest time.

```
Access sequence: A, B, C, A, D   (cache size: 3)

Step 1: [A]        - A added
Step 2: [A, B]     - B added
Step 3: [A, B, C]  - C added (full)
Step 4: [B, C, A]  - A accessed, moves to end
Step 5: [C, A, D]  - D added, B evicted (least recent)
```

Best for: General-purpose, most common choice
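The walkthrough above maps directly onto an `OrderedDict`-based LRU. This is a sketch: production caches such as Redis approximate LRU by sampling rather than tracking exact recency order.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()       # least recently used first

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)      # accessed: now most recent
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)   # evict least recently used

# Replay the access sequence A, B, C, A, D with capacity 3:
lru = LRUCache(3)
for k in "ABC":
    lru.put(k, k)
lru.get("A")         # A moves to most-recent position
lru.put("D", "D")    # B is evicted, matching step 5 above
```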

LFU (Least Frequently Used)

Evicts items with lowest access count.

```
Item frequencies: A(5), B(2), C(8), D(1)
Eviction order:   D, B, A, C
```

Best for: When access patterns are stable, popular items stay popular
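A minimal LFU sketch reproducing the example frequencies above. The eviction scan here is O(n); real LFU implementations keep frequency buckets or min-heaps to make eviction O(1) or O(log n).

```python
from collections import Counter

class LFUCacheSketch:
    """Sketch only: linear eviction scan, no frequency aging."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}
        self.freq = Counter()

    def get(self, key):
        if key not in self.items:
            return None
        self.freq[key] += 1
        return self.items[key]

    def put(self, key, value):
        if key not in self.items and len(self.items) >= self.capacity:
            victim = min(self.items, key=lambda k: self.freq[k])  # lowest access count
            del self.items[victim]
            del self.freq[victim]
        self.items[key] = value
        self.freq[key] += 1

# Reproduce the frequencies A(5), B(2), C(8), D(1):
lfu = LFUCacheSketch(4)
for k in "ABCD":
    lfu.put(k, k)                      # each put counts as one access
for _ in range(4): lfu.get("A")        # A -> 5
lfu.get("B")                           # B -> 2
for _ in range(7): lfu.get("C")        # C -> 8
eviction_order = sorted(lfu.items, key=lambda k: lfu.freq[k])
```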

FIFO (First In, First Out)

Evicts oldest items regardless of access pattern.

Best for: Simple use cases, time-based data

TTL (Time To Live)

Items expire after a set duration.

```python
cache.set("session:123", data, ttl=3600)  # Expires in 1 hour
```

Best for: Session data, tokens, time-sensitive data

Comparison

| Policy | Pros | Cons | Use Case |
|---|---|---|---|
| LRU | Adapts to access patterns | Doesn't consider frequency | General purpose |
| LFU | Keeps popular items | Slow to adapt to changes | Stable access patterns |
| FIFO | Simple, predictable | Ignores access patterns | Time-series data |
| TTL | Guarantees freshness | May evict useful data | Sessions, tokens |

Cache Invalidation

“There are only two hard things in Computer Science: cache invalidation and naming things.” — Phil Karlton

Invalidation Strategies

1. TTL-Based Expiration

```python
cache.set("user:123", user_data, ttl=300)  # 5 minutes
```
| Pros | Cons |
|---|---|
| Simple to implement | Data stale until expiry |
| No coordination needed | Hard to choose the right TTL |

2. Event-Driven Invalidation

Invalidate cache when source data changes.

```python
def update_user(user_id, data):
    db.update(user_id, data)
    cache.delete(f"user:{user_id}")  # Invalidate

    # Or publish an event
    event_bus.publish("user.updated", {"user_id": user_id})
```
```
┌─────────┐  Update  ┌────┐  Event  ┌─────────────┐  Invalidate  ┌───────┐
│ Service │─────────▶│ DB │────────▶│ Event Queue │─────────────▶│ Cache │
└─────────┘          └────┘         └─────────────┘              └───────┘
```

3. Version-Based Invalidation

Include version in cache key.

```python
version = db.get_version("users")
cache_key = f"user:{user_id}:v{version}"
```

4. Tag-Based Invalidation

Group related cache entries with tags.

```python
cache.set("product:123", data, tags=["products", "category:electronics"])
cache.invalidate_by_tag("category:electronics")  # Clears all tagged items
```

Invalidation Patterns by Consistency Need

| Consistency Level | Strategy | Latency |
|---|---|---|
| Strong | Write-through + invalidate | High |
| Eventual (seconds) | Event-driven invalidation | Medium |
| Eventual (minutes) | Short TTL | Low |
| Best-effort | Long TTL | Lowest |

Distributed Caching

Redis

In-memory data structure store with persistence, replication, and clustering.

```
                             Redis Cluster

┌──────────────┬────────────────┬─────────────────┬──────────────────┐
│   Master 1   │    Master 2    │    Master 3     │     Master 4     │
│ Slots 0-4095 │ Slots 4096-8191│ Slots 8192-12287│ Slots 12288-16383│
├──────────────┼────────────────┼─────────────────┼──────────────────┤
│  Replica 1   │   Replica 2    │   Replica 3     │    Replica 4     │
└──────────────┴────────────────┴─────────────────┴──────────────────┘
```

Key Features:

  • Data structures: Strings, Lists, Sets, Sorted Sets, Hashes, Streams
  • Persistence: RDB snapshots, AOF logging
  • Pub/Sub messaging
  • Lua scripting
  • Cluster mode with automatic sharding

Common Use Cases:

| Use Case | Data Structure | Example |
|---|---|---|
| Session storage | Hash | `HSET session:123 user_id 456` |
| Rate limiting | String + INCR | `INCR api:user:123:minute` |
| Leaderboards | Sorted Set | `ZADD leaderboard 100 "player1"` |
| Real-time analytics | HyperLogLog | `PFADD visitors "user123"` |
| Message queues | List/Stream | `LPUSH queue:tasks "job1"` |
| Geospatial | Geo | `GEOADD locations 77.5 12.9 "store1"` |
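The rate-limiting row relies on Redis's atomic `INCR` with a key that encodes the current time window. A fixed-window sketch with a plain dict standing in for Redis (key shape and limits are illustrative):

```python
import time

store = {}  # stands in for Redis; INCR on a missing key starts the counter at 1

def allow_request(user_id, limit=60, now=None):
    """Fixed-window rate limiter: one counter key per user per minute."""
    ts = now if now is not None else time.time()
    window = int(ts // 60)
    key = f"api:user:{user_id}:{window}"   # same shape as INCR api:user:123:minute
    store[key] = store.get(key, 0) + 1     # Redis equivalent: INCR key (atomic)
    # In Redis you would also EXPIRE the key after 60s so old windows clean up.
    return store[key] <= limit
```

Fixed windows allow bursts at window boundaries; sliding-window variants (e.g. via sorted sets) fix that at the cost of more state.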

Memcached

Simple, high-performance, distributed memory caching.

```
┌─────────┐      ┌─────────────────────────────────┐
│ Client  │─────▶│         Memcached Pool          │
└─────────┘      ├──────────┬──────────┬───────────┤
                 │  Node 1  │  Node 2  │  Node 3   │
                 │   25%    │   25%    │   50%     │
                 └──────────┴──────────┴───────────┘
                        Consistent hashing
```

Key Features:

  • Simple key-value only
  • Multi-threaded (uses all CPU cores)
  • No persistence (pure cache)
  • Consistent hashing for distribution
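Consistent hashing is what lets clients add or remove nodes while remapping only the affected keys. A sketch of the ring construction (real clients use similar ideas with many virtual nodes per server; names here are illustrative):

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Client-side consistent hashing: a key maps to the next node clockwise."""
    def __init__(self, nodes, vnodes=100):
        self.ring = []   # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):              # virtual nodes smooth the distribution
                h = self._hash(f"{node}#{i}")
                self.ring.append((h, node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,)) % len(self.ring)  # first vnode >= h, wrapping
        return self.ring[idx][1]

ring = ConsistentHashRing(["node1", "node2", "node3"])
owner = ring.node_for("user:123")
```

Removing a node only remaps the keys that node owned; keys on the surviving nodes keep their placement, which is the whole point versus naive `hash(key) % N`.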

Redis vs Memcached

| Feature | Redis | Memcached |
|---|---|---|
| Data Structures | Rich (lists, sets, etc.) | Key-value only |
| Persistence | Yes (RDB, AOF) | No |
| Replication | Yes | No (client-side) |
| Clustering | Built-in | Client-side |
| Memory Efficiency | Lower | Higher |
| Threading | Single-threaded* | Multi-threaded |
| Pub/Sub | Yes | No |
| Lua Scripting | Yes | No |

*Redis 6+ adds I/O threading, but command execution remains single-threaded.

Choose Redis when: You need data structures, persistence, or pub/sub

Choose Memcached when: Simple caching, maximum memory efficiency


CDN Caching

Content Delivery Network caches content at edge locations globally.

```
                      ┌─────────────┐
                      │   Origin    │
                      │   Server    │
                      └──────┬──────┘
         ┌───────────────────┼───────────────────┐
         │                   │                   │
   ┌─────┴─────┐       ┌─────┴─────┐       ┌─────┴─────┐
   │ Edge PoP  │       │ Edge PoP  │       │ Edge PoP  │
   │ (US-West) │       │ (Europe)  │       │  (Asia)   │
   └─────┬─────┘       └─────┬─────┘       └─────┬─────┘
         │                   │                   │
   ┌─────┴─────┐       ┌─────┴─────┐       ┌─────┴─────┐
   │   Users   │       │   Users   │       │   Users   │
   └───────────┘       └───────────┘       └───────────┘
```

Cache-Control Headers

```
Cache-Control: max-age=3600, public
Cache-Control: private, no-cache
Cache-Control: no-store
Cache-Control: stale-while-revalidate=60
```
| Directive | Meaning |
|---|---|
| `max-age=N` | Cache for N seconds |
| `public` | Can be cached by CDN/proxies |
| `private` | Only the browser can cache |
| `no-cache` | Must revalidate before using |
| `no-store` | Don't cache at all |
| `stale-while-revalidate` | Serve stale while fetching fresh |

What to Cache on CDN

| Content Type | Cache Strategy | TTL |
|---|---|---|
| Static assets (JS, CSS, images) | Aggressive, versioned URLs | 1 year |
| HTML pages | Short TTL or no-cache | 0-5 min |
| API responses | Varies by endpoint | 1-60 min |
| User-specific content | `private` or `no-store` | - |

CDN Invalidation

```shell
# Invalidate specific paths
aws cloudfront create-invalidation \
  --distribution-id <DISTRIBUTION_ID> \
  --paths "/images/*" "/css/*"

# Versioned URLs (better approach):
# /static/app.v2.3.1.js  — change the version, no invalidation needed
```

Multi-Level Caching (L1/L2)

```
Request Flow:

┌──────────┐      ┌──────────┐      ┌───────────┐      ┌──────┐
│ Browser  │─────▶│   CDN    │─────▶│ App Cache │─────▶│  DB  │
│  Cache   │      │  (Edge)  │      │  (Redis)  │      │      │
└──────────┘      └──────────┘      └───────────┘      └──────┘
     L1                L2                L3             Origin
   (~0ms)          (~10-50ms)         (~1-5ms)       (~10-100ms)
```

In-Process vs Distributed Cache

```
Application Server

┌─────────────────┐
│ L1: In-Process  │  (Caffeine, Guava)
│ - Fastest       │  - No network hop
│ - Limited size  │  - Per-instance, not shared
└────────┬────────┘
         │ Miss
         ▼
┌─────────────────┐
│ L2: Distributed │  (Redis, Memcached)
│ - Shared        │  - Network hop (~1ms)
│ - Large         │  - Consistent across instances
└────────┬────────┘
         │ Miss
         ▼
     ┌───────┐
     │  DB   │
     └───────┘
```
```java
// L1: In-process cache (Caffeine)
Cache<String, User> localCache = Caffeine.newBuilder()
    .maximumSize(10_000)
    .expireAfterWrite(5, TimeUnit.MINUTES)
    .build();

// L2: Distributed cache (Redis)
public User getUser(String userId) {
    // Check L1
    User user = localCache.getIfPresent(userId);
    if (user != null) return user;

    // Check L2
    user = redisTemplate.opsForValue().get("user:" + userId);
    if (user != null) {
        localCache.put(userId, user);  // Populate L1
        return user;
    }

    // Fetch from DB
    user = userRepository.findById(userId);
    redisTemplate.opsForValue().set("user:" + userId, user, Duration.ofHours(1));
    localCache.put(userId, user);
    return user;
}
```

Common Caching Problems

1. Cache Stampede (Thundering Herd)

Many requests hit DB simultaneously when a popular cache key expires.

Cache expires ──▶ 1000 requests ──▶ All hit DB ──▶ DB overload

Solutions:

```python
import math
import random
import time

# 1. Locking (only one caller fetches from DB)
def get_with_lock(key):
    value = cache.get(key)
    if value:
        return value
    lock = cache.acquire_lock(f"lock:{key}", timeout=5)
    if lock:
        try:
            value = db.fetch(key)
            cache.set(key, value, ttl=300)
        finally:
            lock.release()
        return value
    # Another caller holds the lock: wait and retry
    time.sleep(0.1)
    return get_with_lock(key)

# 2. Probabilistic early expiration
def get_with_early_refresh(key, beta=1):
    value, expiry = cache.get_with_expiry(key)
    time_to_expiry = expiry - time.time()
    # Probabilistically refresh before expiry; likelihood rises as expiry nears
    if value and random.random() < beta * math.exp(-time_to_expiry):
        refresh_async(key)
    return value

# 3. Stale-while-revalidate
cache.set(key, value, ttl=300, stale_ttl=60)  # Serve stale for 60s while refreshing
```

2. Cache Penetration

Requests for non-existent data bypass cache and hit DB every time.

Request for user:999999 ──▶ Cache miss ──▶ DB (not found) ──▶ Repeat

Solutions:

```python
# 1. Cache null/empty results
def get_user(user_id):
    cached = cache.get(f"user:{user_id}")
    if cached == "NULL":
        return None
    if cached:
        return cached

    user = db.get_user(user_id)
    if user is None:
        cache.set(f"user:{user_id}", "NULL", ttl=60)  # Short TTL
    else:
        cache.set(f"user:{user_id}", user, ttl=3600)
    return user

# 2. Bloom filter
bloom_filter = BloomFilter(expected_items=1_000_000, fp_rate=0.01)

def get_user(user_id):
    if not bloom_filter.might_contain(user_id):
        return None  # Definitely doesn't exist
    # Proceed with cache/DB lookup
    return fetch_from_cache_or_db(user_id)
```

3. Cache Breakdown

A single hot key expires, causing a spike in DB load.

Solutions:

  • Never expire hot keys (refresh before expiry)
  • Use locking for hot key refresh
  • Replicate hot keys across cache nodes

4. Data Inconsistency

Cache and database have different values.

```
Timeline:
T1: Thread A reads user from DB (version 1)
T2: Thread B updates user in DB (version 2)
T3: Thread B updates cache (version 2)
T4: Thread A writes to cache (version 1)  ← Stale data!
```

Solutions:

```python
# 1. Delete instead of update
def update_user(user_id, data):
    db.update(user_id, data)
    cache.delete(f"user:{user_id}")  # Next read will fetch fresh

# 2. Use versioning
def update_user(user_id, data, expected_version):
    new_version = db.update_with_version(user_id, data, expected_version)
    cache.set(f"user:{user_id}:v{new_version}", data)
```

Cache Sizing and Capacity Planning

Hit Rate Calculation

```
Hit Rate = Cache Hits / (Cache Hits + Cache Misses)

Example:
- 900 hits + 100 misses = 90% hit rate
- Target: 95%+ for production systems
```
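The formula as a trivial helper, guarding the empty-cache case:

```python
def hit_rate(hits, misses):
    """Fraction of lookups served from the cache."""
    total = hits + misses
    return hits / total if total else 0.0

rate = hit_rate(900, 100)  # 0.9, i.e. the 90% example above
```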

Memory Sizing

```
Memory = (Avg Item Size) × (Number of Items) × (Overhead Factor)

Example:
- 1M users × 2KB per user × 1.3 overhead = ~2.6 GB
```
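The same estimate in code (treating 2KB as 2,000 bytes and GB as decimal, to match the worked example; the 1.3 overhead factor is a rough allowance for keys, pointers, and allocator waste):

```python
def cache_memory_gb(avg_item_bytes, num_items, overhead=1.3):
    """Estimated cache footprint in (decimal) gigabytes."""
    return avg_item_bytes * num_items * overhead / 1e9

estimate = cache_memory_gb(2_000, 1_000_000)  # 1M users x 2KB x 1.3 ≈ 2.6 GB
```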

Key Design Guidelines

```python
# Good: Predictable, hierarchical
"user:123:profile"
"user:123:settings"
"product:456:details"
"session:abc123"

# Bad: Unpredictable, too long
"getUserByIdFromDatabaseWithAllRelations:123"
```
| Guideline | Recommendation |
|---|---|
| Key length | Keep short (bandwidth cost) |
| Naming | Use colons as delimiters |
| Versioning | Include a version for schema changes |
| Namespacing | Prefix by service/domain |

Interview Quick Reference

Common Questions

  1. “Design a cache for a social media feed”

    • Cache-aside with TTL
    • Invalidate on new posts
    • Consider write-behind for view counts
  2. “How would you handle cache invalidation across services?”

    • Event-driven invalidation via message queue
    • Pub/sub for real-time updates
    • TTL as fallback
  3. “Your cache hit rate dropped from 95% to 60%. How do you debug?”

    • Check for key pattern changes
    • Analyze eviction rate (memory pressure?)
    • Look for cache stampede after deploys
    • Verify TTL settings

Numbers to Know

| Metric | Value |
|---|---|
| Redis single-node throughput | 100K+ ops/sec |
| Redis latency (local) | ~0.1ms |
| Redis latency (network) | 1-5ms |
| Memcached throughput | 200K+ ops/sec |
| CDN latency (edge hit) | 10-50ms |
| Typical target hit rate | 95%+ |

Design Checklist

  • What data to cache? (Hot data, expensive queries)
  • Caching strategy? (Cache-aside, write-through, etc.)
  • Eviction policy? (LRU, LFU, TTL)
  • Invalidation approach? (TTL, event-driven, versioned)
  • Cache size and memory budget?
  • Distributed or local? (Redis vs in-process)
  • Failure mode? (What if cache is down?)
  • Cold start strategy? (Pre-warming?)
  • Monitoring? (Hit rate, latency, memory)

Quick Decision Matrix

| Scenario | Recommended Approach |
|---|---|
| Read-heavy, tolerance for staleness | Cache-aside + TTL |
| Strong consistency needed | Write-through |
| High write throughput | Write-behind |
| Global distribution | CDN + regional caches |
| Session storage | Redis with persistence |
| Simple key-value, max performance | Memcached |
| Complex data structures | Redis |