
Caching

Caching is arguably the most impactful performance optimization in computing. Nearly every layer of the modern stack, from CPU caches to CDN edge nodes, is built around a cache. Understanding caching deeply means understanding trade-offs between speed, freshness, memory, and complexity. Get it right and your system can handle 100x the load. Get it wrong and you serve stale data, crash under thundering herds, or burn money on memory you don't need.

This section doesn't just describe caching patterns. It takes you from the physics of memory hierarchies through the mathematics of hit-rate modeling to the engineering decisions that separate a cache that saves your system from one that destroys your data integrity.

Why Caching Exists

Every cache exists because of a single physical reality: accessing data from a closer, faster storage medium is cheaper than recomputing or re-fetching it from the source of truth. This is a consequence of the memory hierarchy, which itself is a consequence of physics — faster storage is more expensive per bit, so we have less of it.

| Storage Tier | Latency | Capacity | Cost per GB |
|---|---|---|---|
| CPU L1 Cache | ~1 ns | 64 KB | Built into CPU |
| CPU L3 Cache | ~10 ns | 32 MB | Built into CPU |
| RAM | ~100 ns | 64-512 GB | ~$5 |
| NVMe SSD | ~100 μs | 1-4 TB | ~$0.10 |
| Network (same data center) | ~500 μs | Unlimited | Variable |
| Disk (HDD) | ~10 ms | 8-20 TB | ~$0.02 |
| Network (cross-region) | ~50 ms | Unlimited | Variable |

The ratio between accessing RAM and crossing the network is roughly 5,000x (same data center) to 500,000x (cross-region). That gap is what makes caching worthwhile, and what makes cache misses so expensive.
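A back-of-the-envelope model makes the cost of misses concrete. Assuming every request checks the cache first and each miss pays the full origin latency, average latency for hit rate h is h × t_hit + (1 − h) × t_miss. The sketch below plugs in the approximate figures from the table (a ~100 ns in-process RAM hit and a ~500 μs same-data-center fetch); `expectedLatencyNs` and the constant names are illustrative, not from any library.

```typescript
// Expected latency under a simple two-level model: every request checks the
// cache, and each miss pays the full origin cost. (Assumption: the cost of the
// failed cache lookup on a miss is ignored for simplicity.)
function expectedLatencyNs(hitRate: number, hitNs: number, missNs: number): number {
  if (hitRate < 0 || hitRate > 1) {
    throw new RangeError("hitRate must be in [0, 1]");
  }
  return hitRate * hitNs + (1 - hitRate) * missNs;
}

const RAM_HIT_NS = 100;          // ~100 ns, from the table above
const SAME_DC_MISS_NS = 500_000; // ~500 μs, from the table above

for (const h of [0.9, 0.99, 0.999]) {
  const ns = expectedLatencyNs(h, RAM_HIT_NS, SAME_DC_MISS_NS);
  console.log(`hit rate ${h}: ~${(ns / 1000).toFixed(1)} μs average`);
}
```

Under these assumptions the averages come out to roughly 50 μs, 5 μs, and 0.6 μs. Moving from 90% to 99% hits cuts average latency about tenfold even though the hit path never got faster, because the miss term dominates.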


Learning Path

Follow this order for the most coherent understanding:

| Order | Topic | Why This Order |
|---|---|---|
| 1 | Caching Strategies | The foundational read/write patterns; everything else builds on these |
| 2 | Cache Invalidation | The hardest problem: understand why stale data is inevitable and how to manage it |
| 3 | Thundering Herd | What happens when caching goes wrong at scale, and how to prevent it |
| 4 | Cache Warming | Solving the cold-start problem, which is critical for deployments and failover |
| 5 | Multi-Layer Caching | Combining L1/L2/L3 caches for maximum performance and resilience |
| 6 | Cache Sizing Math | The mathematics behind "how big should my cache be?" |
| 7 | Redis Caching Patterns | Production Redis patterns: eviction, pipelining, Lua scripts, data structures |
| 8 | CDN Deep Dive | Edge caching at global scale: headers, purging, cache key design |

The Two Hard Problems

Phil Karlton famously said:

"There are only two hard things in Computer Science: cache invalidation and naming things."

He was right. Caching is easy to add and extraordinarily hard to get right. The failure modes are subtle — stale data served for hours, thundering herds that take down origin servers, inconsistency between cache layers, memory bloat from unbounded caches, and cache stampedes during deploys.

Every page in this section addresses one or more of these failure modes head-on, with the mathematics to prove the solutions work and the war stories to show what happens when they don't.

The Fundamental Trade-Off

Every caching decision is a point on this triangle:

        Freshness
           /\
          /  \
         /    \
        /      \
       /________\
   Speed      Memory
  • Freshness vs Speed: Shorter TTLs mean fresher data but more cache misses and higher origin load.
  • Freshness vs Memory: Storing more versions for consistency costs memory.
  • Speed vs Memory: Caching more data means faster responses but larger memory footprint and higher cost.

There is no free lunch. Every pattern in this section is a specific trade-off along these axes, and the right choice depends on your access patterns, consistency requirements, and budget.
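The freshness-versus-speed bullet above can be made concrete with a minimal TTL cache. This is a sketch assuming a single process, one hot key requested once per second, and a hypothetical origin that is only counted rather than called; `TtlCache`, `simulateOriginFetches`, and the key `user:42` are illustrative names. The clock is injectable so the simulation is deterministic.

```typescript
// Minimal TTL cache: an entry is served until ttlMs has elapsed, then treated as a miss.
class TtlCache<K, V> {
  private readonly entries = new Map<K, { value: V; expiresAt: number }>();
  private readonly ttlMs: number;
  private readonly now: () => number;

  // `now` defaults to wall-clock time but can be replaced with a fake clock.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: K): V | undefined {
    const entry = this.entries.get(key);
    if (entry === undefined) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: K, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

// Requests one key every second for durationMs and counts how often the
// (hypothetical) origin would be hit. Shorter TTLs mean more origin fetches.
function simulateOriginFetches(ttlMs: number, durationMs: number): number {
  let clock = 0;
  const cache = new TtlCache<string, number>(ttlMs, () => clock);
  let fetches = 0;
  for (clock = 0; clock < durationMs; clock += 1000) {
    if (cache.get("user:42") === undefined) {
      fetches++; // miss: fetch from origin and repopulate the cache
      cache.set("user:42", fetches);
    }
  }
  return fetches;
}

for (const ttlMs of [1_000, 5_000, 30_000]) {
  console.log(`TTL ${ttlMs / 1000}s -> ${simulateOriginFetches(ttlMs, 60_000)} origin fetches/min`);
}
```

With one request per second over a minute, a 5 s TTL goes to origin 12 times while a 30 s TTL goes only twice, but the longer TTL can serve data up to 30 s old.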

Key Metrics

Before diving into patterns, understand the metrics that define cache effectiveness:

  • Hit Rate — Percentage of requests served from cache. Target: 85-99% depending on use case.
  • Miss Rate — 1 - Hit Rate. Each miss means a full round-trip to origin.
  • Eviction Rate — How often entries are evicted before expiry. A persistently high eviction rate usually means the cache is too small for the working set.
  • Latency at p50/p99 — Cache hits should be sub-millisecond (in-process) or single-digit ms (distributed).
  • Origin Load — Traffic that reaches the source of truth. Caching should reduce this by 10-100x.
  • Stale Serve Rate — Percentage of responses served from expired cache entries. Acceptable in some systems, fatal in others.
  • Memory Utilization — How efficiently cache memory is used. Overprovisioning wastes money; underprovisioning causes thrashing.
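Several of these metrics fall out of a few counters. The sketch below is a bounded LRU cache (one common eviction policy, chosen here for illustration) that records hits, misses, inserts, and evictions; `InstrumentedLru` and `replay` are hypothetical names, and the origin is simulated by inserting a placeholder value on each miss. It relies on JavaScript's `Map` iterating in insertion order, so the first key is the least recently used.

```typescript
// Bounded LRU cache instrumented with the counters behind hit rate, miss rate,
// eviction rate, and origin-load reduction.
class InstrumentedLru<K, V> {
  private readonly map = new Map<K, V>(); // insertion order = recency order
  private readonly capacity: number;
  private hits = 0;
  private misses = 0;
  private inserts = 0;
  private evictions = 0;

  constructor(capacity: number) {
    if (capacity < 1) throw new RangeError("capacity must be >= 1");
    this.capacity = capacity;
  }

  get(key: K): V | undefined {
    if (!this.map.has(key)) {
      this.misses++;
      return undefined;
    }
    const value = this.map.get(key) as V;
    this.map.delete(key); // re-insert to mark as most recently used
    this.map.set(key, value);
    this.hits++;
    return value;
  }

  set(key: K, value: V): void {
    if (this.map.has(key)) {
      this.map.delete(key); // updating an existing key never evicts
    } else {
      if (this.map.size >= this.capacity) {
        const lru = this.map.keys().next().value as K; // oldest entry
        this.map.delete(lru);
        this.evictions++;
      }
      this.inserts++;
    }
    this.map.set(key, value);
  }

  stats() {
    const requests = this.hits + this.misses;
    return {
      hitRate: requests === 0 ? 0 : this.hits / requests,
      missRate: requests === 0 ? 0 : this.misses / requests,
      evictionRate: this.inserts === 0 ? 0 : this.evictions / this.inserts, // evictions per new entry
      originLoadReduction: this.misses === 0 ? Infinity : requests / this.misses, // = 1 / missRate
      evictions: this.evictions,
    };
  }
}

// Read-through replay: on a miss, "fetch" from a hypothetical origin and cache the result.
function replay(capacity: number, keys: string[]) {
  const cache = new InstrumentedLru<string, string>(capacity);
  for (const key of keys) {
    if (cache.get(key) === undefined) cache.set(key, `value-of-${key}`);
  }
  return cache.stats();
}

// Cyclic access over three keys: a, b, c, a, b, c, ...
const workload = Array.from({ length: 30 }, (_, i) => "abc".charAt(i % 3));
console.log("capacity 2:", replay(2, workload));
console.log("capacity 3:", replay(3, workload));
```

Replaying this cyclic pattern shows both outcomes: with capacity 2 every access misses and nearly every insert evicts (classic LRU thrashing), while with capacity 3 the hit rate is 90% and origin load drops 10x. Because the reduction factor is 1 / miss rate, a 90% hit rate gives the 10x end of the origin-load range above and 99% gives the 100x end.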

Prerequisites

This section assumes familiarity with:

  • Basic data structures (hash maps, linked lists)
  • Network fundamentals (HTTP, TCP/IP)
  • Database basics (reads, writes, indexes)
  • Asynchronous programming in TypeScript/JavaScript

No prior caching knowledge is required — we build from first principles.

"What I cannot create, I do not understand." — Richard Feynman