Skip to content
Unverified — AI-generated content. Help verify this page

Cache Sizing Math

"How big should my cache be?" is one of the most important questions in system design, and one of the most poorly answered. Too small and the cache thrashes (entries evicted before they can be re-used), hit rate collapses, and origin load spikes. Too large and you waste money on memory, increase garbage collection pressure, and create a false sense of security. The right answer requires understanding your access pattern mathematically — specifically, the distribution of request frequencies across keys.

Why This Matters

Memory is not free. A Redis instance with 100 GB of RAM costs approximately $500-1500/month on AWS. If your working set is only 10 GB, you're wasting $450-1350/month. If your working set is 200 GB and you provision 100 GB, your hit rate drops from 95% to 60%, and your database takes 8x more load than expected.

Getting cache size right is a mathematical problem with a precise answer. This page gives you the tools to compute that answer.

First Principles

A cache is effective when the data it stores is the data most likely to be requested next. The key insight: not all data is equally popular. In virtually every real-world system, a small fraction of keys receives a large fraction of requests. This skew is what makes caching work.

If every key were equally popular (uniform distribution), a cache holding C of N total keys would have a hit rate of exactly C/N. You'd need to cache 90% of your data to get a 90% hit rate. That's not a cache — that's a database replica.

But real-world access patterns are skewed. The skew follows a power law, typically modeled by the Zipf distribution. This skew is why a cache holding just 1-10% of your data can achieve 80-99% hit rates.

The Zipf Distribution

Definition

In a Zipf distribution, the frequency of the i-th most popular key is proportional to:

f(i)=1iα

where α is the Zipf exponent (also called the skew parameter). The probability that a random request is for the i-th most popular key is:

P(i)=1/iαj=1N1/jα=1iαHN,α

where HN,α=j=1N1/jα is the generalized harmonic number.

The Skew Parameter α

The value of α determines how skewed the access pattern is:

αDistribution ShapeReal-World Example
0Uniform (no skew)Random lookups (unusual)
0.5Mild skewInternal service calls
0.8Moderate skewTypical web API traffic
1.0Classic Zipf (Zipf's law)Natural language word frequency
1.2Strong skewSocial media (celebrity accounts)
1.5Very strong skewViral content, CDN traffic
2.0Extreme skewBreaking news, flash sales

Most web applications have α between 0.7 and 1.2.

Measuring α From Your Data

To determine α for your system, collect access counts for each key over a representative time window, then fit a power law:

typescript
function estimateZipfAlpha(accessCounts: number[]): number {
  // Sort in descending order
  const sorted = [...accessCounts].sort((a, b) => b - a);
  const n = sorted.length;

  // Use maximum likelihood estimation
  // For Zipf: alpha_MLE = 1 / (mean of ln(rank))
  let sumLogRank = 0;
  for (let i = 0; i < n; i++) {
    const rank = i + 1;
    // Weight by access count
    sumLogRank += sorted[i] * Math.log(rank);
  }

  const totalAccesses = sorted.reduce((a, b) => a + b, 0);
  const meanLogRank = sumLogRank / totalAccesses;

  // MLE estimate (simplified — Hill estimator)
  return 1 / meanLogRank;
}

// More robust: least-squares regression on log-log plot
function estimateZipfAlphaRegression(accessCounts: number[]): number {
  const sorted = [...accessCounts].sort((a, b) => b - a);
  const n = sorted.length;

  // Log-log regression: log(count) = -alpha * log(rank) + const
  let sumX = 0, sumY = 0, sumXY = 0, sumXX = 0;

  for (let i = 0; i < n; i++) {
    if (sorted[i] === 0) continue;
    const x = Math.log(i + 1);       // log(rank)
    const y = Math.log(sorted[i]);    // log(count)
    sumX += x;
    sumY += y;
    sumXY += x * y;
    sumXX += x * x;
  }

  // Slope of regression line (negated, since slope is -alpha)
  const slope = (n * sumXY - sumX * sumY) / (n * sumXX - sumX * sumX);
  return -slope;
}

Hit Rate Modeling

The Independent Reference Model (IRM)

The IRM assumes that each request independently selects a key according to the same probability distribution. Under IRM with a Zipf distribution and an LRU cache of size C:

Hit Rate=i=1CP(i)i=1NP(i)=i=1C1/iαi=1N1/iα=HC,αHN,α

This is exact for IRM with optimal (Belady's) replacement, and a close approximation for LRU when C is not too small.

Computing the Generalized Harmonic Number

Hn,α=i=1n1iα

For large n and α1:

Hn,αn1α1α+ζ(α)12nα+α12nα+1

where ζ(α) is the Riemann zeta function.

For α=1 (classic Zipf):

Hn,1=i=1n1iln(n)+γ

where γ0.5772 is the Euler-Mascheroni constant.

TypeScript: Hit Rate Calculator

typescript
function generalizedHarmonic(n: number, alpha: number): number {
  // For small n, compute exactly
  if (n < 10000) {
    let sum = 0;
    for (let i = 1; i <= n; i++) {
      sum += 1 / Math.pow(i, alpha);
    }
    return sum;
  }

  // For large n with alpha = 1
  if (Math.abs(alpha - 1) < 0.001) {
    return Math.log(n) + 0.5772156649;
  }

  // For large n with alpha != 1 (approximation)
  return Math.pow(n, 1 - alpha) / (1 - alpha) + 0.5;
}

function estimateHitRate(
  cacheSize: number,   // C: number of entries the cache can hold
  totalKeys: number,   // N: total unique keys in the dataset
  alpha: number        // Zipf exponent
): number {
  const hC = generalizedHarmonic(cacheSize, alpha);
  const hN = generalizedHarmonic(totalKeys, alpha);
  return hC / hN;
}

// Example: 10,000 cache entries, 1,000,000 total keys, alpha = 1.0
console.log(estimateHitRate(10_000, 1_000_000, 1.0));
// ≈ 0.640 (64% hit rate)

console.log(estimateHitRate(10_000, 1_000_000, 1.2));
// ≈ 0.857 (85.7% hit rate)

console.log(estimateHitRate(100_000, 1_000_000, 1.0));
// ≈ 0.837 (83.7% hit rate)

console.log(estimateHitRate(100_000, 1_000_000, 1.2));
// ≈ 0.968 (96.8% hit rate)

Hit Rate Tables

For N=1,000,000 total keys:

Cache Size (C)α=0.8α=1.0α=1.2α=1.5
1,000 (0.1%)19.2%41.0%64.2%86.7%
10,000 (1%)31.2%64.0%85.7%97.0%
50,000 (5%)44.7%78.8%93.9%99.2%
100,000 (10%)53.1%83.7%96.0%99.6%
200,000 (20%)63.1%88.7%97.8%99.9%
500,000 (50%)78.6%95.0%99.3%~100%

Key takeaway: With α1.0, caching just 1% of your data gives you a 64%+ hit rate. Caching 10% gives 84%+ hit rate. The more skewed your access pattern, the smaller the cache you need.


The 80/20 Rule Formalized

The Pareto principle — "80% of effects come from 20% of causes" — can be derived from the Zipf distribution. If α=1.0:

The fraction of requests served by the top p fraction of keys is:

F(p)=HpN,αHN,αln(pN)ln(N)=1+ln(p)ln(N)

For N=1,000,000 and p=0.20 (top 20% of keys):

F(0.20)1+ln(0.20)ln(1,000,000)=1+1.60913.816=10.116=0.884

So the top 20% of keys serve 88.4% of requests — close to the 80/20 rule.

For α=0.86 (the exact value that gives 80/20):

F(0.20)=0.80 when α0.86

The 80/20 rule corresponds to α0.86. Most web traffic has α between 0.8 and 1.2, so the 80/20 rule is roughly correct (often 90/10 or even 95/5 in practice).


Required Cache Size for Target Hit Rate

Given a target hit rate h, we need to find the cache size C such that:

HC,αHN,α=h

Solving for C:

HC,α=hHN,α

For α=1 (using the logarithmic approximation):

ln(C)+γh(ln(N)+γ)Ceh(ln(N)+γ)γ

For general α1:

C1α1αhN1α1αC(hN1α)1/(1α)=h1/(1α)N

TypeScript: Required Cache Size Calculator

typescript
function requiredCacheSize(
  totalKeys: number,     // N
  targetHitRate: number, // h (0.0 to 1.0)
  alpha: number          // Zipf exponent
): number {
  if (alpha === 1.0) {
    const gamma = 0.5772156649;
    const lnN = Math.log(totalKeys) + gamma;
    return Math.ceil(Math.exp(targetHitRate * lnN - gamma));
  }

  // General case: C ≈ h^(1/(1-alpha)) * N
  const exponent = 1 / (1 - alpha);
  const cacheSize = Math.pow(targetHitRate, exponent) * totalKeys;
  return Math.ceil(Math.min(cacheSize, totalKeys));
}

// Examples: N = 1,000,000
console.log(requiredCacheSize(1_000_000, 0.90, 1.0));
// ≈ 126,000 entries (12.6% of dataset)

console.log(requiredCacheSize(1_000_000, 0.95, 1.0));
// ≈ 282,000 entries (28.2% of dataset)

console.log(requiredCacheSize(1_000_000, 0.90, 1.2));
// ≈ 16,000 entries (1.6% of dataset)

console.log(requiredCacheSize(1_000_000, 0.95, 1.2));
// ≈ 40,000 entries (4.0% of dataset)

console.log(requiredCacheSize(1_000_000, 0.99, 1.2));
// ≈ 200,000 entries (20% of dataset)

Working Set Size Estimation

The working set is the set of keys actively being accessed within a given time window. Not all keys in your database are "active" — many may be dormant (old records, inactive users, archived data).

Estimating from Access Logs

typescript
interface AccessLog {
  key: string;
  timestamp: number;
}

function estimateWorkingSet(
  logs: AccessLog[],
  windowMs: number
): WorkingSetEstimate {
  const now = Date.now();
  const windowStart = now - windowMs;

  // Keys accessed within the window
  const activeKeys = new Set<string>();
  const keyCounts = new Map<string, number>();

  for (const log of logs) {
    if (log.timestamp >= windowStart) {
      activeKeys.add(log.key);
      keyCounts.set(log.key, (keyCounts.get(log.key) ?? 0) + 1);
    }
  }

  // Sort by frequency to compute cumulative distribution
  const sorted = [...keyCounts.entries()].sort((a, b) => b[1] - a[1]);
  const totalAccesses = sorted.reduce((sum, [_, count]) => sum + count, 0);

  // Find the number of keys covering 80%, 90%, 95% of accesses
  let cumulative = 0;
  let p80 = 0, p90 = 0, p95 = 0;

  for (let i = 0; i < sorted.length; i++) {
    cumulative += sorted[i][1];
    const fraction = cumulative / totalAccesses;

    if (p80 === 0 && fraction >= 0.80) p80 = i + 1;
    if (p90 === 0 && fraction >= 0.90) p90 = i + 1;
    if (p95 === 0 && fraction >= 0.95) p95 = i + 1;
  }

  return {
    totalUniqueKeys: activeKeys.size,
    keysFor80Percent: p80,
    keysFor90Percent: p90,
    keysFor95Percent: p95,
    totalAccesses,
  };
}

interface WorkingSetEstimate {
  totalUniqueKeys: number;
  keysFor80Percent: number;
  keysFor90Percent: number;
  keysFor95Percent: number;
  totalAccesses: number;
}

Time-Varying Working Sets

Working sets change over time:

  • Daily cycles: E-commerce sites have different popular products during business hours vs nighttime
  • Weekly cycles: Some content is popular on weekdays, other on weekends
  • Seasonal: Holiday products are popular in December, tax software in April
  • Events: Breaking news, viral content, flash sales create temporary hot keys

Your cache must be sized for the peak working set, not the average. Measure during your peak traffic period.


Memory Budget Calculation

Knowing the required number of entries is only half the answer. You also need to know how much memory each entry consumes.

Redis Memory Overhead Per Key

Redis has per-key overhead beyond the raw key and value data:

ComponentOverhead (bytes)Notes
Redis dict entry64Hash table entry (3 pointers + metadata)
Redis object header16Type, encoding, refcount, LRU/LFU info
SDS string header (key)9-17Simple Dynamic String header
SDS string header (value)9-17Depends on string length
Expire entry (if TTL set)64Separate dict entry for expiration
Total overhead per key~160-180 bytesPlus actual key and value data

Total Memory Formula

For C cache entries, each with key size k bytes and value size v bytes:

Total Memory=C×(k+v+O)

where O is the per-key overhead (approximately 170 bytes for Redis with TTL).

TypeScript: Memory Budget Calculator

typescript
interface MemoryEstimate {
  totalBytes: number;
  totalMB: number;
  totalGB: number;
  perKeyBytes: number;
  breakdown: {
    keyData: number;
    valueData: number;
    redisOverhead: number;
  };
}

function estimateRedisMemory(
  entryCount: number,
  avgKeySize: number,      // bytes
  avgValueSize: number,    // bytes
  hasTTL: boolean = true
): MemoryEstimate {
  const redisOverhead = hasTTL ? 178 : 114; // Approximate per-key overhead
  const perKeyBytes = avgKeySize + avgValueSize + redisOverhead;
  const totalBytes = entryCount * perKeyBytes;

  // Add Redis hash table overhead (fills to ~50% capacity)
  const hashTableOverhead = totalBytes * 0.10;
  const totalWithOverhead = totalBytes + hashTableOverhead;

  return {
    totalBytes: totalWithOverhead,
    totalMB: totalWithOverhead / (1024 * 1024),
    totalGB: totalWithOverhead / (1024 * 1024 * 1024),
    perKeyBytes,
    breakdown: {
      keyData: entryCount * avgKeySize,
      valueData: entryCount * avgValueSize,
      redisOverhead: entryCount * redisOverhead + hashTableOverhead,
    },
  };
}

// Example: 1 million entries, 30-byte keys, 500-byte values
const estimate = estimateRedisMemory(1_000_000, 30, 500, true);
console.log(`Total: ${estimate.totalGB.toFixed(2)} GB`);
// Total: ~0.72 GB

console.log(`Per key: ${estimate.perKeyBytes} bytes`);
// Per key: ~708 bytes (30 + 500 + 178)

// Example: 10 million entries, 40-byte keys, 200-byte JSON values
const estimate2 = estimateRedisMemory(10_000_000, 40, 200, true);
console.log(`Total: ${estimate2.totalGB.toFixed(2)} GB`);
// Total: ~4.28 GB

Redis Data Type Memory Comparison

Different Redis data types have different memory characteristics:

Data TypePer-Entry OverheadBest For
String~60 bytesSimple key-value pairs
Hash (ziplist encoding)~10 bytes per field (up to 128 fields/64 bytes per field)Small objects with many fields
Hash (hashtable encoding)~100 bytes per fieldLarge objects
List (listpack)~10 bytes per elementSmall lists
List (quicklist)~30 bytes per elementLarge lists
Set (listpack)~10 bytes per elementSmall sets
Set (hashtable)~70 bytes per elementLarge sets
Sorted Set (listpack)~20 bytes per elementSmall sorted sets
Sorted Set (skiplist)~120 bytes per elementLarge sorted sets

Memory Optimization

For caching many small objects (e.g., user sessions with 5-10 fields), using Redis Hashes with fewer than 128 fields (ziplist encoding) uses approximately 10x less memory than storing each field as a separate String key. Set hash-max-ziplist-entries 128 and hash-max-ziplist-value 64 in Redis config.


Practical Sizing Examples

Example 1: E-Commerce Product Cache

Given:

  • 2 million products in the database
  • 50,000 unique products viewed per day (working set)
  • Access pattern: α1.1 (measured from logs)
  • Average product JSON: 800 bytes
  • Average key: 25 bytes (product:1234567)
  • Target hit rate: 95%

Required cache entries:

C=h1/(1α)N=0.951/(11.1)50,000=0.951050,000C=0.951050,0001.629×50,00081,450 entries

Wait — that's more than the working set. Let's use the working set (50,000) as our N since dormant products are never requested:

C0.9510×50,000=81,450

Since C>N, we'd need to cache the entire working set. But that's only 50,000 keys, which is very manageable.

Memory estimate:

Memory=50,000×(25+800+178)×1.10=50,000×1,003×1.1055 MB

A 55 MB cache. Trivial. Even a 256 MB Redis instance would be overkill.

Example 2: Social Media Feed Cache

Given:

  • 100 million users
  • 5 million daily active users (DAU)
  • Each user's feed is personalized (unique per user)
  • Average feed JSON: 10 KB
  • Average key: 20 bytes (feed:12345678)
  • Access pattern: α0.9 (celebrity accounts dominate)
  • Target hit rate: 90%

Required cache entries:

With N=5,000,000 (DAU as working set) and α=0.9:

C=0.901/(10.9)5,000,000=0.90105,000,000C0.3487×5,000,000=1,743,500 entries

Memory estimate:

Memory=1,743,500×(20+10,240+178)×1.1020 GB

A 20 GB cache. This requires a medium Redis instance or a Redis Cluster.

Example 3: Session Storage

Given:

  • 2 million concurrent sessions
  • All sessions are active (uniform access)
  • Average session: 2 KB
  • Average key: 42 bytes (UUID)
  • Target: cache ALL active sessions (100% hit rate)

Memory estimate:

Memory=2,000,000×(42+2,048+178)×1.104.99 GB

A 5 GB cache. This is an exact requirement since all sessions must be cached.


The Diminishing Returns Curve

Adding more cache capacity has diminishing returns. The marginal hit rate improvement per additional cache entry decreases as the cache grows.

The marginal hit rate from adding the (C+1)-th entry:

Δh=P(C+1)HN,α=1(C+1)αHN,α

For α=1.0 and N=1,000,000:

Cache SizeHit RateMarginal Improvement per 1000 entries
1,00041.0%7.2%
10,00064.0%0.72%
100,00083.7%0.072%
500,00095.0%0.014%

After 100,000 entries, each additional 1,000 entries improves hit rate by only 0.072%. The cost-per-percentage-point of hit rate increases exponentially.

Optimal Cache Size (Cost-Minimizing)

The optimal cache size minimizes total cost:

Total Cost=Ccmem+(1h(C))Rcmiss

where:

  • cmem = cost per cached entry per unit time (memory cost)
  • R = request rate
  • cmiss = cost per cache miss (DB query time × cost per second)

Taking the derivative and setting to zero:

cmem=RcmissdhdC=Rcmiss1(C+1)αHN,α

Solving for C:

C=(RcmisscmemHN,α)1/α1

This is the cache size where the marginal cost of one more cached entry equals the marginal savings from one fewer cache miss.

War Story

The 10x Overprovisioned Cache (Fintech, 2022)

A fintech company provisioned 256 GB of Redis for their transaction cache. Their working set was 12 GB. They were spending 3,200/monthonRediscapacitytheydidntneed.Theoperationsteamwasafraidtodownsizebecause"whatifweneedit."AproperZipfanalysisshowedthatwiththeiraccesspattern(\alpha = 1.15$), a 32 GB instance would achieve a 99.2% hit rate — versus 99.7% at 256 GB. The 0.5% difference was not worth $2,800/month. They downsized to 64 GB (with headroom) and saved $2,400/month.

War Story

The Undersized Cache That Caused Cascading Failure (E-Commerce, 2023)

An e-commerce platform sized their Redis cache at 8 GB based on "average" traffic. During a sale event, the working set grew from 200,000 products to 2 million products. The 8 GB cache could hold approximately 300,000 products. The hit rate dropped from 92% to 31%. The database received 7x its normal load. Connection pools were exhausted. The database went down, taking the site offline.

The post-mortem led to two changes: (1) cache sizing based on peak working set, not average, and (2) Redis memory monitoring with alerts at 70% and 90% utilization.

Advanced: Time-Aware Sizing

The standard model assumes a static access pattern. In reality, the working set changes over time. Time-aware sizing accounts for this:

Crequired(t)=f(W(t),α(t),htarget)

where W(t) is the working set size at time t and α(t) is the skew at time t.

typescript
interface TimeSizingResult {
  peakCacheSize: number;
  averageCacheSize: number;
  recommendedSize: number; // Peak + 20% headroom
  peakTime: string;
}

function timeAwareSizing(
  hourlyWorkingSets: Array<{ hour: number; uniqueKeys: number; alpha: number }>,
  targetHitRate: number,
  avgEntryBytes: number
): TimeSizingResult {
  let peakSize = 0;
  let peakHour = 0;
  let totalSize = 0;

  for (const { hour, uniqueKeys, alpha } of hourlyWorkingSets) {
    const required = requiredCacheSize(uniqueKeys, targetHitRate, alpha);
    totalSize += required;

    if (required > peakSize) {
      peakSize = required;
      peakHour = hour;
    }
  }

  return {
    peakCacheSize: peakSize,
    averageCacheSize: Math.round(totalSize / hourlyWorkingSets.length),
    recommendedSize: Math.ceil(peakSize * 1.2), // 20% headroom
    peakTime: `${peakHour}:00`,
  };
}

Advanced: Cache Efficiency Metric

Cache efficiency measures how well you're using your cache memory:

η=hactualhoptimal(C)

where hoptimal(C) is the hit rate of an optimal (Belady's) cache of the same size.

  • η=1.0: Your cache is perfectly efficient (LRU ≈ optimal)
  • η=0.8: Your eviction policy is wasting 20% of cache capacity
  • η<0.5: Something is seriously wrong (scan pollution, poor key design, etc.)

If efficiency is low, the problem is not the cache size — it's the eviction policy or access pattern. Adding more memory won't help as much as fixing the policy.

"What I cannot create, I do not understand." — Richard Feynman