Skip to content
Unverified — AI-generated content. Help verify this page

Math Patterns in System Design

System design interviews are not about memorizing architectures. They are about reasoning under uncertainty — and that reasoning is fundamentally mathematical. "How much storage do we need?" "What's our QPS?" "How many servers?" These questions demand quick, principled back-of-envelope calculations. Getting these numbers right (or at least right order-of-magnitude) is what separates a hand-wavy answer from a credible design.

Back-of-Envelope Estimation Framework

Every estimation follows the same structure:

  1. Define what you are estimating (storage, bandwidth, QPS, latency, cost)
  2. Identify the key variables (users, requests per user, data per request)
  3. Estimate each variable (use round numbers, powers of 10)
  4. Multiply through (chain the estimates)
  5. Sanity check (does the result make sense? Compare to known systems)

TIP

Interviewers care about your process, not exact numbers. An estimate that is off by 2x but follows sound reasoning is far better than a "correct" number pulled from nowhere. Always show your work.

Powers of 2 Table

Memorize this table. It is the foundation of every system design calculation.

PowerExact ValueApproxName
2101,0241 Thousand1 KB
2201,048,5761 Million1 MB
2301,073,741,8241 Billion1 GB
2401,099,511,627,7761 Trillion1 TB
250~1.13 Quadrillion1 Quadrillion1 PB

Useful Approximations

210103,220106,230109,2401012

This means:

  • 1 KB 1,000 bytes
  • 1 MB 1,000 KB 106 bytes
  • 1 GB 1,000 MB 109 bytes
  • 1 TB 1,000 GB 1012 bytes

Data Size Reference

Data TypeTypical Size
char / ASCII1 byte
Unicode character (UTF-8)1-4 bytes
32-bit integer4 bytes
64-bit integer / double8 bytes
UUID16 bytes
Timestamp (epoch)8 bytes
MD5 hash16 bytes
SHA-256 hash32 bytes
IPv4 address4 bytes
IPv6 address16 bytes
Average tweet~300 bytes
Average email~50 KB
Average web page~2 MB
Average photo (compressed)~200 KB
Average 1-min video (720p)~5 MB

Time Constants

DurationSecondsUseful For
1 minute60Request rates
1 hour3,600Batch processing
1 day86,400 105Daily aggregates
1 month2,592,000 2.5×106Monthly storage
1 year31,536,000 3×107Capacity planning

TIP

The 86,400 trick: A day has 86,400 seconds. Round to 105 for quick math. This means:

  • 1 QPS = ~100K requests/day
  • 1K QPS = ~100M requests/day
  • 10K QPS = ~1B requests/day

QPS (Queries Per Second) Calculations

From Daily Active Users

QPS=DAU×requests per user per day86,400

Example: Twitter-like service

DAU=300M,tweets viewed per user=100Read QPS=300×106×10086,4003×1010105=300,000 QPS

Peak vs Average

Production systems must handle peak load, not just average. Common rule of thumb:

Peak QPS=2--5×Average QPS

From Concurrent Users

QPS=Concurrent UsersAverage Request Duration (seconds)

TypeScript (estimation helper):

typescript
interface EstimationParams {
  dau: number;
  requestsPerUserPerDay: number;
  peakMultiplier?: number;
}

function estimateQPS(params: EstimationParams): {
  averageQPS: number;
  peakQPS: number;
  dailyRequests: number;
} {
  const { dau, requestsPerUserPerDay, peakMultiplier = 3 } = params;
  const dailyRequests = dau * requestsPerUserPerDay;
  const averageQPS = dailyRequests / 86_400;
  const peakQPS = averageQPS * peakMultiplier;

  return {
    averageQPS: Math.round(averageQPS),
    peakQPS: Math.round(peakQPS),
    dailyRequests,
  };
}

// Example: Social media
const result = estimateQPS({
  dau: 300_000_000,
  requestsPerUserPerDay: 100,
  peakMultiplier: 3,
});
// { averageQPS: 347222, peakQPS: 1041667, dailyRequests: 30000000000 }

Python:

python
from dataclasses import dataclass

@dataclass
class QPSEstimate:
    average_qps: int
    peak_qps: int
    daily_requests: int

def estimate_qps(
    dau: int,
    requests_per_user_per_day: int,
    peak_multiplier: float = 3.0
) -> QPSEstimate:
    daily_requests = dau * requests_per_user_per_day
    average_qps = daily_requests / 86_400
    peak_qps = average_qps * peak_multiplier

    return QPSEstimate(
        average_qps=round(average_qps),
        peak_qps=round(peak_qps),
        daily_requests=daily_requests
    )

# Example
result = estimate_qps(dau=300_000_000, requests_per_user_per_day=100)
print(f"Average: {result.average_qps:,} QPS")
print(f"Peak: {result.peak_qps:,} QPS")

Storage Calculations

General Formula

Total Storage=Daily New Data×Retention Period (days)×Replication Factor

Worked Example: URL Shortener

Assumptions:

  • 100M new URLs per day
  • Each URL record: 500 bytes (short URL + long URL + metadata)
  • Retention: 5 years
  • Replication factor: 3
Daily data=100×106×500=50×109=50 GB/day5-year storage=50 GB×365×5=91,250 GB91 TBWith replication=91 TB×3=273 TB

Worked Example: Chat Application

ParameterValue
DAU50M
Messages per user per day40
Average message size200 bytes
Media messages (10% with 200KB avg)200 KB
RetentionForever
Text per day=50×106×40×200=400 GBMedia per day=50×106×40×0.1×200,000=40 TBTotal per year(0.4+40)×36514,746 TB14.7 PB

Bandwidth Calculations

Bandwidth=QPS×Average Response Size

Example: If your service handles 10K QPS and each response averages 10 KB:

Bandwidth=10,000×10 KB=100,000 KB/s=100 MB/s800 Mbps

Bandwidth Conversion

UnitEquivalent
1 Byte8 bits
1 MB/s8 Mbps
1 Gbps125 MB/s
10 Gbps1.25 GB/s

Consistent Hashing Math

Consistent hashing maps both keys and servers onto a ring of size 2m (typically m=128 or m=160 for MD5/SHA-1). Each key is assigned to the next server clockwise on the ring.

Virtual Nodes

With N physical servers, each server gets V virtual nodes on the ring. This improves load distribution.

Standard deviation of load1V

With V=150 virtual nodes per server, the standard deviation of load drops to about 5-10% of the mean — acceptable for most systems.

Rebalancing on Server Addition/Removal

When a server is added, only KN keys need to be remapped (where K is total keys, N is total servers). Compare this to modular hashing where nearly all keys must move:

EventConsistent HashingModular Hashing
Add 1 server (N to N+1)KN+1 keys moveNN+1K keys move
Remove 1 serverKN keys moveN1NK keys move

TypeScript:

typescript
import { createHash } from "crypto";

class ConsistentHash {
  private ring: Map<number, string> = new Map();
  private sortedKeys: number[] = [];
  private virtualNodes: number;

  constructor(virtualNodes = 150) {
    this.virtualNodes = virtualNodes;
  }

  private hash(key: string): number {
    const h = createHash("md5").update(key).digest();
    return h.readUInt32BE(0);
  }

  addServer(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const virtualKey = this.hash(`${server}:${i}`);
      this.ring.set(virtualKey, server);
      this.sortedKeys.push(virtualKey);
    }
    this.sortedKeys.sort((a, b) => a - b);
  }

  removeServer(server: string): void {
    for (let i = 0; i < this.virtualNodes; i++) {
      const virtualKey = this.hash(`${server}:${i}`);
      this.ring.delete(virtualKey);
    }
    this.sortedKeys = this.sortedKeys.filter((k) => this.ring.has(k));
  }

  getServer(key: string): string | undefined {
    if (this.sortedKeys.length === 0) return undefined;

    const h = this.hash(key);
    // Binary search for next server clockwise
    let lo = 0, hi = this.sortedKeys.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sortedKeys[mid] < h) lo = mid + 1;
      else hi = mid;
    }

    const idx = lo % this.sortedKeys.length;
    return this.ring.get(this.sortedKeys[idx]);
  }
}

Python:

python
import hashlib
from bisect import bisect_right

class ConsistentHash:
    def __init__(self, virtual_nodes: int = 150):
        self.virtual_nodes = virtual_nodes
        self.ring: dict[int, str] = {}
        self.sorted_keys: list[int] = []

    def _hash(self, key: str) -> int:
        h = hashlib.md5(key.encode()).digest()
        return int.from_bytes(h[:4], "big")

    def add_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = self._hash(f"{server}:{i}")
            self.ring[virtual_key] = server
        self.sorted_keys = sorted(self.ring.keys())

    def remove_server(self, server: str) -> None:
        for i in range(self.virtual_nodes):
            virtual_key = self._hash(f"{server}:{i}")
            self.ring.pop(virtual_key, None)
        self.sorted_keys = sorted(self.ring.keys())

    def get_server(self, key: str) -> str | None:
        if not self.sorted_keys:
            return None

        h = self._hash(key)
        idx = bisect_right(self.sorted_keys, h)
        if idx == len(self.sorted_keys):
            idx = 0
        return self.ring[self.sorted_keys[idx]]

Replication Factor Calculations

Write Amplification

With a replication factor of R:

Write throughput per node=Total write QPS1(leader-based)Total write I/O=Write QPS×R

Quorum Reads and Writes

For a system with N replicas, a write quorum W, and a read quorum R:

W+R>Nstrong consistency guaranteed

Common configurations:

ConfigNWRTrade-off
Strong consistency322Balanced
Write-optimized313Fast writes, slow reads
Read-optimized331Slow writes, fast reads
Eventual consistency311Fast but stale reads possible

Availability Calculation

For N independent replicas, each with availability a:

System availability=1(1a)N
Single node availability2 replicas3 replicas5 replicas
99% (3.65 days downtime)99.99%99.9999%99.99999999%
99.9% (8.77 hrs downtime)99.9999%~100%~100%
99.99% (52.6 min downtime)~100%~100%~100%

WARNING

These calculations assume independent failures. In practice, correlated failures (data center outages, shared dependencies) make real availability lower. This is why multi-region deployment matters.

CAP Theorem: Formal Intuition

The CAP theorem states that a distributed system can provide at most two of three guarantees simultaneously:

  • Consistency (C): Every read receives the most recent write
  • Availability (A): Every request receives a response (not an error)
  • Partition tolerance (P): The system continues to operate despite network partitions

Why You Must Choose

During a network partition, a message from node A cannot reach node B. You have two options:

  1. Choose Consistency (CP): Refuse to respond until the partition heals. You lose availability.
  2. Choose Availability (AP): Respond with potentially stale data. You lose consistency.

Since network partitions are inevitable in distributed systems, the real choice is between CP and AP during partition events.

PACELC Extension

PACELC extends CAP: if there is a Partition, choose A or C; Else (normal operation), choose Latency or Consistency.

SystemDuring PartitionNormal Operation
DynamoDBAPEL (low latency)
MongoDBCPEC (consistent)
CassandraAPEL (tunable)
PostgreSQLCPEC (ACID)

Estimation Cheat Sheet

Common Service Numbers

ServiceDAUPeak QPSStorage/day
Twitter-scale300M~300K~15 TB
Instagram-scale500M~500K~100 TB (images)
WhatsApp-scale2B~1M~50 TB
YouTube-scale2B~200K~500 TB (video)
URL shortener100M~10K~50 GB

Server Capacity Rules of Thumb

ResourceSingle ServerNotes
QPS (web server)1,000-10,000Depends on response time
QPS (database)1,000-5,000With indexing
QPS (cache - Redis)100,000+In-memory
RAM64-256 GBCommodity
Disk1-10 TB SSDFast I/O
Network1-10 GbpsStandard NIC

Quick Estimation Formulas

Servers needed=Peak QPSQPS per serverCache size=Working set20%×Total dataRead/Write ratio (social media)100:1CDN hit rate90--95%

Practice Estimation Problems

ProblemKey VariablesExpected Answer
"Estimate YouTube storage per day"500 hrs video/min, 5 MB/sec, resolutions~2.5 PB/day
"How many servers for 1M QPS?"5K QPS/server~200 servers
"Twitter fan-out storage"300M users, 500 avg followers, 200 tweets/day~3 TB write amplification/day
"Cache size for 100M users"20% active, 1 KB per session~20 GB

TIP

When in doubt during an interview, round aggressively and state your assumptions clearly. "105 seconds in a day" is close enough and much easier to multiply than 86,400. Interviewers value clarity of reasoning over arithmetic precision.

Further Reading

"What I cannot create, I do not understand." — Richard Feynman