Math Patterns in System Design
System design interviews are not about memorizing architectures. They are about reasoning under uncertainty — and that reasoning is fundamentally mathematical. "How much storage do we need?" "What's our QPS?" "How many servers?" These questions demand quick, principled back-of-envelope calculations. Getting these numbers right (or at least right order-of-magnitude) is what separates a hand-wavy answer from a credible design.
Back-of-Envelope Estimation Framework
Every estimation follows the same structure:
- Define what you are estimating (storage, bandwidth, QPS, latency, cost)
- Identify the key variables (users, requests per user, data per request)
- Estimate each variable (use round numbers, powers of 10)
- Multiply through (chain the estimates)
- Sanity check (does the result make sense? Compare to known systems)
TIP
Interviewers care about your process, not exact numbers. An estimate that is off by 2x but follows sound reasoning is far better than a "correct" number pulled from nowhere. Always show your work.
Powers of 2 Table
Memorize this table. It is the foundation of every system design calculation.
| Power | Exact Value | Approx | Name |
|---|---|---|---|
| 1,024 | 1 Thousand | 1 KB | |
| 1,048,576 | 1 Million | 1 MB | |
| 1,073,741,824 | 1 Billion | 1 GB | |
| 1,099,511,627,776 | 1 Trillion | 1 TB | |
| ~1.13 Quadrillion | 1 Quadrillion | 1 PB |
Useful Approximations
This means:
- 1 KB
1,000 bytes - 1 MB
1,000 KB bytes - 1 GB
1,000 MB bytes - 1 TB
1,000 GB bytes
Data Size Reference
| Data Type | Typical Size |
|---|---|
| char / ASCII | 1 byte |
| Unicode character (UTF-8) | 1-4 bytes |
| 32-bit integer | 4 bytes |
| 64-bit integer / double | 8 bytes |
| UUID | 16 bytes |
| Timestamp (epoch) | 8 bytes |
| MD5 hash | 16 bytes |
| SHA-256 hash | 32 bytes |
| IPv4 address | 4 bytes |
| IPv6 address | 16 bytes |
| Average tweet | ~300 bytes |
| Average email | ~50 KB |
| Average web page | ~2 MB |
| Average photo (compressed) | ~200 KB |
| Average 1-min video (720p) | ~5 MB |
Time Constants
| Duration | Seconds | Useful For |
|---|---|---|
| 1 minute | 60 | Request rates |
| 1 hour | 3,600 | Batch processing |
| 1 day | 86,400 | Daily aggregates |
| 1 month | 2,592,000 | Monthly storage |
| 1 year | 31,536,000 | Capacity planning |
TIP
The 86,400 trick: A day has 86,400 seconds. Round to
- 1 QPS = ~100K requests/day
- 1K QPS = ~100M requests/day
- 10K QPS = ~1B requests/day
QPS (Queries Per Second) Calculations
From Daily Active Users
Example: Twitter-like service
Peak vs Average
Production systems must handle peak load, not just average. Common rule of thumb:
From Concurrent Users
TypeScript (estimation helper):
interface EstimationParams {
dau: number;
requestsPerUserPerDay: number;
peakMultiplier?: number;
}
function estimateQPS(params: EstimationParams): {
averageQPS: number;
peakQPS: number;
dailyRequests: number;
} {
const { dau, requestsPerUserPerDay, peakMultiplier = 3 } = params;
const dailyRequests = dau * requestsPerUserPerDay;
const averageQPS = dailyRequests / 86_400;
const peakQPS = averageQPS * peakMultiplier;
return {
averageQPS: Math.round(averageQPS),
peakQPS: Math.round(peakQPS),
dailyRequests,
};
}
// Example: Social media
const result = estimateQPS({
dau: 300_000_000,
requestsPerUserPerDay: 100,
peakMultiplier: 3,
});
// { averageQPS: 347222, peakQPS: 1041667, dailyRequests: 30000000000 }Python:
from dataclasses import dataclass
@dataclass
class QPSEstimate:
average_qps: int
peak_qps: int
daily_requests: int
def estimate_qps(
dau: int,
requests_per_user_per_day: int,
peak_multiplier: float = 3.0
) -> QPSEstimate:
daily_requests = dau * requests_per_user_per_day
average_qps = daily_requests / 86_400
peak_qps = average_qps * peak_multiplier
return QPSEstimate(
average_qps=round(average_qps),
peak_qps=round(peak_qps),
daily_requests=daily_requests
)
# Example
result = estimate_qps(dau=300_000_000, requests_per_user_per_day=100)
print(f"Average: {result.average_qps:,} QPS")
print(f"Peak: {result.peak_qps:,} QPS")Storage Calculations
General Formula
Worked Example: URL Shortener
Assumptions:
- 100M new URLs per day
- Each URL record: 500 bytes (short URL + long URL + metadata)
- Retention: 5 years
- Replication factor: 3
Worked Example: Chat Application
| Parameter | Value |
|---|---|
| DAU | 50M |
| Messages per user per day | 40 |
| Average message size | 200 bytes |
| Media messages (10% with 200KB avg) | 200 KB |
| Retention | Forever |
Bandwidth Calculations
Example: If your service handles 10K QPS and each response averages 10 KB:
Bandwidth Conversion
| Unit | Equivalent |
|---|---|
| 1 Byte | 8 bits |
| 1 MB/s | 8 Mbps |
| 1 Gbps | 125 MB/s |
| 10 Gbps | 1.25 GB/s |
Consistent Hashing Math
Consistent hashing maps both keys and servers onto a ring of size
Virtual Nodes
With
With
Rebalancing on Server Addition/Removal
When a server is added, only
| Event | Consistent Hashing | Modular Hashing |
|---|---|---|
| Add 1 server ( | ||
| Remove 1 server |
TypeScript:
import { createHash } from "crypto";
class ConsistentHash {
private ring: Map<number, string> = new Map();
private sortedKeys: number[] = [];
private virtualNodes: number;
constructor(virtualNodes = 150) {
this.virtualNodes = virtualNodes;
}
private hash(key: string): number {
const h = createHash("md5").update(key).digest();
return h.readUInt32BE(0);
}
addServer(server: string): void {
for (let i = 0; i < this.virtualNodes; i++) {
const virtualKey = this.hash(`${server}:${i}`);
this.ring.set(virtualKey, server);
this.sortedKeys.push(virtualKey);
}
this.sortedKeys.sort((a, b) => a - b);
}
removeServer(server: string): void {
for (let i = 0; i < this.virtualNodes; i++) {
const virtualKey = this.hash(`${server}:${i}`);
this.ring.delete(virtualKey);
}
this.sortedKeys = this.sortedKeys.filter((k) => this.ring.has(k));
}
getServer(key: string): string | undefined {
if (this.sortedKeys.length === 0) return undefined;
const h = this.hash(key);
// Binary search for next server clockwise
let lo = 0, hi = this.sortedKeys.length;
while (lo < hi) {
const mid = (lo + hi) >> 1;
if (this.sortedKeys[mid] < h) lo = mid + 1;
else hi = mid;
}
const idx = lo % this.sortedKeys.length;
return this.ring.get(this.sortedKeys[idx]);
}
}Python:
import hashlib
from bisect import bisect_right
class ConsistentHash:
def __init__(self, virtual_nodes: int = 150):
self.virtual_nodes = virtual_nodes
self.ring: dict[int, str] = {}
self.sorted_keys: list[int] = []
def _hash(self, key: str) -> int:
h = hashlib.md5(key.encode()).digest()
return int.from_bytes(h[:4], "big")
def add_server(self, server: str) -> None:
for i in range(self.virtual_nodes):
virtual_key = self._hash(f"{server}:{i}")
self.ring[virtual_key] = server
self.sorted_keys = sorted(self.ring.keys())
def remove_server(self, server: str) -> None:
for i in range(self.virtual_nodes):
virtual_key = self._hash(f"{server}:{i}")
self.ring.pop(virtual_key, None)
self.sorted_keys = sorted(self.ring.keys())
def get_server(self, key: str) -> str | None:
if not self.sorted_keys:
return None
h = self._hash(key)
idx = bisect_right(self.sorted_keys, h)
if idx == len(self.sorted_keys):
idx = 0
return self.ring[self.sorted_keys[idx]]Replication Factor Calculations
Write Amplification
With a replication factor of
Quorum Reads and Writes
For a system with
Common configurations:
| Config | Trade-off | |||
|---|---|---|---|---|
| Strong consistency | 3 | 2 | 2 | Balanced |
| Write-optimized | 3 | 1 | 3 | Fast writes, slow reads |
| Read-optimized | 3 | 3 | 1 | Slow writes, fast reads |
| Eventual consistency | 3 | 1 | 1 | Fast but stale reads possible |
Availability Calculation
For
| Single node availability | 2 replicas | 3 replicas | 5 replicas |
|---|---|---|---|
| 99% (3.65 days downtime) | 99.99% | 99.9999% | 99.99999999% |
| 99.9% (8.77 hrs downtime) | 99.9999% | ~100% | ~100% |
| 99.99% (52.6 min downtime) | ~100% | ~100% | ~100% |
WARNING
These calculations assume independent failures. In practice, correlated failures (data center outages, shared dependencies) make real availability lower. This is why multi-region deployment matters.
CAP Theorem: Formal Intuition
The CAP theorem states that a distributed system can provide at most two of three guarantees simultaneously:
- Consistency (C): Every read receives the most recent write
- Availability (A): Every request receives a response (not an error)
- Partition tolerance (P): The system continues to operate despite network partitions
Why You Must Choose
During a network partition, a message from node A cannot reach node B. You have two options:
- Choose Consistency (CP): Refuse to respond until the partition heals. You lose availability.
- Choose Availability (AP): Respond with potentially stale data. You lose consistency.
Since network partitions are inevitable in distributed systems, the real choice is between CP and AP during partition events.
PACELC Extension
PACELC extends CAP: if there is a Partition, choose A or C; Else (normal operation), choose Latency or Consistency.
| System | During Partition | Normal Operation |
|---|---|---|
| DynamoDB | AP | EL (low latency) |
| MongoDB | CP | EC (consistent) |
| Cassandra | AP | EL (tunable) |
| PostgreSQL | CP | EC (ACID) |
Estimation Cheat Sheet
Common Service Numbers
| Service | DAU | Peak QPS | Storage/day |
|---|---|---|---|
| Twitter-scale | 300M | ~300K | ~15 TB |
| Instagram-scale | 500M | ~500K | ~100 TB (images) |
| WhatsApp-scale | 2B | ~1M | ~50 TB |
| YouTube-scale | 2B | ~200K | ~500 TB (video) |
| URL shortener | 100M | ~10K | ~50 GB |
Server Capacity Rules of Thumb
| Resource | Single Server | Notes |
|---|---|---|
| QPS (web server) | 1,000-10,000 | Depends on response time |
| QPS (database) | 1,000-5,000 | With indexing |
| QPS (cache - Redis) | 100,000+ | In-memory |
| RAM | 64-256 GB | Commodity |
| Disk | 1-10 TB SSD | Fast I/O |
| Network | 1-10 Gbps | Standard NIC |
Quick Estimation Formulas
Practice Estimation Problems
| Problem | Key Variables | Expected Answer |
|---|---|---|
| "Estimate YouTube storage per day" | 500 hrs video/min, 5 MB/sec, resolutions | ~2.5 PB/day |
| "How many servers for 1M QPS?" | 5K QPS/server | ~200 servers |
| "Twitter fan-out storage" | 300M users, 500 avg followers, 200 tweets/day | ~3 TB write amplification/day |
| "Cache size for 100M users" | 20% active, 1 KB per session | ~20 GB |
TIP
When in doubt during an interview, round aggressively and state your assumptions clearly. "
Further Reading
- Consistent Hashing — full deep-dive on ring-based hashing
- Databases — storage engine internals
- Greedy Algorithms — optimization under constraints
- Bit Manipulation — bloom filter math
- Advanced Data Structures — data structure trade-offs