Performance Benchmarks Reference

Every system design interview, every capacity planning exercise, and every "will this architecture work?" discussion comes down to numbers. Jeff Dean published his famous "Numbers Every Programmer Should Know" in 2009. Hardware has changed — SSDs replaced spinning disks, DDR5 replaced DDR3, NVMe replaced SATA — but the relative magnitudes remain surprisingly stable. This page is your reference for 2026 hardware and cloud infrastructure, organized for quick lookup and back-of-envelope estimation.

Latency Numbers Every Engineer Should Know (2026)

The Core Latency Hierarchy

Complete Latency Table

Operation | Latency | Relative (L1 = 1) | Notes
CPU: L1 cache reference | ~1 ns | 1x | On-die, per-core
CPU: Branch mispredict | ~3 ns | 3x | Pipeline flush penalty
CPU: L2 cache reference | ~4 ns | 4x | On-die, per-core
CPU: L3 cache reference | ~12 ns | 12x | Shared across cores
CPU: Mutex lock/unlock | ~17 ns | 17x | Uncontended
Memory: DDR5 reference | ~80 ns | 80x | Random access
Memory: NUMA remote node | ~120 ns | 120x | Cross-socket access
Storage: NVMe SSD random 4K read | ~10 μs | 10,000x | PCIe Gen5
Storage: SATA SSD random 4K read | ~50 μs | 50,000x | SATA III limited
Storage: NVMe SSD sequential 1MB | ~80 μs | 80,000x | ~12-14 GB/s throughput (PCIe Gen5)
Network: Same-rack RTT | ~100 μs | 100,000x | Leaf switch
Network: Same-datacenter RTT | ~500 μs | 500,000x | Spine switch hops
Network: Cross-AZ RTT | ~1 ms | 1,000,000x | Same region
Storage: SATA SSD sequential 1MB | ~2 ms | 2,000,000x | ~550 MB/s throughput
Storage: HDD seek | ~4 ms | 4,000,000x | Mechanical arm movement
Storage: HDD sequential 1MB | ~5 ms | 5,000,000x | ~200 MB/s
Network: Cross-region RTT | ~30-100 ms | 30-100M x | Speed of light + routing
Network: US to Europe | ~70-90 ms | 70-90M x | Transatlantic cable
Network: US to Australia | ~150-200 ms | 150-200M x | Transpacific cable
Network: US to India | ~200-300 ms | 200-300M x | Multiple hops

TIP

The key insight: each major tier in the hierarchy is roughly two orders of magnitude slower than the one before it. L1 (1 ns) to RAM (~100 ns) is ~100x. RAM to NVMe SSD (~10 μs) is ~100x. SSD to a same-datacenter round trip (~500 μs) is ~50x. Datacenter to cross-region (~50 ms) is ~100x. Memorize the orders of magnitude, not the exact numbers.
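The tier-to-tier jumps can be checked with a few lines of arithmetic. This is a minimal sketch using values from the latency table; the 50 ms cross-region figure is an assumed midpoint of the 30-100 ms range, and the `TIERS` list and `tier_ratios` helper are illustrative names, not part of any library.

```python
import math

# Latencies in nanoseconds, taken from the latency table.
# The cross-region value is an assumed midpoint of the 30-100 ms range.
TIERS = [
    ("L1 cache", 1),
    ("DDR5 RAM", 80),
    ("NVMe random 4K read", 10_000),
    ("Same-datacenter RTT", 500_000),
    ("Cross-region RTT", 50_000_000),
]

def tier_ratios(tiers):
    """Return (from, to, ratio) for each adjacent pair of tiers."""
    return [
        (a_name, b_name, b_ns / a_ns)
        for (a_name, a_ns), (b_name, b_ns) in zip(tiers, tiers[1:])
    ]

for src, dst, ratio in tier_ratios(TIERS):
    print(f"{src} -> {dst}: {ratio:.0f}x (~10^{math.log10(ratio):.1f})")
```

Every jump lands between 50x and 125x, which is why thinking in orders of magnitude per tier is usually enough for estimation.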

Visual Scale: Powers of 10

1 ns    ██ L1 cache
4 ns    ████████ L2 cache
12 ns   ████████████████████████ L3 cache
80 ns   (bar would be 160 chars) Main memory
...
10 μs   (10,000 ns) NVMe SSD — this is 10,000x slower than L1
500 μs  (500,000 ns) Datacenter RTT — 500,000x slower than L1
100 ms  (100,000,000 ns) Cross-continent — 100,000,000x slower than L1

Throughput Numbers

Storage Throughput

Storage Type | Sequential Read | Sequential Write | Random Read (4K IOPS) | Random Write (4K IOPS)
DDR5 RAM | ~50 GB/s | ~50 GB/s | N/A (ns latency) | N/A
NVMe PCIe Gen5 | ~12-14 GB/s | ~10-12 GB/s | ~2,000,000 | ~1,500,000
NVMe PCIe Gen4 | ~7 GB/s | ~5 GB/s | ~1,000,000 | ~800,000
SATA SSD | ~550 MB/s | ~520 MB/s | ~100,000 | ~90,000
HDD (7200 RPM) | ~200 MB/s | ~180 MB/s | ~150 | ~150
Network (25 GbE) | ~3 GB/s | ~3 GB/s | N/A | N/A
Network (100 GbE) | ~12 GB/s | ~12 GB/s | N/A | N/A

WARNING

These are theoretical maximums. Real-world throughput depends on queue depth, block size, access patterns, filesystem overhead, and whether the drive is full. A "12 GB/s" NVMe drive might deliver 2-3 GB/s under typical mixed workloads.
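To see what the gap between spec-sheet and real-world throughput means in practice, it helps to convert rates into scan times. The sketch below assumes a 1 TB dataset and uses rates from the table and warning above; the 2.5 GB/s value is an assumed midpoint of the "2-3 GB/s" range, and `transfer_seconds` is a hypothetical helper.

```python
def transfer_seconds(size_bytes, bytes_per_second):
    """Time to move size_bytes at a sustained rate, ignoring latency and queueing."""
    return size_bytes / bytes_per_second

TB = 10**12
GB = 10**9
MB = 10**6

# Rates from the storage table and the warning; the mixed-workload figure
# is an assumed midpoint of the quoted 2-3 GB/s range.
RATES = {
    "NVMe Gen5 (spec sheet)": 12 * GB,
    "NVMe Gen5 (mixed workload)": 2.5 * GB,
    "SATA SSD": 550 * MB,
    "HDD (7200 RPM)": 200 * MB,
}

for name, rate in RATES.items():
    print(f"{name}: {transfer_seconds(TB, rate):,.0f} s to scan 1 TB")
```

Under these assumptions the same drive takes roughly five times longer to scan the dataset under a mixed workload than its headline number suggests.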

Network Throughput

Network Type | Bandwidth | Typical Throughput | Latency
Loopback | No link limit (CPU/memory bound) | ~50-80 Gbps | ~10 μs
Same-rack (25 GbE) | 25 Gbps | ~20 Gbps | ~100 μs
Same-datacenter (100 GbE) | 100 Gbps | ~80 Gbps | ~500 μs
Cross-AZ (AWS) | ~25 Gbps | ~5-10 Gbps | ~1 ms
Cross-region | Varies | ~1-5 Gbps | ~30-100 ms
Internet (broadband) | ~1 Gbps | ~500 Mbps | ~10-50 ms
Internet (mobile 5G) | ~1 Gbps | ~100-300 Mbps | ~10-30 ms
Internet (mobile 4G) | ~100 Mbps | ~20-50 Mbps | ~30-50 ms

Database Operation Costs

Read/Write Latency by Database Type

Database | Read (p50) | Read (p99) | Write (p50) | Write (p99) | Notes
Redis (in-memory) | ~0.1 ms | ~0.5 ms | ~0.1 ms | ~0.5 ms | Single-node, local
Memcached | ~0.1 ms | ~0.5 ms | ~0.1 ms | ~0.3 ms | Simple key-value
PostgreSQL (indexed) | ~1 ms | ~5 ms | ~2 ms | ~10 ms | With connection pooling
PostgreSQL (full scan) | ~50-500 ms | ~2 s | N/A | N/A | Depends on table size
MySQL (indexed) | ~1 ms | ~5 ms | ~2 ms | ~10 ms | InnoDB, local SSD
MongoDB | ~1-2 ms | ~10 ms | ~2-5 ms | ~20 ms | WiredTiger, local
DynamoDB | ~5 ms | ~15 ms | ~10 ms | ~30 ms | Eventually consistent
DynamoDB (strong) | ~10 ms | ~25 ms | ~10 ms | ~30 ms | Strongly consistent
Elasticsearch | ~10-50 ms | ~200 ms | ~50-200 ms | ~1 s | Depends on index size
Cassandra | ~2-5 ms | ~15 ms | ~2-5 ms | ~15 ms | LOCAL_QUORUM
CockroachDB | ~5-10 ms | ~30 ms | ~10-20 ms | ~50 ms | Distributed SQL

Connection Costs

Operation | Time | Notes
TCP handshake | ~1 RTT (~0.5 ms LAN) | SYN, SYN-ACK, ACK
TLS 1.3 handshake | ~1 RTT (~0.5 ms LAN) | 1-RTT with TLS 1.3, 2-RTT with TLS 1.2
PostgreSQL connection | ~5-20 ms | Auth + SSL + setup
MySQL connection | ~5-15 ms | Auth + SSL
Connection pool checkout | ~0.01-0.1 ms | Already established
DNS lookup (cached) | ~0.1 ms | OS resolver cache
DNS lookup (uncached) | ~20-100 ms | Recursive resolution

TIP

This is why connection pooling matters enormously. A fresh PostgreSQL connection costs ~10 ms. Checking out a pooled connection costs ~0.01 ms. That is a 1000x difference. Tools like PgBouncer, ProxySQL, or application-level pools (HikariCP, node-postgres pool) are not optional in production.
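The effect on a whole request is easy to model. The sketch below is a cost model only, not a pool implementation: it adds the connection setup cost from the tip above to an assumed ~1 ms indexed query (the PostgreSQL p50 from the database table). The constant and function names are illustrative.

```python
FRESH_CONNECT_MS = 10.0   # fresh PostgreSQL connection, from the tip above
POOL_CHECKOUT_MS = 0.01   # pooled checkout, from the tip above
QUERY_MS = 1.0            # assumed indexed read p50, from the database table

def request_ms(setup_ms, query_ms=QUERY_MS):
    """Total time for one request: connection setup plus the query itself."""
    return setup_ms + query_ms

fresh = request_ms(FRESH_CONNECT_MS)
pooled = request_ms(POOL_CHECKOUT_MS)

print(f"fresh connection per request: {fresh:.2f} ms")
print(f"pooled connection:            {pooled:.2f} ms")
print(f"setup cost ratio:             {FRESH_CONNECT_MS / POOL_CHECKOUT_MS:.0f}x")
print(f"end-to-end speedup:           {fresh / pooled:.1f}x")
```

Under these assumptions the setup cost differs by 1000x, while the end-to-end request is about 11x faster, because the query itself still has to run.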

Cloud Service Latency

AWS Service Latency (Same Region, 2026)

Service | Operation | Typical Latency | Notes
S3 | GET (first byte) | ~20-50 ms | Standard class
S3 | PUT | ~50-100 ms | Includes durability
S3 Express One Zone | GET | ~5-10 ms | Single-digit ms
DynamoDB | GetItem | ~5-10 ms | Eventually consistent
ElastiCache (Redis) | GET | ~0.2-1 ms | Same AZ
SQS | SendMessage | ~5-20 ms | Standard queue
SQS | ReceiveMessage | ~5-20 ms | Long polling
SNS | Publish | ~20-50 ms | Fanout
Lambda | Cold start (Node.js) | ~100-300 ms | 128MB-1GB memory
Lambda | Cold start (Java) | ~500-3000 ms | SnapStart brings to ~200 ms
Lambda | Warm invocation | ~1-5 ms | Runtime overhead only
API Gateway | Added latency | ~10-30 ms | REST API type
API Gateway (HTTP) | Added latency | ~5-15 ms | HTTP API type
CloudFront | Edge hit | ~1-10 ms | CDN cache
RDS (PostgreSQL) | Query (indexed) | ~2-5 ms | Same AZ
Aurora | Query (indexed) | ~2-5 ms | Writer instance
EBS (gp3) | Random read | ~0.5-1 ms | 3000 baseline IOPS
EBS (io2) | Random read | ~0.2-0.5 ms | Provisioned IOPS

Comparing Cloud Providers

Operation | AWS | GCP | Azure
Object storage GET | S3: ~30 ms | GCS: ~30 ms | Blob: ~30 ms
Key-value read | DynamoDB: ~5 ms | Bigtable: ~5 ms | Cosmos DB: ~5 ms
Cache read | ElastiCache: ~0.5 ms | Memorystore: ~0.5 ms | Azure Cache: ~0.5 ms
Function cold start | Lambda: ~200 ms | Cloud Run: ~300 ms | Functions: ~500 ms
Message queue | SQS: ~10 ms | Pub/Sub: ~20 ms | Service Bus: ~15 ms

CPU and Compute Benchmarks

Operations Per Second

Operation | Speed | Notes
Simple arithmetic (add/multiply) | ~1 per ns (~1 GHz effective) | Single core
SIMD vector operation | ~16 per ns | AVX-512, 16 float32s
System call | ~1-2 μs | User-to-kernel mode switch
Context switch (threads) | ~1-5 μs | Same process
Context switch (processes) | ~5-20 μs | TLB flush
Hash (SHA-256, 64 bytes) | ~200 ns | Modern CPU, hardware accel
Hash (bcrypt, cost 10) | ~100 ms | Intentionally slow
Compress 1KB (zstd) | ~2 μs | Level 1
Compress 1KB (gzip) | ~10 μs | Level 6
JSON parse 1KB | ~5-10 μs | V8 engine
JSON parse 1MB | ~5-10 ms | V8 engine
Regex match (simple) | ~0.1-1 μs | Compiled regex
UUID v4 generation | ~100 ns | Crypto random
JWT sign (RS256) | ~1 ms | RSA 2048-bit
JWT verify (RS256) | ~0.05 ms | RSA public key
TLS handshake | ~2-5 ms | RSA 2048, full handshake

Serialization/Deserialization

Format | Serialize 1KB | Deserialize 1KB | Output Size (1KB input)
JSON | ~5 μs | ~5 μs | ~1.4 KB (text overhead)
Protocol Buffers | ~1 μs | ~1 μs | ~0.7 KB (binary)
MessagePack | ~2 μs | ~2 μs | ~0.8 KB (binary)
Avro | ~1.5 μs | ~1.5 μs | ~0.7 KB (with schema)
CBOR | ~2 μs | ~2 μs | ~0.9 KB (binary)

Back-of-Envelope Estimation

The Estimation Framework

Useful Powers of 2

Power | Exact Value | Approximation | Common Use
2^10 | 1,024 | ~1 Thousand | 1 KB
2^20 | 1,048,576 | ~1 Million | 1 MB
2^30 | 1,073,741,824 | ~1 Billion | 1 GB
2^40 | 1,099,511,627,776 | ~1 Trillion | 1 TB
2^50 | ~1.1 × 10^15 | ~1 Quadrillion | 1 PB

Daily/Second Conversions

Daily Volume | Per Second (QPS) | Notes
100K | ~1 QPS | Small app
1M | ~12 QPS | Medium app
10M | ~120 QPS | Large app
100M | ~1,200 QPS | Very large app
1B | ~12,000 QPS | Twitter/X scale
10B | ~120,000 QPS | Google Search scale

Quick formula: Daily requests / 86,400 = average QPS. Peak QPS is typically 2-5x average.
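The quick formula translates directly into code. This is a minimal sketch; the function names are illustrative, and the default peak factor of 3 is an assumption picked from the middle of the 2-5x range.

```python
SECONDS_PER_DAY = 86_400

def average_qps(daily_requests):
    """Average queries per second for a given daily request volume."""
    return daily_requests / SECONDS_PER_DAY

def peak_qps(daily_requests, peak_factor=3.0):
    """Peak QPS, assuming peak traffic is peak_factor times the average."""
    return average_qps(daily_requests) * peak_factor

for daily in (100_000, 1_000_000, 1_000_000_000):
    print(f"{daily:>13,} req/day -> avg {average_qps(daily):,.0f} QPS, "
          f"peak {peak_qps(daily):,.0f} QPS")
```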

Estimation Example: URL Shortener

Requirements:
- 100M new URLs/month
- Read:Write ratio = 100:1
- Store for 5 years
- Average URL length: 200 bytes

Traffic:
- Writes: 100M/month ÷ 30 ÷ 86400 = ~40 writes/sec
- Reads: 40 × 100 = ~4,000 reads/sec
- Peak reads: 4,000 × 3 = ~12,000 reads/sec

Storage:
- Per record: 200 bytes (URL) + 7 bytes (short code) + 8 bytes (timestamp)
  + 8 bytes (user_id) + overhead ≈ 300 bytes
- Total: 100M × 12 months × 5 years × 300 bytes
  = 6B records × 300 bytes = 1.8 TB

Bandwidth:
- Incoming: 40 writes/sec × 300 bytes = 12 KB/s (negligible)
- Outgoing: 4,000 reads/sec × 300 bytes = 1.2 MB/s (trivial)

Conclusion:
- 1.8 TB fits on a single machine's SSD easily
- 12K peak QPS is achievable with Redis caching for hot URLs
- A single PostgreSQL instance can handle 4K reads/sec with proper indexing
- This system can start as a single machine, shard later
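The same estimate can be written as a small script so the inputs are easy to change. This is a sketch of the arithmetic above, with a hypothetical `url_shortener_estimate` helper; it computes unrounded values, so it reports about 39 writes/sec and about 11,600 peak reads/sec where the walkthrough rounds to 40 and 12,000.

```python
SECONDS_PER_DAY = 86_400

def url_shortener_estimate(urls_per_month=100_000_000, read_write_ratio=100,
                           years=5, record_bytes=300, peak_factor=3):
    """Back-of-envelope traffic, storage, and bandwidth for the example above."""
    writes_per_sec = urls_per_month / 30 / SECONDS_PER_DAY
    reads_per_sec = writes_per_sec * read_write_ratio
    records = urls_per_month * 12 * years
    return {
        "writes_per_sec": writes_per_sec,
        "reads_per_sec": reads_per_sec,
        "peak_reads_per_sec": reads_per_sec * peak_factor,
        "records": records,
        "storage_tb": records * record_bytes / 1e12,   # decimal terabytes
        "egress_mb_per_sec": reads_per_sec * record_bytes / 1e6,
    }

estimate = url_shortener_estimate()
for key, value in estimate.items():
    print(f"{key}: {value:,.2f}")
```

Doubling the record size or retention period only moves storage to a few terabytes, so the single-machine conclusion is not sensitive to small input errors.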

TIP

In estimation exercises, round aggressively. 86,400 seconds/day becomes "~100K." 30 days/month is close enough. The goal is to get within an order of magnitude, not to compute exact values.

Common Capacity Rules of Thumb

Resource | Rule of Thumb
Single server | Can handle ~10K-50K concurrent TCP connections
Single PostgreSQL | ~5K-10K simple queries/sec (indexed reads)
Single Redis | ~100K-200K operations/sec
Single Kafka broker | ~100K-500K messages/sec (small messages)
Nginx | ~50K-100K concurrent connections
Node.js (single process) | ~10K-30K HTTP requests/sec
Go HTTP server | ~50K-100K HTTP requests/sec
1 GB RAM | ~1M small objects (1KB each) or ~10M small strings
1 TB SSD | ~1B records at 1KB each

Estimation Example: Chat Application

Requirements:
- 50M DAU (daily active users)
- Average user sends 40 messages/day
- Average message size: 200 bytes (text) or 50 KB (with media metadata)
- Messages stored for 5 years
- 10% of messages include media (images/video stored in object storage)
- Read:Write ratio: 10:1 (users read more than they send)

Traffic:
- Messages sent: 50M × 40 = 2B messages/day
- Write QPS: 2B / 86,400 = ~23,000 writes/sec
- Peak write QPS: 23,000 × 3 = ~70,000 writes/sec
- Read QPS: 23,000 × 10 = ~230,000 reads/sec
- Peak read QPS: 230,000 × 3 = ~700,000 reads/sec

Storage (5 years):
- Text messages: 2B/day × 200 bytes × 365 × 5 = 730 TB
- Message metadata: 2B/day × 100 bytes × 365 × 5 = 365 TB
- Total DB storage: ~1.1 PB
- Media references: 200M media messages/day (10%)
  stored in object storage (not counted in DB)

Bandwidth:
- Incoming: 70K writes/sec × 200 bytes = ~14 MB/s (text only)
- Outgoing: 700K reads/sec × 200 bytes = ~140 MB/s (text only)
- With media: 70K × 0.1 × 50 KB = ~350 MB/s (media metadata bursts)

Architecture Implications:
- 700K reads/sec requires sharded database + caching (Redis cluster)
- 1.1 PB storage requires horizontal sharding by user or conversation
- WebSocket connections: 50M DAU × 30% concurrent = 15M connections
  → needs ~300 servers at 50K connections each
- Message fan-out: group chats multiply writes; need message queue
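The chat estimate follows the same pattern. The sketch below reproduces the traffic, storage, and connection numbers with integer arithmetic where possible; `chat_estimate` and its parameter names are hypothetical, and the 100-byte metadata size and 30% concurrency are the assumptions already used in the walkthrough.

```python
SECONDS_PER_DAY = 86_400
TB = 10**12

def chat_estimate(dau=50_000_000, msgs_per_user=40, text_bytes=200,
                  metadata_bytes=100, years=5, read_write_ratio=10,
                  peak_factor=3, concurrent_pct=30, conns_per_server=50_000):
    """Back-of-envelope numbers for the chat application example above."""
    messages_per_day = dau * msgs_per_user
    write_qps = messages_per_day / SECONDS_PER_DAY
    days = 365 * years
    connections = dau * concurrent_pct // 100
    return {
        "messages_per_day": messages_per_day,
        "write_qps": write_qps,
        "peak_write_qps": write_qps * peak_factor,
        "peak_read_qps": write_qps * read_write_ratio * peak_factor,
        "text_tb": messages_per_day * text_bytes * days // TB,
        "metadata_tb": messages_per_day * metadata_bytes * days // TB,
        "connections": connections,
        "servers": -(-connections // conns_per_server),  # ceiling division
    }

chat = chat_estimate()
for key, value in chat.items():
    print(f"{key}: {value:,.0f}")
```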

Profiling and Measurement Tools

Before optimizing, you must measure. Here are the essential tools by domain:

Domain | Tool | What It Measures
CPU profiling | perf (Linux), Instruments (macOS) | Function-level CPU time, call stacks
Flame graphs | Brendan Gregg's flamegraph.pl | Visual CPU profiling; spot hotspots instantly
Go profiling | pprof | CPU, memory, goroutine, mutex contention
Node.js profiling | clinic.js, Chrome DevTools | Flame charts, event loop delay, GC pauses
JVM profiling | async-profiler, JFR | CPU, allocation, lock contention
Database | EXPLAIN ANALYZE, pg_stat_statements | Query plans, actual execution time
Network | tcpdump, Wireshark, mtr | Packet-level analysis, path latency
Frontend | Lighthouse, WebPageTest, Chrome DevTools Performance | LCP, INP (which replaced FID in 2024), CLS, resource loading
APM | Datadog, New Relic, Grafana Tempo | End-to-end distributed tracing
Load testing | k6, Gatling, Locust | Throughput, latency under load

TIP

When profiling, always compare against a baseline. A p99 latency of 50ms means nothing without context. Is it better or worse than last week? Than before the deployment? Collect benchmarks continuously and alert on regressions, not absolute thresholds.
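A baseline comparison can be as simple as comparing the same percentile across two sample sets. The sketch below uses a nearest-rank percentile and flags a regression when the current p99 exceeds the baseline by more than a tolerance; the 10% tolerance and the function names are assumptions for illustration, not a recommended alerting policy.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with at least pct% of values at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def is_regression(baseline_ms, current_ms, pct=99, tolerance=0.10):
    """True if the current percentile is worse than baseline by more than tolerance."""
    base = percentile(baseline_ms, pct)
    cur = percentile(current_ms, pct)
    return cur > base * (1 + tolerance)

# Synthetic latencies in ms: last week's run vs. a run that is 50% slower.
baseline = list(range(1, 101))
slower = [x * 1.5 for x in baseline]

print(percentile(baseline, 99), percentile(slower, 99))
print(is_regression(baseline, slower))    # the slower run is flagged
print(is_regression(baseline, baseline))  # an identical run is not
```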

Common Performance Traps

Trap | Why It's Slow | Fix
N+1 queries | 100 records = 101 DB calls | Use JOINs or batch loading
Unindexed queries | Full table scan on millions of rows | Add appropriate indexes
Connection per request | 10ms per new connection | Use connection pooling
Synchronous I/O | Thread blocked waiting for network | Use async I/O
Large payloads | Serializing 10MB JSON response | Pagination, compression, streaming
DNS lookup per request | 20-100ms added latency | DNS caching, connection keep-alive
TLS handshake per request | 2-5ms per new connection | Connection reuse, HTTP/2, TLS session resumption
Cross-region calls | 100ms+ per call in hot path | Data locality, caching, read replicas
Lock contention | Threads waiting for mutex | Lock-free structures, sharding, MVCC
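The N+1 trap is the easiest of these to demonstrate. The sketch below uses an in-memory SQLite database from Python's standard library with a hypothetical `authors`/`posts` schema; it counts queries rather than measuring time, since each extra query would cost a network round trip against a real database server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
""")
conn.executemany("INSERT INTO authors VALUES (?, ?)",
                 [(i, f"author{i}") for i in range(1, 101)])
conn.executemany("INSERT INTO posts VALUES (?, ?, ?)",
                 [(i, i, f"post{i}") for i in range(1, 101)])

def n_plus_one(conn):
    """One query for the posts, then one more query per post for its author."""
    queries = 1
    rows = []
    posts = conn.execute(
        "SELECT id, author_id, title FROM posts ORDER BY id").fetchall()
    for _post_id, author_id, title in posts:
        name = conn.execute(
            "SELECT name FROM authors WHERE id = ?", (author_id,)).fetchone()[0]
        queries += 1
        rows.append((title, name))
    return rows, queries

def joined(conn):
    """The same result set fetched with a single JOIN."""
    rows = conn.execute(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id").fetchall()
    return rows, 1

rows_a, queries_a = n_plus_one(conn)
rows_b, queries_b = joined(conn)
print(queries_a, queries_b, rows_a == rows_b)
```

At ~1 ms per indexed query, the 101-query version would spend about 100 ms on round trips that the JOIN avoids.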

DANGER

The most common performance mistake is optimizing before measuring. Always profile first. The bottleneck is almost never where you think it is. Use tools like perf, flamegraphs, pprof, or your APM tool to identify actual hotspots before writing any optimization code.
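For Python code, the standard library's `cProfile` and `pstats` modules are enough for a first measurement. The sketch below profiles a hypothetical `handler` function and prints the five most expensive entries by cumulative time; the function names and workload are made up for illustration.

```python
import cProfile
import io
import pstats

def slow_sum(n):
    """Deliberately CPU-heavy work so it shows up in the profile."""
    return sum(i * i for i in range(n))

def handler():
    """Stand-in for a request handler whose hotspot is unknown."""
    return slow_sum(200_000)

profiler = cProfile.Profile()
profiler.enable()
result = handler()
profiler.disable()

# Collect the report into a string, sorted by cumulative time.
buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("cumulative").print_stats(5)
report = buf.getvalue()
print(report)
```

The report lists call counts and cumulative time per function, which is usually enough to confirm or rule out a suspected hotspot before changing any code.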

"What I cannot create, I do not understand." — Richard Feynman