Skip to content
Backend Engineer0%
System Design Interview0%

Databases

Databases are the most critical infrastructure component in any application. They're also the most misunderstood. Most developers interact with databases through ORMs and never learn how they actually work — until something breaks at scale and they have no mental model to debug it.

This section doesn't teach you SQL syntax. It teaches you how databases work internally — from the physical storage of bytes on disk through the query planner's cost-based optimization to the replication protocols that keep your data safe across continents.

Why Internals Matter

Understanding database internals gives you:

  • Predictable performance: You'll know why adding an index helps this query but hurts that one
  • Better schema design: You'll understand the physical implications of your logical model
  • Debugging superpowers: When a query is slow, you'll know whether it's the planner, the storage engine, the buffer pool, or the network
  • Informed technology choices: You'll pick the right database for the job instead of defaulting to Postgres for everything

The Storage Spectrum

Learning Path

OrderTopicWhat You'll Learn
1Storage EnginesHow B-trees and LSM trees physically store data on disk
2Write-Ahead LoggingHow databases survive crashes without losing data
3MVCCHow readers and writers operate without blocking each other
4Isolation LevelsThe four ANSI isolation levels, their anomalies, and real-world behavior
5Indexing Deep DiveB+tree, hash, GIN, GiST, BRIN — when each index type is optimal
6Query Planning & OptimizationHow the query planner decides which plan to execute and how to read EXPLAIN output
7Connection PoolingWhy connections are expensive and how pooling works (PgBouncer, connection limits)
8ReplicationSynchronous, asynchronous, semi-synchronous, and chain replication
9ShardingHorizontal partitioning strategies, resharding, and cross-shard queries
10PostgreSQL InternalsDeep dive into Postgres architecture — processes, shared buffers, VACUUM, WAL
11Redis InternalsSingle-threaded architecture, data structures, persistence, clustering
12MongoDB InternalsWiredTiger, document model, replica sets, sharded clusters
13Database Selection GuideDecision framework for choosing the right database for any use case
14Time-Series DatabasesTimescaleDB, InfluxDB, and the unique challenges of time-series data
15Graph DatabasesNeo4j, property graphs, Cypher, when graph models outperform relational
16NewSQLCockroachDB, TiDB, Spanner — distributed SQL with strong consistency
17DynamoDB InternalsSingle-table design, GSIs, partition key strategy, DynamoDB Streams
18Cassandra InternalsLSM tree storage, wide-column model, ring topology, tunable consistency
19Elasticsearch InternalsInverted index, sharding, relevance scoring, aggregations pipeline
20ClickHouse InternalsColumnar storage, MergeTree engine, real-time analytics at petabyte scale
21SQLite InternalsB-tree internals, WAL mode, the most deployed database in existence
Schema Design
22Schema Design: E-CommerceProducts, orders, inventory, carts — modeling a real commerce system
23Schema Design: Social NetworkUsers, posts, follows, feeds — handling graph-like data in SQL
24Schema Design: SaaSMulti-tenancy strategies, row-level security, workspace isolation
25Schema Design: ChatMessages, threads, reactions — high-write, read-heavy messaging schemas
Operations
26Database MigrationsZero-downtime schema changes, expand-contract, backward compatibility
27PostgreSQL DBAVacuuming, autovacuum, bloat, pg_stat_*, connection limits, pg_hba.conf
28PostgreSQL Performance Tuningwork_mem, shared_buffers, checkpoint_completion_target — the knobs that matter
29Multi-Region DatabasesGlobal distribution, active-active, CockroachDB vs Spanner vs Aurora Global
30Serverless DatabasesPlanetScale, Neon, Aurora Serverless v2 — cold starts, scaling to zero, pricing

Key Insight

Every database is making a trade-off. The art is understanding which trade-off is right for your workload:

If your workload is...Optimize for...Consider...
Heavy writes, append-onlyWrite throughputLSM-tree databases (Cassandra, RocksDB)
Mixed read/write, OLTPBalanced latencyB-tree databases (PostgreSQL, MySQL)
Analytics, aggregationsScan speedColumnar databases (ClickHouse, DuckDB)
Key-value lookupsLatencyRedis, DynamoDB, Memcached
Highly connected dataTraversal speedNeo4j, DGraph
Time-ordered eventsTime-range queriesTimescaleDB, InfluxDB
Global distributionConsistency + availabilityCockroachDB, Spanner

"What I cannot create, I do not understand." — Richard Feynman