Databases
Databases are the most critical infrastructure component in any application. They're also the most misunderstood. Most developers interact with databases through ORMs and never learn how they actually work — until something breaks at scale and they have no mental model to debug it.
This section doesn't teach you SQL syntax. It teaches you how databases work internally — from the physical storage of bytes on disk through the query planner's cost-based optimization to the replication protocols that keep your data safe across continents.
Why Internals Matter
Understanding database internals gives you:
- Predictable performance: You'll know why adding an index helps this query but hurts that one
- Better schema design: You'll understand the physical implications of your logical model
- Debugging superpowers: When a query is slow, you'll know whether it's the planner, the storage engine, the buffer pool, or the network
- Informed technology choices: You'll pick the right database for the job instead of defaulting to Postgres for everything
The Storage Spectrum
Learning Path
| Order | Topic | What You'll Learn |
|---|---|---|
| 1 | Storage Engines | How B-trees and LSM trees physically store data on disk |
| 2 | Write-Ahead Logging | How databases survive crashes without losing data |
| 3 | MVCC | How readers and writers operate without blocking each other |
| 4 | Isolation Levels | The four ANSI isolation levels, their anomalies, and real-world behavior |
| 5 | Indexing Deep Dive | B+tree, hash, GIN, GiST, BRIN — when each index type is optimal |
| 6 | Query Planning & Optimization | How the query planner decides which plan to execute and how to read EXPLAIN output |
| 7 | Connection Pooling | Why connections are expensive and how pooling works (PgBouncer, connection limits) |
| 8 | Replication | Synchronous, asynchronous, semi-synchronous, and chain replication |
| 9 | Sharding | Horizontal partitioning strategies, resharding, and cross-shard queries |
| 10 | PostgreSQL Internals | Deep dive into Postgres architecture — processes, shared buffers, VACUUM, WAL |
| 11 | Redis Internals | Single-threaded architecture, data structures, persistence, clustering |
| 12 | MongoDB Internals | WiredTiger, document model, replica sets, sharded clusters |
| 13 | Database Selection Guide | Decision framework for choosing the right database for any use case |
| 14 | Time-Series Databases | TimescaleDB, InfluxDB, and the unique challenges of time-series data |
| 15 | Graph Databases | Neo4j, property graphs, Cypher, when graph models outperform relational |
| 16 | NewSQL | CockroachDB, TiDB, Spanner — distributed SQL with strong consistency |
| 17 | DynamoDB Internals | Single-table design, GSIs, partition key strategy, DynamoDB Streams |
| 18 | Cassandra Internals | LSM tree storage, wide-column model, ring topology, tunable consistency |
| 19 | Elasticsearch Internals | Inverted index, sharding, relevance scoring, aggregations pipeline |
| 20 | ClickHouse Internals | Columnar storage, MergeTree engine, real-time analytics at petabyte scale |
| 21 | SQLite Internals | B-tree internals, WAL mode, the most deployed database in existence |
| Schema Design | ||
| 22 | Schema Design: E-Commerce | Products, orders, inventory, carts — modeling a real commerce system |
| 23 | Schema Design: Social Network | Users, posts, follows, feeds — handling graph-like data in SQL |
| 24 | Schema Design: SaaS | Multi-tenancy strategies, row-level security, workspace isolation |
| 25 | Schema Design: Chat | Messages, threads, reactions — high-write, read-heavy messaging schemas |
| Operations | ||
| 26 | Database Migrations | Zero-downtime schema changes, expand-contract, backward compatibility |
| 27 | PostgreSQL DBA | Vacuuming, autovacuum, bloat, pg_stat_*, connection limits, pg_hba.conf |
| 28 | PostgreSQL Performance Tuning | work_mem, shared_buffers, checkpoint_completion_target — the knobs that matter |
| 29 | Multi-Region Databases | Global distribution, active-active, CockroachDB vs Spanner vs Aurora Global |
| 30 | Serverless Databases | PlanetScale, Neon, Aurora Serverless v2 — cold starts, scaling to zero, pricing |
Key Insight
Every database is making a trade-off. The art is understanding which trade-off is right for your workload:
| If your workload is... | Optimize for... | Consider... |
|---|---|---|
| Heavy writes, append-only | Write throughput | LSM-tree databases (Cassandra, RocksDB) |
| Mixed read/write, OLTP | Balanced latency | B-tree databases (PostgreSQL, MySQL) |
| Analytics, aggregations | Scan speed | Columnar databases (ClickHouse, DuckDB) |
| Key-value lookups | Latency | Redis, DynamoDB, Memcached |
| Highly connected data | Traversal speed | Neo4j, DGraph |
| Time-ordered events | Time-range queries | TimescaleDB, InfluxDB |
| Global distribution | Consistency + availability | CockroachDB, Spanner |