
Storage Systems

Every system you build sits on storage. The database that holds your user records, the object store that serves your images, the volume that persists your container's state — they all depend on storage subsystems with fundamentally different characteristics. Choosing the wrong storage type is expensive: you pay for performance you do not need, or you discover under load that your storage cannot deliver the IOPS your database demands.

This section builds your understanding of storage from first principles: how data is physically organized, how RAID protects against drive failure, how performance is measured, and when to use block, file, or object storage.


Block vs File vs Object Storage

These are the three fundamental storage paradigms. Everything else — databases, distributed file systems, cloud storage services — is built on top of one of these.

Block Storage

Block storage presents raw storage as fixed-size blocks (typically 512 bytes or 4KB). The operating system sees a block device (like /dev/sda) and layers a filesystem on top. Block storage knows nothing about files — it reads and writes blocks at specific addresses.

Examples: SSD/HDD drives, AWS EBS, GCP Persistent Disk, Azure Managed Disk, iSCSI LUNs, Ceph RBD

Characteristics:

  • Lowest latency (direct block addressing)
  • OS formats with a filesystem (ext4, XFS, NTFS)
  • One-to-one mapping: a block volume is normally attached to exactly one machine at a time (exception: shared block devices used with cluster filesystems)
  • Ideal for databases, boot volumes, applications requiring raw I/O
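To make block addressing concrete, here is a minimal Python sketch. It assumes 4 KB blocks and uses a temporary file as a stand-in for a device such as /dev/sda; `BLOCK_SIZE`, `read_block`, `write_block`, and `demo` are illustrative names, not part of any real driver API.

```python
import tempfile

BLOCK_SIZE = 4096  # assumption: 4 KB blocks


def write_block(dev, lba: int, data: bytes) -> None:
    """Overwrite one whole block at logical block address `lba`."""
    if len(data) != BLOCK_SIZE:
        raise ValueError("block devices transfer whole blocks only")
    dev.seek(lba * BLOCK_SIZE)
    dev.write(data)


def read_block(dev, lba: int) -> bytes:
    """Read one whole block at logical block address `lba`."""
    dev.seek(lba * BLOCK_SIZE)
    return dev.read(BLOCK_SIZE)


def demo() -> bytes:
    # A temporary file stands in for the raw device (assumption for illustration).
    with tempfile.TemporaryFile() as dev:
        dev.truncate(8 * BLOCK_SIZE)            # an 8-block "volume"
        write_block(dev, 5, b"x" * BLOCK_SIZE)  # in-place write at block 5
        return read_block(dev, 5)
```

Note that nothing here knows about file names or directories: the only address is the block number, which is why a filesystem has to be layered on top.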

File Storage

File storage presents a hierarchical namespace of files and directories via a network protocol (NFS, SMB/CIFS). Multiple clients can mount the same filesystem simultaneously.

Examples: NFS servers, AWS EFS, Azure Files, GCP Filestore, GlusterFS, CephFS

Characteristics:

  • Shared access (many clients read/write the same files)
  • POSIX semantics (file locks, permissions, directory traversal)
  • Higher latency than block (protocol overhead + network)
  • Ideal for shared configuration, CMS content, legacy applications expecting a filesystem

Object Storage

Object storage is a flat namespace of objects, each identified by a unique key. Objects are immutable — you replace the entire object, not a byte range within it. Metadata is stored alongside the object.

Examples: AWS S3, GCP Cloud Storage, Azure Blob Storage, MinIO, Ceph RGW

Characteristics:

  • Flat namespace (no directories, though / in keys simulates them)
  • HTTP API access (PUT, GET, DELETE)
  • Massive scale (exabytes, billions of objects)
  • Built-in replication and durability (11 nines on S3)
  • Higher latency per operation (HTTP overhead)
  • Ideal for media assets, backups, data lake storage, static website hosting
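The flat namespace and whole-object replacement described above can be sketched with a toy in-memory store. This is a hypothetical illustration only: `ToyObjectStore` and its methods are invented names and do not mirror the S3 API or any real client library.

```python
from typing import Optional


class ToyObjectStore:
    """In-memory stand-in for an object store (illustrative, not a real API)."""

    def __init__(self) -> None:
        # One flat mapping from key to (body, metadata); no directory tree.
        self._objects = {}

    def put(self, key: str, body: bytes, metadata: Optional[dict] = None) -> None:
        # A PUT replaces the whole object; there is no byte-range update.
        self._objects[key] = (bytes(body), dict(metadata or {}))

    def get(self, key: str) -> bytes:
        return self._objects[key][0]

    def metadata(self, key: str) -> dict:
        return dict(self._objects[key][1])

    def delete(self, key: str) -> None:
        self._objects.pop(key, None)

    def list_keys(self, prefix: str = "") -> list:
        # "Directories" are just a prefix filter over the flat key space.
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyObjectStore()
store.put("images/cat.png", b"v1", {"content-type": "image/png"})
store.put("images/cat.png", b"v2")   # replaces the entire object
store.put("logs/app.log", b"started")
```

Listing with the prefix `images/` returns only the first key, which is how a `/` in keys simulates a directory without one existing.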

Comparison Matrix

| Attribute | Block Storage | File Storage | Object Storage |
| --- | --- | --- | --- |
| Access pattern | Random read/write at byte offset | File open/read/write/close | HTTP PUT/GET by key |
| Protocol | SCSI, NVMe, iSCSI | NFS, SMB/CIFS | HTTP (S3 API) |
| Latency | <1ms (NVMe), 1-5ms (SSD) | 1-10ms | 10-100ms |
| Throughput | High (GB/s with NVMe) | Medium (limited by protocol) | High (parallel GET) |
| Shared access | No (single attach) | Yes (multi-mount) | Yes (HTTP, stateless) |
| Max scale | TB per volume | PB per filesystem | Exabytes |
| Mutability | In-place overwrites | In-place overwrites | Replace entire object |
| Metadata | Filesystem inode | Filesystem attributes | Custom key-value pairs |
| Cost | $$$ | $$ | $ |
| Durability | Depends on RAID/replication | Depends on backing store | 99.999999999% (S3) |

RAID Levels

RAID (Redundant Array of Independent Disks) combines multiple physical drives into a logical unit for performance, redundancy, or both. Understanding RAID is essential even in cloud environments: cloud block storage services rely on comparable redundancy techniques internally, and self-managed storage systems require explicit RAID configuration.

RAID 0 — Striping

Data is split across drives with no redundancy. Any drive failure loses all data.

| Metric | Value |
| --- | --- |
| Usable capacity | 100% (N drives) |
| Read performance | Nx (parallel reads across N drives) |
| Write performance | Nx (parallel writes) |
| Fault tolerance | None (1 drive failure = total data loss) |
| Use case | Scratch storage, temporary data, caches |
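As a rough sketch of how striping places data, the function below maps a logical block number to a drive and a position on that drive. It assumes a stripe unit of exactly one block and simple round-robin placement; real arrays usually use larger stripe units, and `raid0_locate` is an illustrative name.

```python
def raid0_locate(logical_block: int, num_drives: int) -> tuple:
    """Map a logical block to (drive index, block index on that drive).

    Assumption: one block per stripe unit, round-robin across drives.
    """
    if num_drives < 2:
        raise ValueError("RAID 0 needs at least two drives")
    return logical_block % num_drives, logical_block // num_drives


# Consecutive logical blocks land on different drives, so they can be
# read or written in parallel.
layout = [raid0_locate(block, 3) for block in range(6)]
```

Because every drive holds a slice of every large file, losing any one drive leaves gaps throughout the data, which is why a single failure loses everything.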

RAID 1 — Mirroring

Every block is written to two (or more) drives. Survives any single drive failure.

| Metric | Value |
| --- | --- |
| Usable capacity | 50% (N/2 drives) |
| Read performance | 2x (read from either drive) |
| Write performance | 1x (must write both copies) |
| Fault tolerance | 1 drive failure |
| Use case | OS drives, critical small databases |

RAID 5 — Striping with Distributed Parity

Data and parity are distributed across all drives. Parity allows reconstructing any single failed drive.

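In each stripe, the parity block (marked p) is the XOR of the stripe's data blocks. For example, in a three-drive stripe holding B1, B2, and the parity block Bp, if Drive 2 fails, B2 is reconstructed from B1 and Bp.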

| Metric | Value |
| --- | --- |
| Usable capacity | (N-1)/N (1 drive for parity) |
| Read performance | (N-1)x |
| Write performance | Slower (parity calculation on every write) |
| Fault tolerance | 1 drive failure |
| Use case | General purpose, read-heavy workloads |
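The reconstruction of B2 from B1 and Bp can be sketched with XOR parity, the scheme RAID 5 is generally described as using. The block contents below are made-up sample bytes and `xor_blocks` is an illustrative helper, not a function from any RAID tool.

```python
from functools import reduce


def xor_blocks(*blocks: bytes) -> bytes:
    """XOR equal-length blocks together, byte by byte."""
    if len({len(b) for b in blocks}) != 1:
        raise ValueError("all blocks must have the same length")
    return bytes(reduce(lambda x, y: x ^ y, column) for column in zip(*blocks))


b1 = b"\x0f\xf0\xaa\x55"   # data block on Drive 1 (sample bytes)
b2 = b"data"               # data block on Drive 2 (sample bytes)
bp = xor_blocks(b1, b2)    # parity block stored on Drive 3

# Drive 2 fails: rebuild its block from the surviving data and parity.
recovered_b2 = xor_blocks(b1, bp)
```

This works because XOR is its own inverse: `b1 ^ (b1 ^ b2)` cancels `b1` and leaves `b2`. The same property explains why only one lost block per stripe can be recovered from a single parity block.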

RAID 6 — Double Parity

Like RAID 5 but with two parity blocks. Survives any two simultaneous drive failures.

| Metric | Value |
| --- | --- |
| Usable capacity | (N-2)/N |
| Read performance | (N-2)x |
| Write performance | Slower than RAID 5 (dual parity) |
| Fault tolerance | 2 drive failures |
| Use case | Large arrays where rebuild time is long enough for a second failure |

RAID 10 — Mirrored Stripes

Combines RAID 1 (mirroring) and RAID 0 (striping). Data is striped across mirrored pairs.

| Metric | Value |
| --- | --- |
| Usable capacity | 50% (N/2) |
| Read performance | Nx (read from any mirror in any pair) |
| Write performance | N/2 x (write to all mirrors) |
| Fault tolerance | 1 drive per mirror pair (up to N/2 drives if failures are distributed) |
| Use case | Databases, write-heavy workloads requiring both performance and redundancy |
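The "1 drive per mirror pair" rule is easier to see in code. The sketch below assumes two-way mirrors laid out as pairs (0,1), (2,3), and so on; that pairing and the name `raid10_survives` are assumptions for illustration, since real controllers may arrange mirrors differently.

```python
def raid10_survives(num_drives: int, failed: set) -> bool:
    """Return True if every mirror pair still has at least one working drive.

    Assumption: two-way mirrors paired as (0,1), (2,3), ...
    """
    if num_drives < 4 or num_drives % 2:
        raise ValueError("this sketch expects an even drive count of at least 4")
    for first in range(0, num_drives, 2):
        if first in failed and first + 1 in failed:
            return False  # both copies in this pair are gone
    return True


spread_out = raid10_survives(6, {0, 2, 4})   # one failure in each pair
same_pair = raid10_survives(6, {0, 1})       # both halves of one pair
```

Three failures can be survivable while two are fatal, so the fault tolerance of RAID 10 depends on which drives fail, not only on how many.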

RAID Comparison Summary

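| RAID | Min Drives | Capacity | Read Speed | Write Speed | Fault Tolerance | Best For |
| --- | --- | --- | --- | --- | --- | --- |
| 0 | 2 | 100% | Excellent | Excellent | None | Temp/cache |
| 1 | 2 | 50% | Good | Fair | 1 drive | Boot/OS |
| 5 | 3 | (N-1)/N | Good | Fair | 1 drive | Read-heavy |
| 6 | 4 | (N-2)/N | Good | Poor | 2 drives | Large arrays |
| 10 | 4 | 50% | Excellent | Good | 1 per pair | Databases |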
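The capacity column translates directly into a small calculator. This is a sketch that follows the summary figures as given, assuming equally sized drives and two-way mirrors for RAID 1 and RAID 10; `MIN_DRIVES` and `usable_capacity_tb` are illustrative names.

```python
MIN_DRIVES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def usable_capacity_tb(level: int, num_drives: int, drive_tb: float) -> float:
    """Usable capacity in TB for identical drives at the given RAID level."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if num_drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        data_drives = num_drives        # no redundancy
    elif level in (1, 10):
        data_drives = num_drives / 2    # assumption: two-way mirrors
    elif level == 5:
        data_drives = num_drives - 1    # one drive's worth of parity
    else:
        data_drives = num_drives - 2    # RAID 6: two drives' worth of parity
    return data_drives * drive_tb


# Six 4 TB drives under each layout that supports six drives:
comparison = {level: usable_capacity_tb(level, 6, 4.0) for level in (0, 5, 6, 10)}
```

For the same six drives, the usable space ranges from 24 TB with no protection down to 12 TB with mirroring, which is the cost side of the fault-tolerance column.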

DANGER

RAID is NOT a backup strategy. RAID protects against drive failure. It does not protect against accidental deletion, corruption, ransomware, or controller failure. Always maintain separate backups regardless of RAID level.


Storage Performance Metrics

IOPS (Input/Output Operations Per Second)

IOPS measures how many read or write operations a storage device can perform per second. For databases with many small random reads (e.g., key-value lookups), IOPS is the bottleneck.

| Storage Type | Random Read IOPS | Random Write IOPS |
| --- | --- | --- |
| HDD (7200 RPM) | 75-150 | 75-150 |
| SATA SSD | 50,000-100,000 | 30,000-80,000 |
| NVMe SSD | 100,000-1,000,000 | 50,000-500,000 |
| AWS gp3 EBS | 3,000 (baseline) | 3,000 (baseline) |
| AWS io2 EBS | up to 256,000 | up to 256,000 |
| AWS io2 Block Express | up to 256,000 | up to 256,000 |

Throughput (MB/s or GB/s)

Throughput measures data transfer rate. For streaming workloads (video processing, analytics, backups), throughput matters more than IOPS.

| Storage Type | Sequential Read | Sequential Write |
| --- | --- | --- |
| HDD (7200 RPM) | 150-200 MB/s | 150-200 MB/s |
| SATA SSD | 500-560 MB/s | 400-530 MB/s |
| NVMe SSD (PCIe 4.0) | 5,000-7,000 MB/s | 3,000-5,000 MB/s |
| AWS gp3 EBS | 125 MB/s (baseline) | 125 MB/s (baseline) |
| AWS io2 EBS | up to 4,000 MB/s | up to 4,000 MB/s |

Latency

Latency is the time between issuing an I/O request and receiving the response. For latency-sensitive applications (trading systems, real-time analytics), this is the critical metric.

| Storage Type | Read Latency (p50) | Read Latency (p99) |
| --- | --- | --- |
| Local NVMe | 10-50 µs | 100-200 µs |
| Local SATA SSD | 50-100 µs | 200-500 µs |
| HDD | 2-8 ms | 10-20 ms |
| AWS gp3 EBS | 1-2 ms | 3-5 ms |
| AWS io2 EBS | 0.5-1 ms | 1-2 ms |
| AWS EFS (NFS) | 1-5 ms | 10-30 ms |
| AWS S3 (object) | 20-100 ms | 200-500 ms |

The IOPS-Throughput-Latency Triangle

These three metrics are interrelated:

$$\text{Throughput} = \text{IOPS} \times \text{I/O Size}$$

$$\text{Latency} \approx \frac{1}{\text{IOPS}} \quad \text{(at queue depth 1)}$$

A drive doing 100,000 IOPS at 4 KB I/O size delivers $100{,}000 \times 4\ \text{KB} = 400\ \text{MB/s}$ of throughput.

The same drive doing 100,000 IOPS at 256 KB I/O size would need $100{,}000 \times 256\ \text{KB} = 25{,}600\ \text{MB/s}$, which exceeds the interface bandwidth, so IOPS drops.
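The arithmetic above can be checked with a short sketch. It uses decimal units (1 MB = 1,000 KB) and, as an assumption for the example, a 7,000 MB/s interface limit taken from the upper end of the PCIe 4.0 NVMe figures; the function names are illustrative.

```python
def throughput_mb_s(iops: float, io_size_kb: float) -> float:
    """Throughput = IOPS x I/O size, in decimal units (1 MB = 1000 KB)."""
    return iops * io_size_kb / 1000


def achievable_iops(requested_iops: float, io_size_kb: float, link_mb_s: float) -> float:
    """Cap IOPS at what the interface bandwidth can carry for this I/O size."""
    return min(requested_iops, link_mb_s * 1000 / io_size_kb)


def latency_ms_at_qd1(iops: float) -> float:
    """At queue depth 1, latency is roughly the reciprocal of IOPS."""
    return 1000 / iops


small_io = throughput_mb_s(100_000, 4)         # 400.0 MB/s
large_io = throughput_mb_s(100_000, 256)       # 25600.0 MB/s, beyond the link
capped = achievable_iops(100_000, 256, 7000)   # 27343.75 IOPS at 256 KB
```

With 256 KB requests the assumed link tops out at roughly 27,000 IOPS, so the workload has become throughput-bound even though the drive could service more operations.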

TIP

When evaluating storage for a workload, first determine whether you are IOPS-bound (many small random I/Os, typical of databases) or throughput-bound (few large sequential I/Os, typical of analytics and media). This determines which storage tier you need and how much it will cost.


When to Use Which Storage Type

| Workload | Storage Type | Why |
| --- | --- | --- |
| PostgreSQL / MySQL | Block (io2, local NVMe) | Low latency, random I/O, single attach |
| Shared configuration files | File (NFS, EFS) | Multiple pods read the same config |
| User-uploaded images | Object (S3, MinIO) | HTTP access, cheap at scale, CDN integration |
| Kafka / event streaming | Block (gp3, local SSD) | Sequential writes, high throughput |
| ML training data | Object (S3) + local cache | Bulk reads, distributed access |
| Container registry | Object (S3) | Large blobs, HTTP API, replication |
| Kubernetes PersistentVolumes | Block (CSI driver) | Direct mount, per-pod isolation |
| Log aggregation | Object (S3) for archive, Block for hot tier | Cost-optimize by tier |
| Video transcoding | Object (input/output) + Block (scratch) | Large files, parallel processing |

Section Map

| Page | What You Will Learn |
| --- | --- |
| Distributed File Systems | HDFS, Ceph, MinIO, GlusterFS: architectures, trade-offs, and when to use each |

Further Reading

"What I cannot create, I do not understand." — Richard Feynman