Discord's Trillion Message Migration

Discord is one of the largest real-time communication platforms in the world, with over 150 million monthly active users sending billions of messages per day. Storing and serving these messages at scale has been one of Discord's defining engineering challenges, requiring two major database migrations over the platform's lifetime.

The journey went: MongoDB (2015–2017) to Cassandra (2017–2022) to ScyllaDB (2022–present). Each migration was driven by hitting fundamental limitations of the current database at Discord's scale, and each migration was performed with zero planned downtime — because Discord's users expect their messages to be available at all times.

Chapter 1: MongoDB (2015–2017)

The Early Architecture

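When Discord launched in 2015, the team chose MongoDB as the message store. MongoDB was a popular choice for startups: schemaless, easy to get started with, and performant at small scale. According to Discord's 2017 engineering post, messages lived in a single MongoDB replica set, in one collection with a compound index on channel_id and created_at; MongoDB sharding was deliberately not used.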

What Went Wrong

As Discord grew, MongoDB's limitations became apparent:

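Memory-mapped storage engine: MongoDB's original MMAPv1 storage engine memory-maps its data files and relies on the operating system's page cache. Once the data and index grew beyond available RAM, performance degraded dramatically
Coarse-grained locking: MMAPv1 locks at the collection level (document-level concurrency control arrived only with WiredTiger), so under high write concurrency, contention on popular channels caused latency spikes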
Compaction issues: The WiredTiger storage engine (introduced in MongoDB 3.0) improved things, but compaction could still cause periodic latency spikes that violated Discord's strict latency requirements

By 2017, Discord was storing billions of messages, and MongoDB was struggling. Read latencies were unpredictable, and the operational burden of managing MongoDB at scale was consuming significant engineering time.

The Tipping Point

The core problem was not that MongoDB was a bad database. It was not designed for Discord's access pattern: extremely high write throughput (more than 100 million messages per day by early 2017), combined with read patterns that ranged from "fetch the last 50 messages" (fast) to "search through years of message history" (slow), all with strict latency requirements.

The Decision

Discord evaluated several options and chose Apache Cassandra for its:

Linear scalability (add nodes to add capacity)
Tunable consistency (Discord could favor write availability over strict consistency)
Proven track record at similar scale (Netflix, Apple, Instagram)
Time-series friendly data model (messages are naturally time-ordered)

Chapter 2: Cassandra (2017–2022)

The Migration to Cassandra

Discord designed their Cassandra schema specifically for message storage:

CREATE TABLE messages (
    channel_id bigint,
    bucket int,
    message_id bigint,
    author_id bigint,
    content text,
    -- additional fields
    PRIMARY KEY ((channel_id, bucket), message_id)
) WITH CLUSTERING ORDER BY (message_id DESC);

The key design decisions:

Partition key: (channel_id, bucket) — messages for a channel are split into time-based buckets to prevent unbounded partition growth
Clustering key: message_id (which is a Snowflake ID containing a timestamp) — messages within a bucket are ordered by time
Bucket strategy: Each bucket represents a fixed time window, ensuring partitions stay at a manageable size
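
As a minimal sketch, a bucket number could be derived from a message ID as shown below. It assumes a 10-day window (the size used in the bucket example in this article) and the Discord epoch of 1420070400000 ms from Discord's public API documentation. The function names are illustrative and are not taken from Discord's codebase.

```python
DISCORD_EPOCH_MS = 1420070400000            # 2015-01-01T00:00:00Z (Discord API docs)
BUCKET_SIZE_MS = 1000 * 60 * 60 * 24 * 10   # assumed 10-day window


def snowflake_timestamp_ms(snowflake: int) -> int:
    """Milliseconds since the Discord epoch, held in the top 42 bits of the ID."""
    return snowflake >> 22


def make_bucket(snowflake: int) -> int:
    """Map a message ID to the time bucket (partition) it belongs to."""
    return snowflake_timestamp_ms(snowflake) // BUCKET_SIZE_MS


def make_buckets(start_id: int, end_id: int) -> range:
    """All buckets that could hold messages between two IDs, oldest first."""
    return range(make_bucket(start_id), make_bucket(end_id) + 1)
```

Because the bucket is computed from the ID alone, a reader that knows a message ID also knows which partition to query, with no lookup table in between.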

The Migration Process

Discord migrated from MongoDB to Cassandra with zero planned downtime using a dual-write approach:

1. Dual write: Write every message to both MongoDB and Cassandra, but read from MongoDB
2. Historical migration: Background job copies all historical messages from MongoDB to Cassandra
3. Dual read with verification: Read from Cassandra, with a shadow read from MongoDB to verify correctness
4. Cutover: Once confident in Cassandra's data integrity, switch reads fully to Cassandra
5. Decommission: Turn off MongoDB
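
The steps above can be sketched as a small state machine. This is an illustration under simplifying assumptions, not Discord's implementation: the two databases are plain dictionaries, and the `MigratingStore` and `Phase` names are hypothetical.

```python
from enum import Enum


class Phase(Enum):
    DUAL_WRITE = 1    # write to both stores, read from the old one
    SHADOW_READ = 2   # write to both, read from the new one, compare with the old
    CUTOVER = 3       # read and write the new store only


class MigratingStore:
    """Hypothetical wrapper; `old` and `new` are dicts standing in for databases."""

    def __init__(self, old: dict, new: dict):
        self.old, self.new = old, new
        self.phase = Phase.DUAL_WRITE
        self.mismatches = []  # message IDs whose shadow read disagreed

    def write(self, message_id: int, content: str) -> None:
        self.new[message_id] = content
        if self.phase is not Phase.CUTOVER:
            self.old[message_id] = content

    def backfill(self) -> None:
        """Copy historical rows without overwriting rows already dual-written."""
        for message_id, content in self.old.items():
            self.new.setdefault(message_id, content)

    def read(self, message_id: int):
        if self.phase is Phase.DUAL_WRITE:
            return self.old.get(message_id)
        result = self.new.get(message_id)
        if self.phase is Phase.SHADOW_READ and self.old.get(message_id) != result:
            self.mismatches.append(message_id)
        return result
```

An empty `mismatches` list over a representative period of shadow reads is the signal to advance `phase` to `CUTOVER`; a real system would sample shadow reads and log the differing rows rather than keep them in memory.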

The Hot Partition Problem

Cassandra worked well for most of Discord — but problems emerged with very large, very active channels. The hot partition problem manifested in two ways:

Large servers with busy channels: Discord servers (communities) like gaming guilds or open-source projects could have channels with millions of messages. Even with bucketing, certain partitions became "hot" — receiving a disproportionate share of reads and writes, causing latency spikes on the Cassandra nodes hosting those partitions.

Garbage collection pauses: Cassandra runs on the JVM, and large partitions meant large amounts of heap memory, which triggered garbage collection pauses. These GC pauses caused p99 latency spikes that were visible to users as momentary freezes or delays in message delivery.

Normal channel:
  Partition size: ~10 MB
  Reads per second: 5-10
  Latency: 2-5 ms ✓

Hot channel (large server):
  Partition size: ~200 MB
  Reads per second: 500+
  Latency: 2 ms (p50) ... 800 ms (p99) ✗
  GC pauses: 200-500 ms, every few minutes

The p99 Latency Problem

Discord's users are extraordinarily sensitive to latency. A 200ms pause in a voice channel or a half-second delay in message delivery is immediately noticeable and frustrating. Cassandra's JVM garbage collection pauses were causing exactly this kind of intermittent latency spike, and the problem got worse as data grew.

Compaction Challenges

Cassandra's compaction process — merging SSTables to reclaim space and improve read performance — also caused periodic latency spikes. Compaction is I/O intensive, and when it runs on a node that is also serving hot partitions, the combined load causes tail latency to spike.

Chapter 3: ScyllaDB (2022–Present)

Why ScyllaDB

After evaluating their options, Discord chose ScyllaDB, a C++ rewrite of Cassandra that is API-compatible but fundamentally different under the hood:

Feature	Cassandra	ScyllaDB
Language	Java (JVM)	C++
Threading	Thread pool	Shard-per-core (Seastar)
GC Pauses	Yes (JVM GC)	None (no GC)
Compaction	Can spike latency	Controlled compaction scheduler
API Compatibility	N/A	Wire-compatible with Cassandra

The key advantage: no garbage collection pauses. ScyllaDB is written in C++ on the Seastar framework, which gives each CPU core its own shard of data and avoids sharing memory between cores. With no garbage collector in the process, the GC-induced p99 latency spikes that were Discord's primary pain point go away.

The Migration Architecture

Migrating from Cassandra to ScyllaDB was simpler than the MongoDB migration because ScyllaDB is wire-compatible with Cassandra — the same CQL queries, the same data model, the same drivers.

Discord built a data service middleware that handled the migration transparently. The service was aware of which channels had been migrated and could route requests to the appropriate backend.
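
A minimal sketch of per-channel routing, assuming the service tracks migrated channels in a set. `Backend` and `DataService` are hypothetical names, and the in-memory dictionaries stand in for the two clusters. The sketch ignores writes that arrive while a channel is being copied, which a production service would have to handle.

```python
from dataclasses import dataclass, field


@dataclass
class Backend:
    """In-memory stand-in for a Cassandra or ScyllaDB cluster."""
    name: str
    rows: dict = field(default_factory=dict)


@dataclass
class DataService:
    cassandra: Backend
    scylla: Backend
    migrated_channels: set = field(default_factory=set)

    def backend_for(self, channel_id: int) -> Backend:
        """Pick the backend that currently owns this channel."""
        if channel_id in self.migrated_channels:
            return self.scylla
        return self.cassandra

    def migrate_channel(self, channel_id: int) -> None:
        """Copy one channel's rows, then flip routing for that channel."""
        for key, content in self.cassandra.rows.items():
            if key[0] == channel_id:
                self.scylla.rows[key] = content
        self.migrated_channels.add(channel_id)

    def write(self, channel_id: int, message_id: int, content: str) -> None:
        self.backend_for(channel_id).rows[(channel_id, message_id)] = content

    def read(self, channel_id: int, message_id: int):
        return self.backend_for(channel_id).rows.get((channel_id, message_id))
```

Callers only ever use `read` and `write`, so the application never needs to know which cluster holds a given channel.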

Results After Migration

The results were dramatic:

p99 read latency: Dropped from 40–125 ms on Cassandra to 15 ms on ScyllaDB
p99 write latency: Dropped from 5–70 ms to 5 ms
Node count: Reduced from 177 Cassandra nodes to 72 ScyllaDB nodes (59% fewer)
GC pauses: Eliminated entirely
Storage efficiency: Improved due to ScyllaDB's more efficient compaction

What Saved Them

The decision to use a wire-compatible replacement (ScyllaDB) rather than migrating to a completely different database dramatically reduced migration risk. The same schema, same queries, and same drivers meant the application layer required minimal changes. The migration was primarily an infrastructure operation rather than an application rewrite.

Key Architectural Decisions

Snowflake IDs

Discord uses Twitter's Snowflake ID scheme (adapted) for message IDs. Each ID encodes a timestamp, making messages naturally time-ordered without a separate timestamp column. This was critical for efficient range queries ("give me messages after this ID").

Snowflake ID structure:
| 42 bits: timestamp (ms since Discord epoch) | 10 bits: worker and process IDs (5 + 5) | 12 bits: sequence |

Benefit: message_id ORDER BY = chronological ORDER BY
         No secondary index needed for time-based queries
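
The layout above can be exercised with a short sketch. The epoch constant comes from Discord's public API documentation; `make_snowflake` and `snowflake_time` are illustrative helper names, and the 10 worker bits are treated as one field for simplicity.

```python
from datetime import datetime, timezone

DISCORD_EPOCH_MS = 1420070400000  # 2015-01-01T00:00:00Z (Discord API docs)


def make_snowflake(unix_ms: int, worker: int, sequence: int) -> int:
    """Pack timestamp (42 bits), worker (10 bits), and sequence (12 bits)."""
    if not (0 <= worker < 1 << 10 and 0 <= sequence < 1 << 12):
        raise ValueError("worker or sequence out of range")
    return ((unix_ms - DISCORD_EPOCH_MS) << 22) | (worker << 12) | sequence


def snowflake_time(snowflake: int) -> datetime:
    """Recover the creation time encoded in the top 42 bits."""
    unix_ms = (snowflake >> 22) + DISCORD_EPOCH_MS
    return datetime.fromtimestamp(unix_ms / 1000, tz=timezone.utc)


# An ID minted one millisecond later always sorts higher, whatever the low bits are.
earlier = make_snowflake(1_600_000_000_000, worker=1023, sequence=4095)
later = make_snowflake(1_600_000_000_001, worker=0, sequence=0)
assert earlier < later
```

Since the timestamp occupies the most significant bits, comparing two IDs as integers compares their creation times first, which is why clustering on message_id yields chronological order.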

Bucket Strategy

Discord buckets messages within each channel to keep partition sizes bounded. Without bucketing, a channel with years of message history would be a single, ever-growing partition. Bucketing caps partition size, but it does not remove hot partitions by itself: a busy channel's current bucket still receives most of that channel's reads and writes.

Channel 123456:
  Bucket 1: Messages from day 1-10
  Bucket 2: Messages from day 11-20
  Bucket 3: Messages from day 21-30
  ...

Query "last 50 messages" → Read from most recent bucket only
Query "messages from 2 years ago" → Read from the specific historical bucket
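
A sketch of how a "last N messages" read could walk buckets from newest to oldest. The `partitions` dictionary is a stand-in for the messages table and `fetch_recent` is a hypothetical helper. For a busy channel the loop stops after the newest bucket; a quiet channel may have fewer than N messages there, so the sketch falls back to older buckets.

```python
def fetch_recent(partitions: dict, channel_id: int, newest_bucket: int,
                 oldest_bucket: int, limit: int = 50) -> list:
    """Collect up to `limit` message IDs, newest first, scanning buckets backwards.

    `partitions` maps (channel_id, bucket) to a list of message IDs.
    """
    found = []
    for bucket in range(newest_bucket, oldest_bucket - 1, -1):
        # Within a partition, rows come back in descending message_id order.
        rows = sorted(partitions.get((channel_id, bucket), []), reverse=True)
        found.extend(rows[: limit - len(found)])
        if len(found) >= limit:
            break  # enough messages; older buckets are never touched
    return found
```

Each loop iteration corresponds to one single-partition query, so the cost of a read is bounded by the number of buckets scanned rather than by the channel's total history.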

Consistent Hashing for Request Routing

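Discord uses consistent hashing, keyed on channel ID, to route message requests to its data service. Requests for the same channel consistently hit the same service instance, which keeps that instance's cache effective and, per Discord's 2023 post, lets it coalesce concurrent requests for a hot channel into a single database query.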
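
A minimal consistent-hash ring, to make the routing idea concrete. This is a generic sketch of the technique with virtual nodes, not Discord's routing code; the `HashRing` class and the choice of SHA-256 are assumptions made for illustration.

```python
import bisect
import hashlib


class HashRing:
    """Consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes: int = 64):
        nodes = list(nodes)
        if not nodes:
            raise ValueError("at least one node is required")
        # Each node is placed on the ring at `vnodes` pseudo-random points.
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(value: str) -> int:
        digest = hashlib.sha256(value.encode()).digest()
        return int.from_bytes(digest[:8], "big")

    def node_for(self, channel_id: int) -> str:
        """Return the first node clockwise from the channel's hash."""
        point = self._hash(str(channel_id))
        idx = bisect.bisect_right(self._keys, point) % len(self._ring)
        return self._ring[idx][1]
```

The property that matters for routing is stability: when a node is added, only the channels that now map to the new node change owner, and every other channel keeps hitting the instance it used before.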

Lessons Learned

1. Database choice evolves with scale

Discord went through three databases in seven years, not because any one choice was wrong at the time, but because their requirements evolved. MongoDB worked at startup scale. Cassandra worked at millions of users. ScyllaDB works at hundreds of millions. The right database is the one that fits your current and near-future scale.

2. JVM garbage collection is a real problem at scale

Watch Out for This

For latency-sensitive systems at scale, JVM garbage collection pauses are not a theoretical concern. JVM-based data systems such as Cassandra, Elasticsearch, and Kafka can hit GC-related latency spikes as heap pressure grows with data and traffic. Solutions include tuning GC parameters, reducing heap pressure, or moving to non-JVM alternatives.

3. Wire-compatible replacements reduce migration risk

Choosing ScyllaDB (Cassandra-compatible) over a completely different database meant the migration was an infrastructure swap rather than an application rewrite. When planning a migration, prioritize compatibility to minimize the surface area of change.

4. Zero-downtime migration requires a dual-path architecture

Both migrations used a dual-write/dual-read approach that allowed gradual cutover with verification at each step. This pattern — write to both, verify, switch reads, switch writes, decommission old — is the gold standard for database migrations at scale.

5. Partition design is the most important Cassandra decision

Discord's hot partition problem was a consequence of partition key design. In Cassandra and ScyllaDB, the partition key determines data distribution, and an unbalanced distribution means hot nodes, high tail latency, and operational pain. Invest heavily in partition key design before you write your first row.
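
One simple way to spot an unbalanced distribution is to measure what share of traffic each partition key receives. The sketch below assumes request logs that record the partition key of each request; `hottest_partitions` is a hypothetical helper, not part of any Cassandra or ScyllaDB tooling.

```python
from collections import Counter


def hottest_partitions(requests, top: int = 3):
    """Return (partition_key, share_of_traffic) for the busiest partitions.

    `requests` is an iterable of partition keys, e.g. (channel_id, bucket) tuples.
    """
    counts = Counter(requests)
    total = sum(counts.values())
    return [(key, n / total) for key, n in counts.most_common(top)]
```

If a single key accounts for a large share of all requests, the nodes holding that partition's replicas carry that share of the load no matter how many nodes the cluster has.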

What You Can Learn

Choose your database for your current scale, not your dream scale. Start with what you know and what works. Migrate when you hit real limitations, not theoretical ones.
Design your partition keys for even distribution. If you use a wide-column store (Cassandra, ScyllaDB, DynamoDB), spend significant time modeling your access patterns and partition key strategy.
Build migration capability into your data layer. A thin data service between your application and your database makes migrations tractable. The service can route requests, dual-write, compare results, and manage cutover — all transparent to the application.
Monitor p99, not just p50. Discord's Cassandra cluster looked healthy at p50 latency. The problems were all at p99 — the tail latency that affected a small but noticeable percentage of requests. Monitor and alert on tail latency percentiles.
Consider non-JVM alternatives for latency-critical systems. If your system has strict tail latency requirements and you are running a JVM-based data store, evaluate alternatives that do not have garbage collection (ScyllaDB, Redis, ClickHouse, etc.).
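
To see why the median can hide a tail problem, here is a small sketch using the nearest-rank percentile method on synthetic latencies. The sample values are invented for illustration and are not Discord measurements.

```python
import math


def percentile(samples, pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic latencies in ms: 98 fast requests and 2 that hit a long pause.
latencies = [3.0] * 98 + [800.0] * 2
p50 = percentile(latencies, 50)
p99 = percentile(latencies, 99)
```

Here `p50` is 3.0 ms while `p99` is 800.0 ms: a dashboard that shows only the median would report a healthy system even though 2% of requests stall.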

What Would You Do?

Test your database migration instincts against the decisions Discord's engineers actually faced.

Scenario 1: It is 2017 and your MongoDB cluster is struggling under Discord's load. Read latencies are unpredictable, and operational burden is high. You need a new database. Do you (A) invest in sharding and tuning MongoDB, (B) migrate to Cassandra for its linear scalability and time-series friendly data model, or (C) build a custom database purpose-built for message storage?

What Discord did: They chose (B), Cassandra. MongoDB's fundamental storage engine limitations (memory-mapped files, coarse-grained locking) made tuning insufficient at Discord's scale. Cassandra's linear scalability, tunable consistency, and proven track record at similar scale (Netflix, Apple) made it the right choice for that era. Building a custom database (C) would have been high-risk and slow. The lesson: choose your database for your current and near-future scale, not your dream scale. Migrate when you hit real limitations, not theoretical ones.

Scenario 2: Your Cassandra cluster is experiencing p99 latency spikes of 800ms caused by JVM garbage collection pauses on hot partitions. Your users notice every spike. Do you (A) tune JVM garbage collection parameters, (B) redesign your partition key strategy to eliminate hot partitions, or (C) migrate to a non-JVM database that eliminates GC entirely?

What Discord did: They chose (C) — migrate to ScyllaDB, a C++ rewrite of Cassandra that is wire-compatible but eliminates garbage collection entirely. GC tuning (A) can reduce pause frequency but cannot eliminate pauses entirely — and as data grows, the problem gets worse. Partition redesign (B) helps but does not solve the fundamental JVM GC issue for a latency-sensitive application. ScyllaDB's shard-per-core architecture with no GC dropped p99 read latency from 40-125ms to 15ms.

Scenario 3: You are planning the migration from Cassandra to ScyllaDB. You have trillions of messages in production. Do you (A) do a big-bang cutover on a maintenance window, (B) use a dual-write approach with gradual cutover and verification, or (C) use ScyllaDB's wire compatibility to do a rolling node replacement within the existing cluster?

What Discord did: They built a data service middleware that handled routing transparently, streaming data from Cassandra to ScyllaDB while routing reads and writes to the appropriate backend based on migration state per channel. The wire compatibility of ScyllaDB dramatically reduced risk — same schema, same queries, same drivers. The migration was primarily an infrastructure operation rather than an application rewrite. The lesson: when possible, choose wire-compatible replacements to minimize the surface area of change.

Key Lessons

Database choice evolves with scale. Discord used three databases in seven years. Each was the right choice at the time. The right database is the one that fits your current and near-future scale.
JVM garbage collection is a real problem at scale. For latency-sensitive systems, GC pauses are not theoretical — they cause visible user-facing latency spikes that worsen as data grows.
Wire-compatible replacements reduce migration risk. Choosing ScyllaDB (Cassandra-compatible) meant the migration was an infrastructure swap, not an application rewrite.
Partition design is the most important wide-column store decision. Unbalanced partition distribution causes hot nodes, high tail latency, and operational pain.
Monitor p99, not just p50. Discord's Cassandra cluster looked healthy at p50. The problems were all at p99 — the tail latency affecting a noticeable percentage of requests.

Quiz

Q1: Why did Discord switch from MongoDB to Cassandra in 2017? MongoDB's MMAPv1 storage engine memory-mapped its data files, so performance degraded as the data and index exceeded available RAM. Coarse-grained locking caused latency spikes under high write concurrency. Compaction caused periodic latency spikes that violated Discord's strict latency requirements.

Q2: What is the "hot partition problem" that Discord experienced with Cassandra? Very large, active channels (like gaming guilds with millions of messages) caused certain partitions to receive a disproportionate share of reads and writes. These hot partitions triggered JVM garbage collection pauses that caused p99 latency spikes of 200-800ms — directly noticeable to users.

Q3: What specific performance improvement did the ScyllaDB migration achieve? P99 read latency dropped from 40-125ms to 15ms. P99 write latency dropped from 5-70ms to 5ms. Node count was reduced from 177 Cassandra nodes to 72 ScyllaDB nodes (59% fewer). GC pauses were eliminated entirely.

Q4: How do Discord's Snowflake IDs help with message storage and retrieval? Snowflake IDs encode a timestamp in the ID itself, making messages naturally time-ordered without a separate timestamp column. This means ORDER BY message_id is equivalent to chronological ordering, and no secondary index is needed for time-based queries.

Q5: How did Discord perform zero-downtime migrations between databases? They used a dual-write/dual-read approach: write to both old and new databases, migrate historical data in the background, read from the new database with shadow reads from the old for verification, then cut over fully once confident in data integrity. This pattern was used for both the MongoDB-to-Cassandra and Cassandra-to-ScyllaDB migrations.

One-Liner Summary

Discord migrated trillions of messages across three databases in seven years — from MongoDB to Cassandra to ScyllaDB — because each database was right for its era, and JVM garbage collection pauses are the enemy of real-time communication.

Sources: Discord Blog — How Discord Stores Billions of Messages (January 2017); Discord Blog — How Discord Stores Trillions of Messages (March 2023); Discord engineering talks at QCon and other conferences.
