Discussing Tradeoffs
The difference between a junior and senior engineer in a system design interview is not knowledge — it is the ability to articulate tradeoffs. A junior says "I would use MongoDB." A senior says "I would use PostgreSQL here because we need ACID transactions for the payment flow, but I would consider DynamoDB for the user activity feed where eventual consistency is acceptable and we need low-latency reads at scale."
This page gives you 20 common tradeoffs with scripts for what to say in each direction.
How to Discuss Tradeoffs
The formula:
1. State the tradeoff clearly
2. Explain Option A's strengths and when it wins
3. Explain Option B's strengths and when it wins
4. Choose one for THIS specific problem and explain WHY
5. Acknowledge what you are giving upNever say: "I would use X because it is better." Always say: "I would use X in this case because [reason specific to this problem]. The tradeoff is [what we lose], but that is acceptable because [why it is okay here]."
1. SQL vs NoSQL
| Aspect | SQL (PostgreSQL, MySQL) | NoSQL (DynamoDB, MongoDB, Cassandra) |
|---|---|---|
| Schema | Rigid, enforced | Flexible, schema-on-read |
| Consistency | Strong (ACID) | Tunable (eventual to strong) |
| Query flexibility | Any query with SQL | Limited (key-based or denormalized) |
| Scale (writes) | Vertical (shard for horizontal) | Horizontal (built-in) |
| Relationships | Joins are natural | Denormalize or application joins |
What to say (choosing SQL): "For this system, I would use PostgreSQL because we have complex relationships between entities — users, orders, products — and we need transactional consistency for the checkout flow. We cannot have a partially committed order. The schema is well-understood and unlikely to change dramatically. If we hit read scaling limits, we can add read replicas before considering sharding."
What to say (choosing NoSQL): "For the activity feed, I would use DynamoDB. The access pattern is simple — get feed for user X, sorted by time. We do not need joins. We need single-digit millisecond reads at massive scale, and DynamoDB gives us that with horizontal scaling built in. Eventual consistency is fine here — a 1-second delay before a like appears in the feed is acceptable."
2. Push vs Pull (Notification/Feed Delivery)
What to say (choosing push/fanout-on-write): "I would use fanout-on-write for the timeline because most users have a moderate number of followers — under 10K. Pre-computing timelines means reads are a single Redis lookup, giving us sub-10ms latency. The tradeoff is write amplification: one tweet creates thousands of cache writes. But with an async queue, this happens in the background without affecting the tweeting user's experience."
What to say (choosing pull/fanout-on-read): "For celebrities, I would switch to fanout-on-read. When someone with 50 million followers tweets, writing to 50 million caches is prohibitively expensive. Instead, when a follower loads their timeline, we merge the pre-computed feed with real-time lookups of celebrity tweets. The tradeoff is higher read latency — maybe an extra 20ms — but that is much better than a 10-minute fanout delay."
3. Consistency vs Availability (CAP)
What to say (choosing consistency): "For the payment system, I would choose consistency over availability. If there is a network partition, I would rather return an error than process a payment twice or lose a transaction. We can show users a 'please try again' message, which is far better than charging them twice. I would use a CP database like CockroachDB or PostgreSQL with synchronous replication."
What to say (choosing availability): "For the product catalog, I would choose availability. Users should always be able to browse products, even during partial outages. If a price update has not propagated to all replicas yet, the worst case is someone sees a slightly stale price — that is a minor inconvenience, not a correctness violation. I would use DynamoDB with eventually consistent reads."
4. Monolith vs Microservices
What to say (choosing monolith): "Given that we have a team of 8 engineers, I would start with a modular monolith. Microservices would triple our operational complexity — we would need service discovery, distributed tracing, a deployment pipeline per service — and with 8 people, we would spend more time on infrastructure than features. We can always extract services later when we have a specific reason, like a component that needs independent scaling."
What to say (choosing microservices): "With 60 engineers across 8 teams, microservices make sense. Each team needs to deploy independently — we cannot have Team A blocked because Team B's feature is not ready. The operational overhead is justified by the organizational benefit. I would start by identifying bounded contexts from the domain model and giving each team ownership of their services."
5. Cache-Aside vs Write-Through
| Aspect | Cache-Aside | Write-Through |
|---|---|---|
| Write path | Write DB, invalidate cache | Write cache, cache writes to DB |
| Read path | Check cache, miss -> read DB, populate cache | Always in cache |
| Consistency | Potentially stale (cache miss window) | Always fresh |
| Write latency | Lower (only DB write) | Higher (cache + DB write) |
| Complexity | Application manages cache | Cache layer manages sync |
What to say (choosing cache-aside): "I would use cache-aside for the product catalog. Most products are read thousands of times for every write. On a cache miss, we read from the database and populate the cache with a TTL. The tradeoff is a potential stale read window — if a product price changes, some users might see the old price until the cache expires. For a product catalog, a 5-minute TTL is acceptable."
What to say (choosing write-through): "For the user session store, I would use write-through caching. Every session write goes through the cache to the database. This ensures the cache is always fresh — users never see stale session data, which could cause authentication issues. The tradeoff is higher write latency (two writes instead of one), but session writes are infrequent compared to reads."
6. REST vs gRPC
What to say (choosing REST): "For the public API, REST is the right choice. Our clients include mobile apps, web frontends, third-party integrations, and possibly curl from the command line. REST over HTTP with JSON works everywhere. The tradeoff is performance — JSON parsing and HTTP/1.1 overhead — but for a public API, interoperability matters more than raw speed."
What to say (choosing gRPC): "For internal service-to-service communication, I would use gRPC. We control both ends, so we do not need browser compatibility. Protobuf gives us strict type safety with code generation, binary serialization is 5-10x smaller than JSON, and HTTP/2 multiplexing reduces connection overhead. The tradeoff is tooling — gRPC is harder to debug with curl — but for internal services, type safety and performance matter more."
7. Kafka vs RabbitMQ
What to say (choosing Kafka): "I would use Kafka for event streaming between our microservices. We need event replay (a new analytics service should be able to process historical events), guaranteed ordering per partition, and multiple independent consumer groups reading the same events. The tradeoff is operational complexity — Kafka requires ZooKeeper or KRaft, topic management, and partition planning — but for event-driven architecture at scale, it is the standard."
What to say (choosing RabbitMQ): "For the task queue — processing image thumbnails, sending emails — I would use RabbitMQ. We need simple job distribution: one producer, multiple competing consumers, dead letter queues for failed messages. We do not need event replay or multiple consumer groups. RabbitMQ is simpler to operate and has better support for complex routing patterns with exchanges and bindings."
8. Synchronous vs Asynchronous Processing
What to say (choosing sync): "User authentication must be synchronous. The user is waiting for a login response. We cannot return '202 Accepted, we will authenticate you later.' The read path for displaying a user's profile is also synchronous — they need the data now."
What to say (choosing async): "After an order is placed, sending the confirmation email, updating analytics, and generating the invoice can all be async. The user gets an immediate 'Order confirmed' response, and the background processing happens over the next few seconds. If the email service is temporarily down, the message sits in the queue and gets processed when it recovers."
9. Horizontal vs Vertical Scaling
What to say (choosing vertical first): "I would start by scaling vertically. A single modern server with 64 cores, 256 GB RAM, and NVMe SSDs handles more load than most startups ever reach. The engineering effort of distributed systems — sharding, replication, consistency management — is not justified until we hit the limits of a single machine. And we should optimize our queries and add caching before we shard."
What to say (choosing horizontal): "At our estimated 50,000 QPS, we need horizontal scaling for the application tier. Stateless API servers behind a load balancer let us scale linearly by adding instances. For the database, we will start with read replicas and graduate to sharding when write throughput becomes the bottleneck."
10. Optimistic vs Pessimistic Locking
What to say (choosing optimistic): "For editing user profiles, I would use optimistic locking. Conflicts are rare — two users are unlikely to edit the same profile simultaneously. We add a version number to the record. On update, we check if the version matches. If it changed, we return a conflict and the client retries. The tradeoff is occasional retry overhead, but that is much better than holding database locks for the duration of a user's editing session."
What to say (choosing pessimistic): "For booking airline seats, I would use pessimistic locking — SELECT FOR UPDATE. When a user starts the booking process, we lock the seat row. This prevents double-booking, which would be a serious business problem. The tradeoff is reduced concurrency — other users cannot book the same seat while the lock is held — but that is exactly the behavior we want."
11. Polling vs WebSocket vs SSE
What to say (choosing polling): "For checking order delivery status, simple polling every 30 seconds is sufficient. Status changes are infrequent — maybe 5 times over a delivery's lifetime. The implementation is trivial, works through any proxy, and the server overhead is minimal. WebSocket would be over-engineering for a use case where updates happen every few hours."
What to say (choosing WebSocket): "For the chat feature, WebSocket is necessary. Users expect sub-second message delivery, and communication is bidirectional — both sending and receiving messages. Polling every second would create 60x the server load and still have noticeable latency. The tradeoff is connection management complexity, but for real-time bidirectional communication, WebSocket is the right tool."
12. Denormalization vs Normalization
What to say (choosing normalization): "For the core data model — users, products, orders — I would normalize. We have multiple access patterns, we need referential integrity, and update anomalies from denormalization would be a maintenance nightmare. The tradeoff is join performance, but with proper indexing and read replicas, this is manageable."
What to say (choosing denormalization): "For the product search index, I would denormalize. Each search document contains the product name, description, price, brand name, category, and average rating — all in one document. This eliminates joins at query time, which is critical for sub-100ms search latency. The tradeoff is that when a brand name changes, we need to update all its products in the search index, but brand name changes are extremely rare."
13-20: Quick Reference Table
| # | Tradeoff | Choose A When | Choose B When |
|---|---|---|---|
| 13 | Batch vs Stream processing | Historical analysis, daily reports | Real-time dashboards, alerting |
| 14 | Read replicas vs CQRS | Simple read scaling | Complex read models, different schemas |
| 15 | UUID vs Sequential ID | Distributed generation, no coordination | B-tree performance, human readability |
| 16 | Blob in DB vs Object Storage | Small files (<256KB), transactional | Large files, cost-sensitive, CDN delivery |
| 17 | Strong vs Eventual Consistency | Financial, inventory, auth | Social feeds, analytics, caches |
| 18 | Single region vs Multi-region | Cost-sensitive, no global users | Global users, high availability |
| 19 | Custom vs Managed Service | Cost at scale, full control needed | Small team, faster time to market |
| 20 | Partitioning: Hash vs Range | Even distribution, no range queries | Time-series, range scans needed |
Communication Tips
Phrases That Impress
- "The tradeoff here is..."
- "This depends on our specific requirements. If [scenario A], I would choose X because..."
- "We are trading [what we lose] for [what we gain], which is acceptable because..."
- "An alternative approach would be... but I prefer X in this case because..."
- "This would break down at [scale/scenario], at which point we would need to..."
Phrases That Raise Red Flags
- "X is always better than Y"
- "Everyone uses X" (bandwagon, not reasoning)
- "I have not considered alternatives"
- "That does not matter" (everything matters, it is about prioritization)
Cross-References
- System Design Interview Framework — the full 45-minute structure
- Common Mistakes — what not to do
- Deep Dive Topics — going deeper on any of these tradeoffs
- Consistency Patterns — consistency vs availability in depth
- Communication Patterns — REST vs gRPC vs GraphQL
- Microservices vs Monolith — architecture choice in depth
Tradeoff discussions are where you demonstrate seniority. Junior engineers pick technologies. Senior engineers pick tradeoffs. Every decision has a cost — your job is to show you understand both sides and can justify why one cost is acceptable for this specific problem.