Practice Questions: Hard

These 10 problems are interview-level hard. They require deep understanding of distributed systems, real-time processing, consistency models, and complex trade-offs. Expect these at senior/staff engineer interviews. Each problem has multiple valid approaches, and the "right" answer depends on which trade-offs you prioritize.

Problem 1: Design Google Docs (Collaborative Editing)

Core challenge: Multiple users editing the same document simultaneously, seeing each other's changes in real-time, with no conflicts.

Requirements

Functional:
- Real-time collaborative text editing
- Multiple cursors visible
- Undo/redo per user
- Version history
- Comments and suggestions

Non-Functional:
- <50ms propagation of edits between users
- Handle 50+ simultaneous editors per document
- No data loss (every keystroke persists)
- Offline editing with sync on reconnect

Key Technical Challenge: Conflict Resolution

Two users type at position 10 simultaneously. Without conflict resolution, their edits overwrite each other.

Two approaches:

Approach	How It Works	Pros	Cons
OT (Operational Transformation)	Transform operations against concurrent operations	Proven (Google Docs uses this)	Complex to implement correctly, centralized server needed
CRDT (Conflict-Free Replicated Data Types)	Data structure guarantees conflict-free merging	Decentralized, offline-friendly	Higher memory overhead, complex data structures

Data Model

Document: { doc_id, title, owner_id, created_at, updated_at }
Operations: { op_id, doc_id, user_id, operation_type, position, content, revision, timestamp }
Snapshots: { doc_id, revision, content, timestamp }  -- Periodic full snapshots

Key Trade-offs

OT vs CRDT: OT is simpler to reason about but requires a central server. CRDT works offline but has higher overhead. Google uses OT; Figma uses CRDT.
Snapshot frequency: Every N operations, save a full snapshot for fast document loading
WebSocket vs SSE: Need bidirectional (edits both directions), so WebSocket
Undo/redo: Per-user undo stack (undo MY changes, not all changes)

Problem 2: Design Zoom (Video Conferencing)

Core challenge: Low-latency, real-time video/audio streaming between multiple participants.

Requirements

Functional:
- 1-on-1 and group video calls (up to 1000 participants)
- Screen sharing
- Chat during calls
- Recording

Non-Functional:
- Video latency: <150ms
- Audio latency: <100ms
- Adaptive quality based on bandwidth
- Handle network jitter and packet loss

Architecture Approaches

Approach	Topology	Best For	Scaling
P2P (Peer-to-Peer)	Direct between participants	1-on-1 calls	Cannot scale beyond 4-6 participants
SFU (Selective Forwarding Unit)	Server forwards streams without processing	Small-medium meetings	Each participant uploads 1 stream, downloads N-1
MCU (Multipoint Control Unit)	Server mixes all streams into one	Large meetings	Each participant uploads/downloads 1 stream

Key Trade-offs

SFU vs MCU: SFU saves server CPU but requires more client bandwidth. MCU saves bandwidth but needs massive server-side processing.
Protocol: WebRTC for real-time media (UDP-based, handles NAT traversal, adaptive bitrate)
Simulcast: Sender encodes at multiple resolutions; SFU sends appropriate resolution to each receiver based on their bandwidth and screen size
Large meetings (100+): Switch to webinar mode — only active speakers are SFU'd, others receive a mixed stream

Problem 3: Design a Stock Exchange

Core challenge: Match buy and sell orders in real-time with strict ordering, fairness, and audit requirements.

Requirements

Functional:
- Place buy/sell orders (market, limit, stop)
- Match orders based on price-time priority
- Real-time order book (bid/ask depth)
- Trade confirmation and settlement

Non-Functional:
- Matching latency: <1ms (single-digit microseconds for HFT)
- Throughput: 100K+ orders/second
- Zero message loss, strict ordering
- Exactly-once execution (no double fills)

Core: The Matching Engine

python

import heapq
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Side(Enum):
    BUY = "BUY"
    SELL = "SELL"


@dataclass(order=True)
class Order:
    # For buy orders: negate price so highest price has priority (min-heap)
    # For both: earlier timestamp has priority
    sort_key: tuple = field(init=False, repr=False)
    order_id: str = field(compare=False)
    side: Side = field(compare=False)
    price: float = field(compare=False)
    quantity: int = field(compare=False)
    timestamp: int = field(compare=False)

    def __post_init__(self):
        if self.side == Side.BUY:
            self.sort_key = (-self.price, self.timestamp)  # Highest price first
        else:
            self.sort_key = (self.price, self.timestamp)   # Lowest price first


class OrderBook:
    """Price-time priority matching engine."""

    def __init__(self, symbol: str):
        self.symbol = symbol
        self.buy_orders: list[Order] = []   # Max-heap (negated price)
        self.sell_orders: list[Order] = []  # Min-heap

    def add_order(self, order: Order) -> list[dict]:
        """Add order and return list of trades."""
        trades = []

        if order.side == Side.BUY:
            while (order.quantity > 0 and self.sell_orders and
                   self.sell_orders[0].price <= order.price):
                trades.extend(self._match(order, self.sell_orders))

            if order.quantity > 0:
                heapq.heappush(self.buy_orders, order)
        else:
            while (order.quantity > 0 and self.buy_orders and
                   -self.buy_orders[0].sort_key[0] >= order.price):
                trades.extend(self._match(order, self.buy_orders))

            if order.quantity > 0:
                heapq.heappush(self.sell_orders, order)

        return trades

    def _match(self, incoming: Order, book: list) -> list[dict]:
        resting = book[0]
        fill_qty = min(incoming.quantity, resting.quantity)
        trade = {
            "symbol": self.symbol,
            "price": resting.price,
            "quantity": fill_qty,
            "buyer": incoming.order_id if incoming.side == Side.BUY else resting.order_id,
            "seller": resting.order_id if incoming.side == Side.BUY else incoming.order_id,
        }
        incoming.quantity -= fill_qty
        resting.quantity -= fill_qty
        if resting.quantity == 0:
            heapq.heappop(book)
        return [trade]

Key Trade-offs

Single-threaded matching: Most exchanges use a single thread per symbol for ordering guarantees (no locks needed). Partition by symbol for parallelism.
Persistence: Write-ahead log for every order and trade. Replay log to rebuild state after crash.
Market data distribution: Multicast (UDP) for low-latency distribution to market data subscribers.

Problem 4: Design a Distributed Cache

Core challenge: In-memory key-value store distributed across nodes with consistent hashing, replication, and eviction.

Key Components

Key Trade-offs

Consistent hashing with virtual nodes for even distribution (see Consistent Hashing)
Eviction policy: LRU (most common), LFU (frequency-based), ARC (adaptive)
Replication: Async (fast but may lose data) vs sync (slower but durable)
Cache-aside vs look-aside vs cache-as-SoR: Defines who is responsible for populating the cache

Problem 5: Design a Search Engine

Core challenge: Crawl the web, build an index, and serve relevant results in milliseconds.

Components

Crawler: Fetch web pages (see Problem 5 in Easy — Web Crawler)
Indexer: Build inverted index from crawled pages
Ranking: PageRank + text relevance (BM25) + freshness + user signals
Serving: Distribute index across shards, scatter-gather query

Key Trade-offs

Index partitioning: By document (each shard has all terms for subset of docs) vs by term (each shard has all docs for subset of terms). Document-based is more common (better load balancing).
Freshness vs cost: Crawl frequently for fresh results vs crawl less to save compute
Ranking complexity: More signals = better results but slower queries
Snippet generation: Pre-compute at index time vs generate at query time

Problem 6: Design an Ad Serving System

Core challenge: Select and serve the most valuable ad in <100ms, handling billions of requests per day.

Key Components

Ad selection: Given user context (demographics, history, page content), run an auction among eligible ads
Budget management: Track advertiser spend in real-time, stop serving when budget exhausted
Targeting: Match ads to user segments (behavioral, contextual, demographic)
Real-time bidding (RTB): Auction completes in <100ms including external bidder calls

Key Trade-offs

First-price vs second-price auction: Second-price (winner pays second-highest bid) encourages truthful bidding
Budget pacing: Spend budget evenly throughout the day vs front-load
Click prediction accuracy vs latency: More complex models are more accurate but slower

Problem 7: Design a Recommendation Engine

Core challenge: Suggest relevant items from millions of candidates, personalized per user.

Architecture

Key Trade-offs

Collaborative filtering vs content-based: CF works when you have usage data; content-based works for cold start
Online learning vs batch retraining: Online adapts faster but is harder to debug
Diversity vs relevance: Showing only the "best" items creates filter bubbles
Candidate generation: Approximate Nearest Neighbor (ANN) search in embedding space for speed

Problem 8: Design a Payment System

Core challenge: Process payments reliably with exactly-once semantics, fraud detection, and regulatory compliance.

Key Components

Payment API: Accept payment requests with idempotency key
Payment State Machine: INITIATED -> PROCESSING -> AUTHORIZED -> CAPTURED -> SETTLED (or FAILED/REFUNDED)
Payment Gateway Integration: Stripe, Adyen, etc. — handle multiple providers for redundancy
Fraud Detection: ML model scoring each transaction in real-time
Ledger: Double-entry bookkeeping for accounting accuracy

Key Trade-offs

Idempotency: Every payment operation MUST be idempotent (retrying a charge should not double-charge)
Exactly-once processing: Use transactional outbox pattern with idempotency keys
Two-phase payment: Authorize first, capture later (reduces refund rate)
Multi-currency: Store amounts in smallest currency unit (cents), convert at transaction time

Problem 9: Design a CDN

Core challenge: Serve content from edge locations closest to users with minimal latency.

Key Components

DNS routing: GeoDNS or anycast to route users to nearest edge
Edge cache: Cache content at each PoP (Point of Presence)
Origin shield: Intermediate cache between edges and origin to reduce origin load
Cache invalidation: Purge API, TTL-based expiration, versioned URLs

Key Trade-offs

Push vs Pull CDN: Push (pre-populate edges) vs pull (cache on first request). Pull is more common.
Cache key design: URL, URL + headers, URL + query params — affects hit rate
TLS termination: At edge (faster) vs at origin (simpler cert management)
Consistency: After purge, how long until all edges are updated? (eventual, typically <60s)

Problem 10: Design a Gaming Leaderboard

Core challenge: Real-time ranked leaderboard for millions of players with fast rank queries.

Requirements

Functional:
- Update player score
- Get top N players
- Get a specific player's rank
- Get players around a specific rank (neighborhood)

Non-Functional:
- 1M concurrent players, 10K score updates/sec
- Rank query latency: <50ms
- Real-time (rank updates within seconds of score change)

Solution Approaches

Approach	Get Rank	Update Score	Memory	Complexity
Redis Sorted Set (ZSET)	O(log N) ZRANK	O(log N) ZADD	O(N)	Low
Segment Tree	O(log M)	O(log M)	O(M) where M = score range	Medium
Skip List	O(log N)	O(log N)	O(N)	Medium

python

# Redis-based leaderboard — simplest production solution
class Leaderboard:
    def __init__(self, redis_client, name: str):
        self.redis = redis_client
        self.key = f"leaderboard:{name}"

    async def update_score(self, player_id: str, score: float):
        await self.redis.zadd(self.key, {player_id: score})

    async def get_top(self, n: int = 100) -> list:
        return await self.redis.zrevrange(self.key, 0, n - 1, withscores=True)

    async def get_rank(self, player_id: str) -> int:
        rank = await self.redis.zrevrank(self.key, player_id)
        return rank + 1 if rank is not None else None

    async def get_around(self, player_id: str, count: int = 5) -> list:
        rank = await self.redis.zrevrank(self.key, player_id)
        if rank is None:
            return []
        start = max(0, rank - count)
        end = rank + count
        return await self.redis.zrevrange(self.key, start, end, withscores=True)

Key Trade-offs

Redis ZSET: Simple, fast, but limited to ~100M entries per key (memory)
Sharded leaderboard: Partition by score range, merge at query time
Real-time vs periodic: Real-time updates are expensive at scale; consider batching updates every N seconds
Global vs regional: Separate leaderboards per region, global aggregation hourly

Practice Strategy

These problems require 60+ minutes for a thorough design. In an interview, you will only cover the highlights. Practice:

Identify the hardest sub-problem (e.g., conflict resolution for Google Docs, matching engine for stock exchange)
Spend most time on that sub-problem rather than trying to cover everything
Know the trade-offs cold — the interviewer will push you on why you chose one approach over another

Cross-References

System Design Interview Framework — structure your answer
Practice Questions: Medium — build up to these
Deep Dive Topics — what interviewers probe
Discussing Tradeoffs — articulate your choices
CRDT Fundamentals — for Google Docs design
WebRTC — for Zoom design

Hard problems test depth, not breadth. You will not cover the entire system in 45 minutes. The interviewer knows this. What they want to see is that you can identify the hardest part of the problem, go deep on it, and articulate the trade-offs of your approach versus alternatives.

Practice Questions: Hard ​

Problem 1: Design Google Docs (Collaborative Editing) ​

Requirements ​

Key Technical Challenge: Conflict Resolution ​

Data Model ​

Key Trade-offs ​

Problem 2: Design Zoom (Video Conferencing) ​

Requirements ​

Architecture Approaches ​

Key Trade-offs ​

Problem 3: Design a Stock Exchange ​

Requirements ​

Core: The Matching Engine ​

Key Trade-offs ​

Problem 4: Design a Distributed Cache ​

Key Components ​

Key Trade-offs ​

Problem 5: Design a Search Engine ​

Components ​

Key Trade-offs ​

Problem 6: Design an Ad Serving System ​

Key Components ​

Key Trade-offs ​

Problem 7: Design a Recommendation Engine ​

Architecture ​

Key Trade-offs ​

Problem 8: Design a Payment System ​

Key Components ​

Key Trade-offs ​

Problem 9: Design a CDN ​

Key Components ​

Key Trade-offs ​

Problem 10: Design a Gaming Leaderboard ​

Requirements ​

Solution Approaches ​

Key Trade-offs ​

Practice Strategy ​

Cross-References ​

Related Pages

Practice Questions: Hard

Problem 1: Design Google Docs (Collaborative Editing)

Requirements

Key Technical Challenge: Conflict Resolution

Data Model

Key Trade-offs

Problem 2: Design Zoom (Video Conferencing)

Requirements

Architecture Approaches

Key Trade-offs

Problem 3: Design a Stock Exchange

Requirements

Core: The Matching Engine

Key Trade-offs

Problem 4: Design a Distributed Cache

Key Components

Key Trade-offs

Problem 5: Design a Search Engine

Components

Key Trade-offs

Problem 6: Design an Ad Serving System

Key Components

Key Trade-offs

Problem 7: Design a Recommendation Engine

Architecture

Key Trade-offs

Problem 8: Design a Payment System

Key Components

Key Trade-offs

Problem 9: Design a CDN

Key Components

Key Trade-offs

Problem 10: Design a Gaming Leaderboard

Requirements

Solution Approaches

Key Trade-offs

Practice Strategy

Cross-References