Skip to content
Unverified — AI-generated content. Help verify this page

Design Typeahead / Autocomplete System

A typeahead system returns ranked suggestions within 100 milliseconds of each keystroke. Users expect instant feedback — any perceptible delay destroys the illusion that the system is "reading their mind." This design covers the end-to-end architecture: trie construction from search logs, distributed serving, real-time trending integration, personalization, and multi-language challenges.

Related: Design Autocomplete (alternate take) covers the same problem with different depth emphasis.


1. Problem Statement & Requirements

Functional Requirements

#Requirement
FR-1Return top-10 suggestions for any typed prefix
FR-2Rank by global popularity (search frequency)
FR-3Support personalized suggestions based on user search history
FR-4Update suggestions in near-real-time for trending queries
FR-5Support multi-language input (Latin, CJK, Arabic)
FR-6Filter offensive, illegal, and banned content
FR-7Support phrase completion (not just single words)

Non-Functional Requirements

#RequirementTarget
NFR-1Latencyp99 < 100 ms
NFR-2Availability99.99%
NFR-3Scale500K peak QPS
NFR-4FreshnessTrending queries reflected within 60 seconds
NFR-5Fault toleranceNo single point of failure
NFR-6Data privacyPersonal search history encrypted at rest

Clarifying Questions

Questions to Ask

  • What platform is this for? (Web search, e-commerce, social media?)
  • How many unique queries exist? (Millions vs. billions)
  • Is personalization required or a stretch goal?
  • Must we handle typos / fuzzy matching?
  • What languages are in scope?
  • Should we surface trending queries differently (e.g., with a flame icon)?

2. Back-of-Envelope Estimation

Traffic

  • 500M DAU
  • 6 searches per user per day
  • Average 7 keystrokes per search (each triggers a request)
Requests/day=500×106×6×7=21×109Average QPS=21×10986,400243,000 QPSPeak QPS243K×2.5607K QPS

Storage

  • 1 billion unique query strings
  • Average query: 25 bytes + 8 bytes frequency + 16 bytes metadata = 49 bytes
Raw query data=109×49B=49 GB
  • Trie overhead (node pointers, top-K lists): ~4x raw data
Trie memory per language49×4=196 GB

Bandwidth

  • Average response: 10 suggestions x 30 bytes = 300 bytes + JSON overhead = ~500 bytes
Outbound bandwidth=607K×500B=303 MB/s2.4 Gbps

Debouncing Reduces Actual QPS

Client-side debouncing (50-100ms) eliminates ~40% of requests. Actual server-side QPS is closer to 360K peak, but always design for the un-debounced worst case.


3. High-Level Design

Data Flow

  1. Client types a character -> debounced request to API gateway
  2. API gateway checks Redis cache for the prefix
  3. Cache hit -> return immediately (handles ~85% of traffic)
  4. Cache miss -> route to trie service, which does O(L) lookup (L = prefix length)
  5. Personalization blend -> merge global suggestions with user history
  6. Content filter -> remove banned/offensive terms
  7. Response -> top-10 suggestions returned in < 100ms

4. API Design

typescript
// GET /v1/suggestions?prefix=how+to&limit=10&lang=en&userId=abc123

interface SuggestionRequest {
  prefix: string;          // The typed prefix (URL-encoded)
  limit?: number;          // Max suggestions (default 10)
  lang?: string;           // ISO 639-1 language
  userId?: string;         // For personalized results
  sessionId?: string;      // For session-based personalization
}

interface SuggestionResponse {
  suggestions: Suggestion[];
  isCached: boolean;
  latencyMs: number;
}

interface Suggestion {
  text: string;            // Full query string
  score: number;           // Normalized relevance (0-100)
  category?: string;       // Optional: "trending", "personal", "recent"
  icon?: string;           // Optional: "flame" for trending
  metadata?: {
    searchCount?: number;  // How many times this was searched
  };
}

Debouncing Contract

The API should document a recommended debounce interval (50-100ms). Clients that send requests faster than 30ms apart should be rate-limited.


5. Data Model

Query Frequency Table

sql
CREATE TABLE query_frequencies (
    query_hash     BIGINT PRIMARY KEY,        -- MurmurHash of normalized query
    query_text     VARCHAR(500) NOT NULL,
    language       CHAR(2) NOT NULL DEFAULT 'en',
    frequency      BIGINT NOT NULL DEFAULT 0,
    last_seen      TIMESTAMP NOT NULL DEFAULT NOW(),
    first_seen     TIMESTAMP NOT NULL DEFAULT NOW(),
    is_banned      BOOLEAN NOT NULL DEFAULT FALSE
);

CREATE INDEX idx_freq_lang ON query_frequencies(language, frequency DESC);

User Search History

sql
CREATE TABLE user_search_history (
    user_id        VARCHAR(64) NOT NULL,
    query_text     VARCHAR(500) NOT NULL,
    searched_at    TIMESTAMP NOT NULL DEFAULT NOW(),
    clicked        BOOLEAN DEFAULT FALSE,
    PRIMARY KEY (user_id, searched_at)
) PARTITION BY RANGE (searched_at);

Trie Node (In-Memory Representation)

typescript
interface TrieNode {
  children: Map<string, TrieNode>;    // char -> child
  isTerminal: boolean;                 // end of valid query
  frequency: number;                   // search count (terminal nodes only)
  topK: Suggestion[];                  // pre-computed top-K at this prefix
}

6. Detailed Design

6.1 Trie Construction (Offline Pipeline)

typescript
class TrieBuilder {
  private readonly TOP_K = 10;

  build(queries: QueryFrequency[]): CompressedTrie {
    const trie = new CompressedTrie(this.TOP_K);

    // Sort by frequency descending for efficient top-K propagation
    queries.sort((a, b) => b.frequency - a.frequency);

    for (const q of queries) {
      if (q.isBanned) continue;
      trie.insert(q.queryText, q.frequency);
    }

    return trie;
  }
}

class CompressedTrie {
  private root: TrieNode;
  private readonly topK: number;

  constructor(topK: number) {
    this.root = this.newNode();
    this.topK = topK;
  }

  insert(query: string, frequency: number): void {
    let node = this.root;
    for (const char of query) {
      if (!node.children.has(char)) {
        node.children.set(char, this.newNode());
      }
      node = node.children.get(char)!;
      this.updateTopK(node, query, frequency);
    }
    node.isTerminal = true;
    node.frequency = frequency;
  }

  search(prefix: string): Suggestion[] {
    let node = this.root;
    for (const char of prefix) {
      if (!node.children.has(char)) return [];
      node = node.children.get(char)!;
    }
    return node.topK;
  }

  private updateTopK(node: TrieNode, text: string, score: number): void {
    const idx = node.topK.findIndex(s => s.text === text);
    if (idx >= 0) {
      node.topK[idx].score = score;
    } else if (node.topK.length < this.topK) {
      node.topK.push({ text, score });
    } else if (score > node.topK[node.topK.length - 1].score) {
      node.topK[node.topK.length - 1] = { text, score };
    }
    node.topK.sort((a, b) => b.score - a.score);
  }

  private newNode(): TrieNode {
    return { children: new Map(), isTerminal: false, frequency: 0, topK: [] };
  }
}

6.2 Serving with Prefix Cache

typescript
class TypeaheadService {
  private trie: CompressedTrie;
  private cache: RedisCluster;
  private readonly CACHE_TTL = 300; // 5 minutes

  async getSuggestions(prefix: string, lang: string, limit: number): Promise<Suggestion[]> {
    const cacheKey = `ta:${lang}:${prefix}`;

    // 1. Check cache
    const cached = await this.cache.get(cacheKey);
    if (cached) return JSON.parse(cached);

    // 2. Query trie
    const results = this.trie.search(prefix).slice(0, limit);

    // 3. Cache the result (with jittered TTL to prevent stampede)
    const jitter = Math.floor(Math.random() * 60);
    await this.cache.setex(cacheKey, this.CACHE_TTL + jitter, JSON.stringify(results));

    return results;
  }
}

Cache Hit Rate Optimization

The top 1,000 single-character and two-character prefixes account for 60-70% of all traffic. Pre-warm these on startup:

typescript
async warmCache(lang: string): Promise<void> {
  const alphabet = 'abcdefghijklmnopqrstuvwxyz0123456789';
  for (const c of alphabet) {
    await this.getSuggestions(c, lang, 10);
    for (const c2 of alphabet) {
      await this.getSuggestions(c + c2, lang, 10);
    }
  }
}

The offline pipeline rebuilds every 15 minutes, but breaking news can't wait.

typescript
class HotPatcher {
  private redis: RedisCluster;
  private readonly ANOMALY_THRESHOLD = 3.0; // Z-score

  async detectAndPatch(query: string, count: number): Promise<void> {
    // 1. Check if this is anomalous
    const baseline = await this.getBaseline(query);
    if (!baseline) {
      if (count < 1000) return; // New query, needs minimum threshold
    } else {
      const zScore = (count - baseline.mean) / (baseline.stddev || 1);
      if (zScore < this.ANOMALY_THRESHOLD) return;
    }

    // 2. Broadcast hot patch to all trie service nodes
    await this.redis.publish('trie:hot-patch', JSON.stringify({
      query,
      frequency: count,
      action: 'upsert',
      timestamp: Date.now(),
    }));

    // 3. Invalidate affected cache prefixes
    for (let i = 1; i <= query.length; i++) {
      const prefix = query.substring(0, i);
      await this.redis.del(`ta:en:${prefix}`);
    }
  }

  private async getBaseline(query: string): Promise<{ mean: number; stddev: number } | null> {
    // Fetch from analytics DB
    return null;
  }
}

6.4 Personalization

typescript
class PersonalizedTypeahead {
  private readonly PERSONAL_WEIGHT = 0.3;
  private readonly GLOBAL_WEIGHT = 0.7;

  async blend(
    globalSuggestions: Suggestion[],
    userId: string | null,
    prefix: string
  ): Promise<Suggestion[]> {
    if (!userId) return globalSuggestions;

    // Fetch recent search history (last 100 queries)
    const history = await this.getRecentHistory(userId, 100);

    // Filter history by prefix
    const personalMatches = history
      .filter(q => q.toLowerCase().startsWith(prefix.toLowerCase()))
      .slice(0, 5)
      .map((q, i) => ({
        text: q,
        score: (100 - i * 10) * this.PERSONAL_WEIGHT,
        category: 'personal' as const,
      }));

    // Re-score global suggestions
    const weighted = globalSuggestions.map(s => ({
      ...s,
      score: s.score * this.GLOBAL_WEIGHT,
    }));

    // Merge, deduplicate, sort
    const merged = new Map<string, Suggestion>();
    for (const s of [...personalMatches, ...weighted]) {
      const existing = merged.get(s.text);
      if (existing) {
        existing.score += s.score;
      } else {
        merged.set(s.text, s);
      }
    }

    return Array.from(merged.values())
      .sort((a, b) => b.score - a.score)
      .slice(0, 10);
  }

  private async getRecentHistory(userId: string, limit: number): Promise<string[]> {
    return [];
  }
}

6.5 Multi-Language Support

LanguageChallengeSolution
EnglishSimple tokenizationStandard trie
ChineseNo word boundaries, pinyin inputCharacter-level trie + pinyin mapping
JapaneseMixed scripts (kanji, hiragana, katakana)Multiple trie paths per query
KoreanJamo decomposition for partial syllablesDecompose syllables into jamo for matching
ArabicRTL, diacriticsNormalize diacritics, handle RTL display
typescript
class MultiLanguageRouter {
  private tries: Map<string, CompressedTrie> = new Map();

  async search(prefix: string, lang: string): Promise<Suggestion[]> {
    const trie = this.tries.get(lang);
    if (!trie) {
      // Fallback to English trie
      return this.tries.get('en')?.search(prefix) ?? [];
    }

    // Language-specific normalization
    const normalized = this.normalize(prefix, lang);
    const results = trie.search(normalized);

    // For CJK: also search romanized version
    if (['zh', 'ja', 'ko'].includes(lang)) {
      const romanized = this.romanize(prefix, lang);
      if (romanized !== normalized) {
        const romanResults = trie.search(romanized);
        return this.mergeResults(results, romanResults);
      }
    }

    return results;
  }

  private normalize(input: string, lang: string): string {
    return input.normalize('NFC').toLowerCase();
  }

  private romanize(input: string, lang: string): string {
    // Convert to pinyin (zh), romaji (ja), or romanized Korean (ko)
    return input;
  }

  private mergeResults(a: Suggestion[], b: Suggestion[]): Suggestion[] {
    const seen = new Set(a.map(s => s.text));
    return [...a, ...b.filter(s => !seen.has(s.text))].slice(0, 10);
  }
}

6.6 Content Filtering

typescript
class SuggestionFilter {
  private bannedExact: Set<string> = new Set();
  private bannedPatterns: RegExp[] = [];

  filter(suggestions: Suggestion[]): Suggestion[] {
    return suggestions.filter(s => {
      const lower = s.text.toLowerCase();
      if (this.bannedExact.has(lower)) return false;
      return !this.bannedPatterns.some(p => p.test(lower));
    });
  }
}

7. Scaling & Bottlenecks

Sharding the Trie

Uneven Shard Distribution

Letter frequency is not uniform. Prefixes starting with "s", "c", "p" are 3-5x more common than "x", "z", "q". Use weighted sharding based on actual query distribution, not alphabetical ranges.

Blue-Green Deployment for Trie Updates

Bottleneck Analysis

BottleneckSymptomSolution
196 GB trie doesn't fit one machineOOMShard by prefix range, 4-6 shards
Hot prefixes ("a", "the", "how")p99 latency spikeReplicate hot shards 3-5x
Trie rebuild takes 30+ minutesStale suggestionsIncremental builds + hot patches
Cache stampede on TTL expiryThundering herdJittered TTLs + probabilistic early refresh
Cross-region latency200ms for distant usersRegional trie replicas + edge caches
Trending query floodOverwhelms hot-patch systemRate-limit hot patches to 100/min

8. Trade-offs & Alternatives

Trie vs. Alternatives

ApproachLatencyMemoryUpdate SpeedComplexity
Trie (in-memory)O(L) ~5msHighSnapshot-basedMedium
Elasticsearch prefix~20-50msLowerNear real-timeLow
Redis sorted sets~10msHighReal-timeLow
B-tree (SQL LIKE)~50-200msLow (on disk)Real-timeLow
Ternary search treeO(L) ~8msLower than trieSnapshot-basedMedium

Which to Choose

  • < 100K QPS: Redis sorted sets or Elasticsearch — simple, good enough
  • 100K-500K QPS: Trie with prefix cache — best latency/throughput ratio
  • > 500K QPS: Sharded trie with blue-green deployment — Twitter/Google scale

Pre-computed Top-K vs. On-the-Fly DFS

StrategyLookup TimeMemoryFreshness
Pre-computed top-K at each nodeO(L)Higher (stores lists)Stale until rebuild
DFS traversal at query timeO(subtree)LowerAlways fresh
Hybrid: pre-computed + hot patchesO(L)MediumNear real-time

9. Interview Tips

Start Simple, Then Scale

  1. Single-server trie with top-K at each node
  2. Add Redis prefix cache for hot prefixes
  3. Shard trie by prefix range for memory scaling
  4. Add offline pipeline (Spark/Flink) for trie building
  5. Add real-time trending layer
  6. Add personalization blending

Common Mistakes

  • Forgetting client-side debouncing (50-100ms)
  • Not pre-computing top-K at trie nodes (causes expensive DFS)
  • Ignoring the trie update/rebuild lifecycle
  • Not discussing content safety / banned terms
  • Overlooking CJK language challenges (no word boundaries)
  • Designing for exact match instead of prefix match

Key Talking Points

  1. Why trie over Elasticsearch? Latency. Trie gives O(L) prefix lookup; ES involves network hops + query parsing (~20-50ms overhead).
  2. How to handle 500K QPS? Prefix caching handles 85%+ of traffic. Remaining hits sharded trie replicas.
  3. How fresh are suggestions? Hybrid: bulk rebuild every 15 min + real-time hot patches for trending via pub/sub.
  4. Memory optimization? Compressed trie (path compression), only store top 50M queries, shard across machines.
  5. Fault tolerance? Blue-green deployments, replicated shards, graceful fallback to stale cache.

Time Allocation (45-minute interview)

PhaseTimeFocus
Requirements & clarifications5 minScale, freshness, personalization
Estimation4 minQPS, trie memory, bandwidth
High-level design8 minArchitecture diagram, data flow
Trie deep dive10 minData structure, top-K, prefix cache
Real-time trending5 minAnomaly detection, hot patching
Scaling8 minSharding, replication, blue-green
Trade-offs5 minTrie vs. alternatives, consistency

Summary

ComponentTechnologyScale
Trie ServiceCustom in-memory compressed trie196 GB, sharded 4-6 ways
Prefix CacheRedis Cluster85%+ hit rate, 5-min TTL
Offline PipelineKafka + Flink + Trie Builder15-min rebuild cycle
Real-Time TrendingRedis counters + anomaly detection< 60s freshness
PersonalizationUser history cache + blending100 recent queries/user
Content FilterBanned terms DB + regex patternsIn-line filtering
ServingBlue-green deployment across regions607K peak QPS

Related: Design Autocomplete | Design Twitter Search | Design a Search Engine

"What I cannot create, I do not understand." — Richard Feynman