Hash Tables
Hash tables are the single most important data structure in practical programming. They power dictionaries, sets, caches, database indexes, routers, and virtually every system that needs fast key-value access. The average
How Hash Tables Work
A hash table maps keys to values through three components:
- Hash function: Converts a key into an integer (the hash code)
- Array: Stores key-value pairs at positions determined by the hash
- Collision resolution: Handles the inevitable case where two keys hash to the same position
'alice' and 'carol' hash to the same index — a collision.
Hash Functions
A good hash function satisfies three properties:
- Deterministic: Same input always produces same output
- Uniform distribution: Outputs are spread evenly across the range
- Efficiency: Fast to compute
Hash Function Examples
TypeScript:
// Simple string hash (djb2 algorithm)
function hash(key: string, tableSize: number): number {
let h = 5381;
for (let i = 0; i < key.length; i++) {
h = ((h << 5) + h + key.charCodeAt(i)) & 0xffffffff;
}
return Math.abs(h) % tableSize;
}
// Integer hash (multiply-shift)
function hashInt(key: number, tableSize: number): number {
// Knuth's multiplicative hash
const A = 2654435769; // 2^32 * (sqrt(5) - 1) / 2
return ((key * A) >>> 0) % tableSize;
}Python:
def hash_string(key: str, table_size: int) -> int:
"""djb2 hash function."""
h = 5381
for char in key:
h = ((h << 5) + h + ord(char)) & 0xFFFFFFFF
return h % table_size
# Python's built-in hash() is already excellent:
# hash("hello") → deterministic integer
# Use hash(key) % table_size for indexingWARNING
Never use a hash function as a security mechanism. Cryptographic hash functions (SHA-256, bcrypt) are designed for security. Hash table hash functions are designed for speed and distribution — they are not collision-resistant against an adversary.
Hash Function Quality
A poor hash function causes clustering — many keys landing in the same bucket. This degrades h(key) = len(key) % table_size — all strings of the same length collide.
Collision Resolution
Method 1: Separate Chaining
Each bucket stores a linked list (or another collection) of entries that hash to that index.
TypeScript:
class HashTableChaining<V> {
private buckets: [string, V][][];
private count: number = 0;
private capacity: number;
constructor(capacity = 16) {
this.capacity = capacity;
this.buckets = Array.from({ length: capacity }, () => []);
}
private hash(key: string): number {
let h = 5381;
for (let i = 0; i < key.length; i++) {
h = ((h << 5) + h + key.charCodeAt(i)) & 0xffffffff;
}
return Math.abs(h) % this.capacity;
}
set(key: string, value: V): void {
const idx = this.hash(key);
const bucket = this.buckets[idx];
for (const entry of bucket) {
if (entry[0] === key) {
entry[1] = value;
return;
}
}
bucket.push([key, value]);
this.count++;
if (this.count / this.capacity > 0.75) {
this.resize();
}
}
get(key: string): V | undefined {
const idx = this.hash(key);
for (const [k, v] of this.buckets[idx]) {
if (k === key) return v;
}
return undefined;
}
delete(key: string): boolean {
const idx = this.hash(key);
const bucket = this.buckets[idx];
for (let i = 0; i < bucket.length; i++) {
if (bucket[i][0] === key) {
bucket.splice(i, 1);
this.count--;
return true;
}
}
return false;
}
private resize(): void {
const oldBuckets = this.buckets;
this.capacity *= 2;
this.buckets = Array.from({ length: this.capacity }, () => []);
this.count = 0;
for (const bucket of oldBuckets) {
for (const [key, value] of bucket) {
this.set(key, value);
}
}
}
}Python:
class HashTableChaining:
def __init__(self, capacity=16):
self.capacity = capacity
self.buckets: list[list[tuple[str, any]]] = [[] for _ in range(capacity)]
self.count = 0
def _hash(self, key: str) -> int:
return hash(key) % self.capacity
def set(self, key: str, value) -> None:
idx = self._hash(key)
bucket = self.buckets[idx]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket[i] = (key, value)
return
bucket.append((key, value))
self.count += 1
if self.count / self.capacity > 0.75:
self._resize()
def get(self, key: str):
idx = self._hash(key)
for k, v in self.buckets[idx]:
if k == key:
return v
return None
def delete(self, key: str) -> bool:
idx = self._hash(key)
bucket = self.buckets[idx]
for i, (k, v) in enumerate(bucket):
if k == key:
bucket.pop(i)
self.count -= 1
return True
return False
def _resize(self) -> None:
old_buckets = self.buckets
self.capacity *= 2
self.buckets = [[] for _ in range(self.capacity)]
self.count = 0
for bucket in old_buckets:
for key, value in bucket:
self.set(key, value)Method 2: Open Addressing
All entries are stored in the array itself. When a collision occurs, probe for the next available slot.
Linear Probing: Check index
Quadratic Probing: Check
Double Hashing: Use a second hash function for the probe step:
TypeScript (Linear Probing):
class HashTableLinearProbe<V> {
private keys: (string | null)[];
private values: (V | null)[];
private capacity: number;
private count: number = 0;
private readonly DELETED = "__DELETED__";
constructor(capacity = 16) {
this.capacity = capacity;
this.keys = new Array(capacity).fill(null);
this.values = new Array(capacity).fill(null);
}
private hash(key: string): number {
let h = 5381;
for (let i = 0; i < key.length; i++) {
h = ((h << 5) + h + key.charCodeAt(i)) & 0xffffffff;
}
return Math.abs(h) % this.capacity;
}
set(key: string, value: V): void {
if (this.count / this.capacity > 0.5) this.resize();
let idx = this.hash(key);
while (this.keys[idx] !== null && this.keys[idx] !== this.DELETED) {
if (this.keys[idx] === key) {
this.values[idx] = value;
return;
}
idx = (idx + 1) % this.capacity;
}
this.keys[idx] = key;
this.values[idx] = value;
this.count++;
}
get(key: string): V | undefined {
let idx = this.hash(key);
while (this.keys[idx] !== null) {
if (this.keys[idx] === key) return this.values[idx]!;
idx = (idx + 1) % this.capacity;
}
return undefined;
}
delete(key: string): boolean {
let idx = this.hash(key);
while (this.keys[idx] !== null) {
if (this.keys[idx] === key) {
this.keys[idx] = this.DELETED as any;
this.values[idx] = null;
this.count--;
return true;
}
idx = (idx + 1) % this.capacity;
}
return false;
}
private resize(): void {
const oldKeys = this.keys;
const oldValues = this.values;
this.capacity *= 2;
this.keys = new Array(this.capacity).fill(null);
this.values = new Array(this.capacity).fill(null);
this.count = 0;
for (let i = 0; i < oldKeys.length; i++) {
if (oldKeys[i] !== null && oldKeys[i] !== this.DELETED) {
this.set(oldKeys[i]!, oldValues[i]!);
}
}
}
}Chaining vs Open Addressing
| Criterion | Chaining | Open Addressing |
|---|---|---|
| Load factor tolerance | Works well even above 1.0 | Degrades badly above 0.7 |
| Cache performance | Poor (pointer chasing) | Excellent (contiguous memory) |
| Deletion | Simple (remove from list) | Requires tombstones |
| Memory overhead | Extra pointers per node | None (but needs lower load factor) |
| Implementation | Simpler | More complex |
| Used by | Java HashMap, Go map | Python dict, Rust HashMap |
Load Factor and Resizing
The load factor is
- For chaining: expected chain length is
. Performance degrades linearly. - For open addressing: expected probes per lookup is
. At : 2 probes. At : 10 probes. At : 100 probes.
DANGER
Open addressing with a load factor above 0.7 is a performance disaster. Most implementations resize at 0.5-0.75. Python's dict resizes when the table is 2/3 full.
Resizing Strategy
When the load factor exceeds the threshold:
- Allocate a new table with 2x capacity
- Rehash all existing entries (hash values change because the modulus changes)
- This is
per resize, but happens infrequently enough that amortized insertion is still
Consistent Hashing
Standard hash tables break catastrophically when the table size changes — all entries must be rehashed. In distributed systems, this means adding or removing a server requires moving most data. Consistent hashing solves this by minimizing data movement.
See the full deep dive at Consistent Hashing.
Key insight: Map both keys and servers onto a ring (hash space). Each key is handled by the nearest server clockwise on the ring. Adding/removing a server only affects the keys in its arc.
Common Interview Patterns
Pattern 1: Frequency Counting
Count occurrences of elements. The most basic hash map pattern.
Python:
from collections import Counter
def top_k_frequent(nums: list[int], k: int) -> list[int]:
count = Counter(nums)
return [num for num, _ in count.most_common(k)]
def group_anagrams(strs: list[str]) -> list[list[str]]:
groups: dict[str, list[str]] = {}
for s in strs:
key = "".join(sorted(s))
groups.setdefault(key, []).append(s)
return list(groups.values())Pattern 2: Two Sum (Hash Map Complement Lookup)
For each element, check if its complement exists in the map.
TypeScript:
function twoSum(nums: number[], target: number): [number, number] {
const seen = new Map<number, number>(); // value → index
for (let i = 0; i < nums.length; i++) {
const complement = target - nums[i];
if (seen.has(complement)) {
return [seen.get(complement)!, i];
}
seen.set(nums[i], i);
}
return [-1, -1];
}Python:
def two_sum(nums: list[int], target: int) -> tuple[int, int]:
seen: dict[int, int] = {} # value → index
for i, num in enumerate(nums):
complement = target - num
if complement in seen:
return (seen[complement], i)
seen[num] = i
return (-1, -1)Complexity:
Pattern 3: Subarray with Given Sum (Prefix Sum + Hash Map)
Use a hash map to store prefix sums. See Arrays & Strings: Prefix Sums.
Python:
def subarray_sum(nums: list[int], k: int) -> int:
prefix_count = {0: 1}
current_sum = 0
count = 0
for num in nums:
current_sum += num
count += prefix_count.get(current_sum - k, 0)
prefix_count[current_sum] = prefix_count.get(current_sum, 0) + 1
return countPattern 4: Sliding Window + Hash Map
Track character frequencies in a window. See Arrays & Strings: Sliding Window.
Pattern 5: Hash Set for O(1) Existence Check
Python:
def longest_consecutive(nums: list[int]) -> int:
"""Find the longest consecutive sequence."""
num_set = set(nums)
max_length = 0
for num in num_set:
# Only start counting from the beginning of a sequence
if num - 1 not in num_set:
length = 1
while num + length in num_set:
length += 1
max_length = max(max_length, length)
return max_lengthComplexity:
Pattern 6: Design Hash Map / Hash Set
A common interview question — implement a hash map from scratch. The implementations in the "Collision Resolution" section above cover this fully.
Hash Tables in Production
Language Implementations
| Language | Hash Map | Hash Set | Collision Strategy |
|---|---|---|---|
| Python | dict | set | Open addressing (probing) |
| JavaScript | Map / Object | Set | Hash table (V8 uses transition trees for objects) |
| Java | HashMap | HashSet | Chaining (linked list → red-black tree at 8) |
| Go | map | N/A (use map[K]bool) | Chaining with buckets of 8 |
| C++ | unordered_map | unordered_set | Chaining |
| Rust | HashMap | HashSet | Robin Hood open addressing (SwissTable) |
Java 8+ Optimization
Java's HashMap converts chains to red-black trees when a bucket exceeds 8 entries. This bounds worst-case lookup to
Hash-Based DoS Attacks
An attacker who knows your hash function can craft inputs that all collide, turning
- Randomized hash seeds: Python randomizes
hash()per process (since 3.3) - SipHash: A cryptographically strong hash function used by Rust, Python, and Ruby for strings
- Tree-ification: Java's fallback to red-black trees at bucket size 8
Complexity Summary
| Operation | Average | Worst Case |
|---|---|---|
| Insert | ||
| Lookup | ||
| Delete | ||
| Space | ||
| Resize |
Practice Problems
| Problem | Pattern | Difficulty |
|---|---|---|
| Two Sum | Complement lookup | Easy |
| Valid Anagram | Frequency counting | Easy |
| Contains Duplicate | Hash set existence | Easy |
| Group Anagrams | Sorted key → bucket | Medium |
| Longest Consecutive Sequence | Hash set + smart start | Medium |
| Subarray Sum Equals K | Prefix sum + hash map | Medium |
| Top K Frequent Elements | Frequency + heap/bucket sort | Medium |
| LRU Cache | Hash map + doubly linked list | Medium |
| Encode and Decode TinyURL | Hash map (design) | Medium |
| Design HashMap | From scratch | Easy |
Further Reading
- Consistent Hashing — hash tables in distributed systems
- Arrays & Strings — prefix sum + hash map pattern
- Linked Lists — LRU cache combines hash map with linked list
- Database Indexing — hash indexes vs B-tree indexes
- Bloom Filters — probabilistic hash-based data structure
Try It Yourself
Problem 1: Given nums = [2, 7, 11, 15] and target = 9, find two numbers that add up to the target using a hash map. Return their indices.
Solution
Iterate through the array, storing each number's index in a hash map:
- i=0, num=2: complement = 9-2 = 7. Not in map. map=
- i=1, num=7: complement = 9-7 = 2. Found 2 at index 0. Answer: [0, 1] (nums[0] + nums[1] = 2 + 7 = 9). Time:
, Space: .
Problem 2: Group the anagrams in ["eat", "tea", "tan", "ate", "nat", "bat"].
Solution
Use a sorted string as the hash key:
- "eat" → sorted key "aet" → group: ["eat"]
- "tea" → sorted key "aet" → group: ["eat", "tea"]
- "tan" → sorted key "ant" → group: ["tan"]
- "ate" → sorted key "aet" → group: ["eat", "tea", "ate"]
- "nat" → sorted key "ant" → group: ["tan", "nat"]
- "bat" → sorted key "abt" → group: ["bat"] Answer: [["eat","tea","ate"], ["tan","nat"], ["bat"]]
Problem 3: Find the longest consecutive sequence in [100, 4, 200, 1, 3, 2].
Solution
Put all numbers in a hash set. For each number, only start counting if num-1 is NOT in the set (this is the start of a sequence):
- 100: 99 not in set → sequence: 100 → length 1
- 4: 3 is in set → skip (not a start)
- 200: 199 not in set → sequence: 200 → length 1
- 1: 0 not in set → sequence: 1, 2, 3, 4 → length 4 Answer: 4 (sequence [1, 2, 3, 4]). Time:
.
Problem 4: Design a hash table with separate chaining that handles the insertion of keys "alice", "bob", "carol" into a table of size 4, where h("alice")=2, h("bob")=1, h("carol")=2.
Solution
- Insert "alice": bucket[2] = [("alice", val)]
- Insert "bob": bucket[1] = [("bob", val)]
- Insert "carol": collision at bucket[2]! Append to chain: bucket[2] = [("alice", val), ("carol", val)] Final table: bucket[0]=[], bucket[1]=[("bob", val)], bucket[2]=[("alice", val), ("carol", val)], bucket[3]=[]
Problem 5: If a hash table has 12 entries and a capacity of 16, what is its load factor? Should it resize?
Solution
Load factor
- For separate chaining: 0.75 is at the common resize threshold. Most implementations (Java HashMap) resize at 0.75.
- For open addressing: 0.75 is too high; performance degrades significantly above 0.7. It should resize. Answer: Load factor is 0.75. Whether to resize depends on the collision strategy, but it is at or past the threshold for most implementations.
Quick Quiz
1. What is the average time complexity of hash table lookup?
- a)
- b)
- c)
- d)
Answer
c)
2. What happens to hash table performance when the load factor approaches 1.0 with open addressing?
- a) Performance improves
- b) Performance is unchanged
- c) The expected number of probes per lookup grows dramatically
- d) The hash function becomes faster
Answer
c) The expected number of probes per lookup grows dramatically — With open addressing, the expected probes is
3. Why does Java's HashMap convert chains to red-black trees when a bucket exceeds 8 entries?
- a) To save memory
- b) To prevent worst-case
lookup from hash-flooding DoS attacks - c) To make insertion faster
- d) To maintain insertion order
Answer
b) To prevent worst-case
4. What is the primary advantage of open addressing over separate chaining?
- a) Simpler deletion
- b) Better cache performance due to contiguous memory
- c) Higher load factor tolerance
- d) No need for a hash function
Answer
b) Better cache performance due to contiguous memory — Open addressing stores all entries in the array itself, benefiting from CPU cache locality. Chaining uses linked lists that scatter entries across the heap, causing cache misses.
5. In the "Subarray Sum Equals K" pattern, what is stored in the hash map?
- a) Element values and their indices
- b) Prefix sum values and their occurrence counts
- c) Subarray start and end indices
- d) Element frequencies
Answer
b) Prefix sum values and their occurrence counts — The hash map stores how many times each prefix sum has occurred. When current_sum - k exists in the map, each occurrence represents a subarray that sums to