Vector Clocks & Lamport Timestamps
Time is the most deceptive concept in distributed systems. On a single machine, you can call Date.now() and trust the result. In a distributed system, that same call on two different machines will give you two different numbers — and neither one is "right." This page covers the entire landscape of logical time: from Lamport's 1978 breakthrough through vector clocks, hybrid logical clocks, dotted version vectors, and interval tree clocks. Every mechanism exists because physical time fails, and understanding exactly how it fails is the foundation of everything that follows.
1. Why Logical Clocks Exist
The Problem of Physical Time
Every computer has a quartz crystal oscillator that vibrates at a nominal frequency (typically 32.768 kHz for real-time clocks or MHz/GHz for CPU clocks). These crystals are imperfect. They drift.
Clock drift is the rate at which a clock diverges from a reference. Commodity hardware drifts at roughly 10–200 parts per million (ppm). That means:
- At 100 ppm drift, a clock gains or loses 8.64 seconds per day
- After one week: ~1 minute of drift
- After one month: ~4.3 minutes of drift
Clock skew is the instantaneous difference between two clocks at a given moment. Even if you synchronize two clocks perfectly at time
NTP (Network Time Protocol) can synchronize clocks to within a few milliseconds on a LAN and tens of milliseconds over the internet. But NTP has fundamental limitations:
- Network jitter — round-trip time varies unpredictably, making precise synchronization impossible
- Step corrections — NTP may jump the clock forward or backward, creating duplicate or missing timestamps
- Slew corrections — NTP may speed up or slow down the clock, making durations unreliable
- Asymmetric paths — NTP assumes symmetric network delay, which is often false
- Leap seconds — the clock may go from
23:59:59to23:59:60or skip23:59:60entirely
Physical Clocks Are Unreliable for Ordering
If machine A timestamps event T=100 and machine B timestamps event T=101, you CANNOT conclude that
What We Actually Need
In distributed systems, we rarely need to know the exact time something happened. What we need is to answer the question:
Did event A happen before event B, or are they concurrent?
This is a question about causal ordering, not wall-clock time. Logical clocks answer this question without relying on physical time at all.
2. First Principles: The Happens-Before Relation
Leslie Lamport defined the happens-before relation (
Definition
The happens-before relation
Process order: If events
and occur on the same process, and comes before in the process's execution, then .Message causality: If event
is the sending of a message and event is the receipt of that same message by another process, then .Transitivity: If
and , then .
Concurrent Events
Two events
Concurrent events have no causal relationship. Neither one could have influenced the other. They happened in separate causal chains, possibly at the "same time" or possibly not — we cannot tell, and it does not matter.
Partial Order
The happens-before relation is a strict partial order on events:
- Irreflexive:
— no event happens before itself - Asymmetric:
- Transitive:
It is a partial order because not all events are comparable — concurrent events are incomparable. This is fundamentally different from physical time, which is a total order on all events.
In this diagram:
(process order on P1) (process order on P2) (message causality)- By transitivity:
, , - Concurrent pairs:
, is false ( ), , , , , ,
3. Lamport Timestamps
Core Mechanics
A Lamport timestamp is a single integer counter assigned to each event. Every process maintains its own counter, and the rules are:
- Internal event: Before executing an event, increment the counter:
- Send event: Before sending a message, increment the counter:
. Attach to the message. - Receive event: On receiving a message with timestamp
, set
The Clock Condition (Proof)
Theorem (Lamport Clock Condition): If
Proof: By structural induction on the derivation of
Base cases:
- Process order: If
and are on the same process and precedes , then 's counter was incremented at least once after 's timestamp was assigned. So . - Message causality: If
is a send and is the corresponding receive, then .
Inductive case (transitivity): If
The Critical Limitation
The converse is NOT true.
Counterexample: In the diagram above, event
This means Lamport timestamps cannot detect concurrent events. If two events have different Lamport timestamps, you cannot distinguish "one caused the other" from "they are concurrent." This is the fundamental limitation that vector clocks solve.
Total Ordering with Process IDs
While Lamport timestamps only provide a partial order, you can construct a total order by breaking ties with process IDs. Define:
This gives a total order consistent with causality (if
TypeScript Implementation
class LamportClock {
private counter: number = 0;
private readonly processId: string;
constructor(processId: string) {
this.processId = processId;
}
/**
* Increment and return timestamp for a local event.
*/
tick(): number {
this.counter += 1;
return this.counter;
}
/**
* Increment and return timestamp for a send event.
* Attach the returned value to the outgoing message.
*/
send(): number {
this.counter += 1;
return this.counter;
}
/**
* Update clock on receiving a message with the sender's timestamp.
* Returns the new local timestamp for the receive event.
*/
receive(senderTimestamp: number): number {
this.counter = Math.max(this.counter, senderTimestamp) + 1;
return this.counter;
}
/**
* Get current timestamp without incrementing (for inspection only).
*/
get value(): number {
return this.counter;
}
/**
* Total order comparison using process ID as tiebreaker.
* Returns negative if this < other, positive if this > other, 0 if equal.
*/
compareTo(
thisTimestamp: number,
otherTimestamp: number,
otherProcessId: string
): number {
if (thisTimestamp !== otherTimestamp) {
return thisTimestamp - otherTimestamp;
}
return this.processId.localeCompare(otherProcessId);
}
get id(): string {
return this.processId;
}
}
// --- Usage ---
const clockA = new LamportClock('process-A');
const clockB = new LamportClock('process-B');
// Process A has a local event
const tsA1 = clockA.tick(); // 1
// Process A sends a message to B
const tsA2 = clockA.send(); // 2
// Process B had a local event before receiving
const tsB1 = clockB.tick(); // 1
// Process B receives A's message
const tsB2 = clockB.receive(tsA2); // max(1, 2) + 1 = 3
// Process B has a local event
const tsB3 = clockB.tick(); // 4
console.log(`A: ${tsA1}, ${tsA2}`); // A: 1, 2
console.log(`B: ${tsB1}, ${tsB2}, ${tsB3}`); // B: 1, 3, 44. Vector Clocks
Why They Exist
Lamport timestamps lose information. When you see
Definition
For a system with
is the count of events at process itself (for ) is the latest known event count at process as known to
Rules
- Internal event at
: Increment own component: - Send event at
: Increment own component: . Attach entire vector to the message. - Receive event at
from : Merge vectors component-wise, then increment own component:
Comparing Vector Timestamps
For two vector timestamps
The last condition — incomparability — is the key capability that Lamport timestamps lack. Two vector timestamps are incomparable (concurrent) when neither dominates the other component-wise, meaning each vector has at least one component strictly greater than the corresponding component in the other.
Sequence Diagram: Detecting Concurrency
Now we can detect concurrency:
- Event
at P1: - Event
at P3: - Is
? No, because in position 0. - Is
? No, because in position 2. - Therefore
— they are concurrent. Exactly right.
Compare this with Lamport timestamps, which would give
Proof of Correctness
Theorem: For any two events
Proof (
By structural induction, identical to the Lamport proof but tracking all components:
Base case — process order: If
Base case — message causality: If
Inductive case: Transitivity of
Proof (
Equivalently, we prove the contrapositive: if
If
Case 1:
Case 2:
This bidirectional equivalence is what makes vector clocks strictly more powerful than Lamport timestamps. They perfectly characterize the happens-before relation.
Space Complexity
The fundamental cost is
This is the primary practical limitation of vector clocks. Solutions include:
- Pruning: Remove entries for processes known to have crashed permanently
- Compression: Use sparse representations (most entries may be zero)
- Plausible clocks: Approximate vector clocks with bounded size at the cost of some false positives in concurrency detection
- Alternative mechanisms: Dotted version vectors, interval tree clocks (covered below)
TypeScript Implementation
type VectorTimestamp = Map<string, number>;
class VectorClock {
private clock: VectorTimestamp;
private readonly processId: string;
constructor(processId: string) {
this.processId = processId;
this.clock = new Map<string, number>();
this.clock.set(processId, 0);
}
/**
* Increment own component for a local event.
* Returns a snapshot of the current vector timestamp.
*/
tick(): VectorTimestamp {
this.clock.set(
this.processId,
(this.clock.get(this.processId) ?? 0) + 1
);
return this.snapshot();
}
/**
* Increment own component and return the vector to attach to a message.
*/
send(): VectorTimestamp {
this.clock.set(
this.processId,
(this.clock.get(this.processId) ?? 0) + 1
);
return this.snapshot();
}
/**
* Merge the received vector with our own, then increment own component.
*/
receive(incoming: VectorTimestamp): VectorTimestamp {
// Component-wise maximum
for (const [pid, ts] of incoming) {
const local = this.clock.get(pid) ?? 0;
this.clock.set(pid, Math.max(local, ts));
}
// Increment own component for the receive event
this.clock.set(
this.processId,
(this.clock.get(this.processId) ?? 0) + 1
);
return this.snapshot();
}
/**
* Return an immutable snapshot of the current vector.
*/
snapshot(): VectorTimestamp {
return new Map(this.clock);
}
/**
* Compare two vector timestamps.
* Returns:
* 'before' if a < b (a happened before b)
* 'after' if a > b (a happened after b)
* 'concurrent' if a || b (a and b are concurrent)
* 'equal' if a == b
*/
static compare(
a: VectorTimestamp,
b: VectorTimestamp
): 'before' | 'after' | 'concurrent' | 'equal' {
const allKeys = new Set([...a.keys(), ...b.keys()]);
let aLessOrEqual = true;
let bLessOrEqual = true;
let equal = true;
for (const key of allKeys) {
const aVal = a.get(key) ?? 0;
const bVal = b.get(key) ?? 0;
if (aVal < bVal) {
aLessOrEqual = true; // a[key] <= b[key], consistent with a <= b
bLessOrEqual = false; // b[key] > a[key], so b is NOT <= a
equal = false;
} else if (aVal > bVal) {
aLessOrEqual = false;
bLessOrEqual = true;
equal = false;
}
// aVal === bVal: no change
}
if (equal) return 'equal';
if (aLessOrEqual) return 'before';
if (bLessOrEqual) return 'after';
return 'concurrent';
}
/**
* Compute the join (least upper bound) of two vector timestamps.
* Useful for merging concurrent state.
*/
static merge(
a: VectorTimestamp,
b: VectorTimestamp
): VectorTimestamp {
const result = new Map(a);
for (const [key, val] of b) {
result.set(key, Math.max(result.get(key) ?? 0, val));
}
return result;
}
get id(): string {
return this.processId;
}
}
// --- Usage demonstrating concurrency detection ---
const vc1 = new VectorClock('P1');
const vc2 = new VectorClock('P2');
const vc3 = new VectorClock('P3');
// P1 does local work
const ts_a = vc1.tick(); // {P1: 1}
// P1 sends to P2
const ts_send1 = vc1.send(); // {P1: 2}
const ts_recv1 = vc2.receive(ts_send1); // {P1: 2, P2: 1}
// P3 does independent work — concurrent with P1 and P2
const ts_g = vc3.tick(); // {P3: 1}
// Can we detect that ts_recv1 and ts_g are concurrent?
const comparison = VectorClock.compare(ts_recv1, ts_g);
console.log(comparison); // 'concurrent' ✓
// P2 sends to P3
const ts_send2 = vc2.send(); // {P1: 2, P2: 2}
const ts_recv2 = vc3.receive(ts_send2); // {P1: 2, P2: 2, P3: 2}
// Now P3 knows about P1's and P2's events
const causal = VectorClock.compare(ts_a, ts_recv2);
console.log(causal); // 'before' — a happened before recv2 ✓5. Edge Cases & Failure Modes
Process Crashes and Restarts
When a process crashes and restarts, it may lose its in-memory clock state. If it restarts with a zero clock, it will assign timestamps that are "in the past" from the perspective of other processes. Solutions:
- Persist clock state to durable storage before acknowledging events
- Assign a new process ID on restart (treating the restarted process as a new participant)
- Recover from peers — request current vector timestamps from known peers and merge
Clock Overflow
With 64-bit integers, a process generating 1 billion events per second would take ~292 years to overflow. In practice this is not a concern, but implementations should handle it:
const MAX_SAFE_COUNTER = Number.MAX_SAFE_INTEGER; // 2^53 - 1 in JS
function safeIncrement(counter: number): number {
if (counter >= MAX_SAFE_COUNTER) {
throw new Error(
`Clock overflow: counter reached ${counter}. ` +
`This process has generated more than 9 quadrillion events.`
);
}
return counter + 1;
}Late-Arriving Messages
A message sent long ago may arrive after many events have occurred. With Lamport timestamps, the receive rule (
Byzantine Processes
Logical clocks assume processes follow the rules honestly. A process that lies about its timestamp (sends a clock far in the future) can cause all receiving processes to jump their clocks forward, permanently inflating timestamps. This is analogous to a Sybil attack on time. Defenses include:
- Bounded clock advancement — reject timestamps more than
ahead of local time - Signed timestamps — use authenticated data structures so forgery is detectable
- Hybrid logical clocks — bound logical clock to physical time, preventing unbounded jumps
Network Partitions and Reconvergence
During a network partition, processes on different sides of the partition independently advance their clocks. When the partition heals:
- Lamport timestamps: The first cross-partition message causes the receiving side to jump forward. No conflict is detected.
- Vector clocks: All events during the partition on different sides are correctly identified as concurrent. This is exactly the information needed for conflict resolution.
Garbage Collection of Vector Entries
In long-running systems where processes come and go, vector clocks grow without bound. Entries for departed processes should eventually be pruned, but only after all nodes have observed all events from those processes. Premature pruning can cause incorrect causal ordering.
6. Performance Characteristics
Message Size Overhead
| Mechanism | Timestamp Size | Per-Message Overhead |
|---|---|---|
| Lamport timestamp | 1 integer (8 bytes) | 8 bytes |
| Vector clock ( | ||
| Hybrid Logical Clock | 2 integers (physical + logical) | 16 bytes |
| Dotted Version Vector | ||
| Interval Tree Clock | Variable (tree structure) | Variable, |
Computation Overhead
| Operation | Lamport | Vector Clock | HLC |
|---|---|---|---|
| Local event | |||
| Send | |||
| Receive | |||
| Compare | |||
| Storage per event |
Practical Scalability Limits
| Number of Nodes | Vector Clock Size | Feasible? |
|---|---|---|
| 3–10 | 24–80 bytes | Absolutely |
| 100 | 800 bytes | Fine for most workloads |
| 1,000 | 8 KB | Borderline — significant overhead per message |
| 10,000 | 80 KB | Impractical for per-message clocks |
| 1,000,000 | 8 MB | Completely infeasible |
This scaling wall is why systems with large or dynamic participant sets use alternatives like dotted version vectors, interval tree clocks, or hybrid logical clocks.
7. Mathematical Foundations
Lattice Theory
Vector clocks form a bounded join-semilattice under the component-wise maximum operation.
Join (least upper bound): For vectors
Properties of the join:
These are exactly the properties required for CRDTs (Conflict-free Replicated Data Types) — which is why vector clocks are the theoretical foundation for CRDT merge operations.
Bottom element: The zero vector
Characterization Theorem
Let
This means vector clocks are the canonical representation of the happens-before relation. No smaller data structure can capture the same information exactly.
Formal statement (Charron-Bost, 1991): For
Information-Theoretic Argument
Each process can independently generate events. To distinguish "event
Thus, the
Partial Orders and Antichains
The set of concurrent events forms an antichain in the partial order induced by
The width of the happens-before partial order is exactly
8. Hybrid Logical Clocks (HLC)
Motivation
Vector clocks have
Hybrid Logical Clocks (HLC), introduced by Kulkarni et al. in 2014, combine physical time with a logical counter to achieve:
size (just two integers)- Timestamps close to physical time (useful for debugging and TTL operations)
- Causal consistency: if
then - No backward jump: HLC is always
physical time
The trade-off: HLC provides the same guarantees as Lamport timestamps (the forward direction), NOT vector clock guarantees. You still cannot detect concurrency from HLC timestamps alone. But HLC is strictly better than Lamport timestamps because the timestamps are meaningful (close to wall-clock time).
Structure
An HLC timestamp consists of two components:
Where:
(logical part): the maximum physical time seen so far (counter): a tiebreaker for events at the same value
Comparison:
Rules
Let
Local event or send at process
:l' = l_j l_j = max(l', pt) if l_j == l': c_j = c_j + 1 else: c_j = 0Attach
to the message if sending.Receive at process
from message with :l' = l_j l_j = max(l', l_m, pt) if l_j == l' == l_m: c_j = max(c_j, c_m) + 1 elif l_j == l': c_j = c_j + 1 elif l_j == l_m: c_j = c_m + 1 else: c_j = 0 // physical time advanced past both
Key Properties
- Monotonicity: HLC timestamps at a process are strictly increasing
- Bounded divergence:
is bounded by the maximum clock skew across all processes. Formally: - Causal consistency:
- Close to physical time: The
component is always within of real physical time
Sequence Diagram: HLC Behavior
Notice how the
TypeScript Implementation
interface HLCTimestamp {
/** Logical component: max physical time seen */
l: number;
/** Counter: tiebreaker within same l value */
c: number;
/** Node identifier for total ordering */
node: string;
}
class HybridLogicalClock {
private l: number;
private c: number;
private readonly node: string;
private readonly getPhysicalTime: () => number;
/**
* @param node - Unique node identifier
* @param getPhysicalTime - Function returning current physical time in ms.
* Defaults to Date.now(). Injected for testability.
*/
constructor(
node: string,
getPhysicalTime: () => number = () => Date.now()
) {
this.node = node;
this.getPhysicalTime = getPhysicalTime;
this.l = getPhysicalTime();
this.c = 0;
}
/**
* Generate a timestamp for a local event or outgoing message.
*/
now(): HLCTimestamp {
const pt = this.getPhysicalTime();
const lOld = this.l;
this.l = Math.max(lOld, pt);
if (this.l === lOld) {
// Physical time has not advanced; increment counter
this.c += 1;
} else {
// Physical time advanced; reset counter
this.c = 0;
}
return { l: this.l, c: this.c, node: this.node };
}
/**
* Update clock upon receiving a message with timestamp (l_m, c_m).
* Returns the timestamp for the receive event.
*/
receive(remote: HLCTimestamp): HLCTimestamp {
const pt = this.getPhysicalTime();
const lOld = this.l;
this.l = Math.max(lOld, remote.l, pt);
if (this.l === lOld && this.l === remote.l) {
// All three are equal; take max counter + 1
this.c = Math.max(this.c, remote.c) + 1;
} else if (this.l === lOld) {
// Our old l was the max; increment our counter
this.c += 1;
} else if (this.l === remote.l) {
// Remote l was the max; continue from remote counter
this.c = remote.c + 1;
} else {
// Physical time was the max; reset counter
this.c = 0;
}
return { l: this.l, c: this.c, node: this.node };
}
/**
* Compare two HLC timestamps.
* Returns negative if a < b, positive if a > b, 0 if equal.
*/
static compare(a: HLCTimestamp, b: HLCTimestamp): number {
if (a.l !== b.l) return a.l - b.l;
if (a.c !== b.c) return a.c - b.c;
return a.node.localeCompare(b.node);
}
/**
* Check if a timestamp is within acceptable bounds of physical time.
* Used to detect Byzantine or severely skewed clocks.
*/
static isWithinBound(
ts: HLCTimestamp,
physicalTime: number,
maxDrift: number
): boolean {
return Math.abs(ts.l - physicalTime) <= maxDrift;
}
}
// --- Usage ---
let physicalTime = 1000;
const hlc1 = new HybridLogicalClock('node-1', () => physicalTime);
const hlc2 = new HybridLogicalClock('node-2', () => physicalTime);
// Two events at the same physical time on node-1
const ts1 = hlc1.now(); // {l: 1000, c: 1, node: 'node-1'}
const ts2 = hlc1.now(); // {l: 1000, c: 2, node: 'node-1'}
// node-1 sends to node-2
const ts3 = hlc1.now(); // {l: 1000, c: 3, node: 'node-1'} — attached to msg
const ts4 = hlc2.receive(ts3); // {l: 1000, c: 4, node: 'node-2'}
// Physical time advances
physicalTime = 1001;
const ts5 = hlc2.now(); // {l: 1001, c: 0, node: 'node-2'} — counter resets
// Ordering is correct
console.log(HybridLogicalClock.compare(ts1, ts2)); // negative (ts1 < ts2)
console.log(HybridLogicalClock.compare(ts3, ts4)); // negative (ts3 < ts4)
console.log(HybridLogicalClock.compare(ts4, ts5)); // negative (ts4 < ts5)CockroachDB's HLC Implementation
CockroachDB uses HLC as its primary timestamp mechanism. Their implementation adds several production hardening features:
Maximum clock offset enforcement: CockroachDB nodes refuse to start if their clock is more than 500ms (configurable) from other nodes. If clock skew exceeds this bound during operation, the node self-terminates to prevent causal ordering violations.
Uncertainty intervals: When a transaction reads a key, it checks for values written within the uncertainty window
. If such a value exists, the transaction restarts at the higher timestamp. This eliminates stale reads without requiring perfectly synchronized clocks.Closed timestamps: CockroachDB tracks a "closed timestamp" per range — a timestamp below which no new writes will occur. This enables consistent reads at historical timestamps without blocking writes.
War Story
CockroachDB discovered in production that AWS instances occasionally experienced clock jumps of several hundred milliseconds when NTP stepped the clock. This caused transactions to see unexpectedly large uncertainty intervals, leading to excessive transaction restarts. Their fix: monitoring for clock jumps and temporarily increasing the uncertainty interval, then gradually reducing it as the clock stabilizes. The lesson: even with HLC, physical clock behavior matters enormously.
9. Dotted Version Vectors
The Problem with Vector Clocks in Key-Value Stores
When vector clocks are used in key-value stores (like DynamoDB or Riak), a subtle problem arises. Consider a key x replicated across nodes A, B, and C. If a client writes to node A, then another client writes to node B concurrently, the vector clock correctly identifies the conflict. But what happens during conflict resolution?
The resolved value gets a new vector clock that reflects BOTH causal histories. Now if a third client reads the resolved value from node A and writes it back, the vector clock grows — it now contains entries for A, B, and the resolving process. Over many rounds of concurrent writes and resolutions, the clock grows without bound even though the number of replicas is fixed.
The root cause: standard vector clocks track client identities, but in a replicated key-value store, what matters is replica identities.
How Dotted Version Vectors Work
A dotted version vector (DVV) separates the "latest event" (the dot) from the "causal context" (the version vector):
Where:
- dot =
: the specific replica and sequence number of the event that created this value - version vector: the standard vector clock tracking known events at each replica
The dot provides a precise pointer to the specific event, while the version vector summarizes the causal past. This separation allows accurate conflict detection while keeping the clock size bounded by the number of replicas (not the number of clients or write operations).
Riak's Implementation
Riak adopted dotted version vectors to solve their vector clock explosion problem. In Riak's approach:
- Each vnode (virtual node) maintains a monotonically increasing counter
- When a client writes to a vnode, the vnode assigns a new dot:
(vnode_id, counter++) - The client's read context (version vector from the most recent read) is attached to the write
- The vnode compares the dot and version vector to determine if this write supersedes, is concurrent with, or conflicts with the current stored values
The key insight: the vector only needs entries for vnodes, not for every client that has ever written. Since the number of vnodes is fixed and small (e.g., 64), the vector size is bounded.
War Story
Before adopting dotted version vectors, Riak users frequently encountered "sibling explosion" — a single key accumulating thousands of concurrent versions because the vector clock kept growing and comparisons became less precise. One production system had a key with over 100,000 siblings, consuming several GB of memory on the node. Dotted version vectors eliminated this class of problem entirely by ensuring that resolved values properly dominate their predecessors in the causal order.
10. Interval Tree Clocks
The Dynamic Participant Problem
Both vector clocks and dotted version vectors require knowing the set of participants in advance (or at least having a fixed-size identifier space). What if processes dynamically join and leave? Adding a new process to a vector clock system requires adding a new dimension to every timestamp everywhere.
Interval Tree Clocks (ITC), introduced by Almeida, Baquero, and Gonçalves in 2008, solve this by:
- Representing process identity as intervals of the real number line
- Forking: a process splits its interval to create a new participant (e.g.,
and ) - Joining: when a process departs, its interval is reclaimed by merging with a neighbor
- No global knowledge of participants is required
Structure
An ITC stamp is a pair
is the ID — a binary tree encoding an interval subset of is the event component — a tree of integers recording the event count
Operations
- Fork: Split
where and partition - Event: Increment the event tree at positions owned by the ID
- Join: Merge
and - Compare: Similar to vector clocks — one event tree dominates another if it is
at all positions
The space complexity is
When to Use Interval Tree Clocks
| Scenario | Best Mechanism |
|---|---|
| Fixed, small set of replicas (3-10) | Vector clocks |
| Key-value store with fixed vnodes | Dotted version vectors |
| Dynamic peer-to-peer network | Interval tree clocks |
| Large-scale system needing only causal consistency | HLC |
| Need total ordering with minimal overhead | Lamport timestamps |
11. Real-World Systems
Amazon DynamoDB
DynamoDB (as described in the 2007 Dynamo paper) uses vector clocks for conflict detection:
- Each write is tagged with a vector clock based on the coordinating replica
- On read, if multiple versions exist with concurrent vector clocks, all versions (called "siblings") are returned to the client
- The client application performs reconciliation (e.g., merging shopping carts by taking the union)
- The reconciled value is written back with a new vector clock that dominates all siblings
Important caveat: The 2007 Dynamo paper described this approach, but modern DynamoDB (the managed AWS service) uses a simpler last-writer-wins (LWW) conflict resolution by default. The vector clock approach was deemed too complex for most application developers. This is a significant pragmatic lesson: theoretical optimality does not always win in production.
War Story
The original Dynamo team at Amazon discovered that application developers consistently failed to implement correct reconciliation functions. Shopping cart union (take the superset of items from both versions) was one of the few cases where the merge function was both obvious and correct. For most other data types, developers either returned one version arbitrarily (reinventing LWW poorly) or wrote merge functions with subtle bugs. This experience directly influenced the design of DynamoDB (the service) to default to LWW, and influenced the development of CRDTs, which provide automatic, mathematically correct merging.
CockroachDB (HLC + MVCC)
CockroachDB combines HLC with multi-version concurrency control:
- Every key-value pair is stored with its HLC timestamp as a version
- Transactions read at a consistent HLC timestamp
- The uncertainty interval
handles clock skew - Serializable isolation is achieved through MVCC + write intents + the closed timestamp mechanism
The advantage of HLC over pure Lamport timestamps here: timestamps are meaningful wall-clock values, enabling features like:
- Time-travel queries:
SELECT * FROM t AS OF SYSTEM TIME '2026-03-15' - TTL: expire data based on its HLC timestamp
- Debugging: timestamps in logs correspond to real time
Google Spanner (TrueTime)
While not a logical clock, Spanner's TrueTime API deserves mention as the alternative to logical clocks. Google built custom hardware (atomic clocks and GPS receivers in every data center) to provide a time API with bounded uncertainty:
TrueTime.now() → [earliest, latest]The interval is guaranteed to contain the true time. Typically
This is fundamentally different from logical clocks — it solves the same ordering problem but through hardware-guaranteed physical time bounds rather than logical reasoning. It works spectacularly well but requires custom hardware that only Google (and now a few cloud providers offering "similar" services) can realistically deploy.
Cassandra (Lamport-like LWW)
Cassandra uses a simple last-write-wins strategy based on client-provided timestamps (by default, the coordinator's wall clock time). This is neither Lamport timestamps nor vector clocks — it is plain physical timestamps with no logical component.
The consequences:
- Clock skew causes data loss: If node A's clock is ahead by 5 seconds, writes to node A will "win" over concurrent writes to node B for the next 5 seconds
- No concurrency detection: All concurrent writes are silently resolved by timestamp, with no way to detect or surface the conflict
- Simplicity: The system is much simpler to implement and reason about than vector clocks
Cassandra made this trade-off deliberately. For its intended use cases (time-series data, event logs, caching), LWW is often acceptable, and the simplicity benefits outweigh the theoretical loss of concurrency detection.
12. Decision Framework
Choosing a Clock Mechanism
Comparison Matrix
| Property | Lamport | Vector Clock | HLC | DVV | ITC |
|---|---|---|---|---|---|
| Detects causality ( | Forward only | Bidirectional | Forward only | Bidirectional | Bidirectional |
| Detects concurrency ( | No | Yes | No | Yes | Yes |
| Timestamp size | |||||
| Close to wall time | No | No | Yes | No | No |
| Dynamic participants | N/A | Hard | N/A | Hard | Native |
| Implementation complexity | Trivial | Moderate | Moderate | High | High |
| Production systems | Many | Riak, (old) Dynamo | CockroachDB | Riak 2.0+ | Research |
When to Use What
Use Lamport timestamps when:
- You only need a total ordering consistent with causality
- Message size is critical (IoT, embedded systems)
- You are implementing mutual exclusion or total-order broadcast
Use vector clocks when:
- You need to detect and resolve concurrent updates
- The number of participants is small and fixed
- Correctness of conflict detection is paramount
Use HLC when:
- You need causal ordering with
size - Wall-clock-meaningful timestamps are valuable (debugging, TTL, time-travel queries)
- You can tolerate not detecting concurrency from timestamps alone
Use dotted version vectors when:
- You are building a replicated key-value store
- Clients come and go but replicas are fixed
- You need vector-clock-like concurrency detection without clock growth
Use interval tree clocks when:
- Participants dynamically join and leave
- No central coordinator exists to assign IDs
- You need a purely decentralized solution
13. Advanced Topics
Consistent Snapshots
A consistent snapshot (or consistent cut) of a distributed system is a set of local states — one per process — such that no message is recorded as received without its corresponding send also being recorded. Formally, a cut
Vector clocks enable efficient consistent snapshot detection. A set of events
This condition states that no process has "seen more" of another process than that process has recorded in the snapshot. This is the basis of the Chandy-Lamport snapshot algorithm.
Plausible Clocks
When vector clocks are too large but concurrency detection is needed, plausible clocks offer a compromise. The idea: use a fixed-size vector (say, 8 entries) and hash process IDs into vector positions. Multiple processes may map to the same position, causing "false concurrency" (the clock reports two events as concurrent when one actually happened before the other) but never "false causality."
Where
With
Bloom Clocks
A more recent approach uses Bloom filters to compress vector clock information. A Bloom clock maintains a Bloom filter of
fixed space where is the Bloom filter size (independent of )- False positives for causality (may incorrectly claim
when ) - No false negatives (if
, the clock always reports it correctly)
The error direction is reversed compared to plausible clocks: Bloom clocks may report false causality, while plausible clocks report false concurrency. The choice depends on which error is more acceptable for the application.
Version Vectors vs. Vector Clocks
These terms are often used interchangeably, but there is a subtle distinction:
- Vector clocks assign a timestamp to every event (including internal events)
- Version vectors assign a timestamp to every version of a data item (only update events)
Version vectors increment only on writes to the replicated object, not on every internal computation step. This means version vectors can have the same counter value for long periods between updates, but they still correctly characterize the happens-before relation among update events.
In practice, the algorithms are identical — the difference is what constitutes an "event" that triggers a clock increment.
Causal Broadcast
Logical clocks enable causal broadcast — a communication primitive guaranteeing that if message
Implementation with vector clocks:
class CausalBroadcast {
private vc: VectorClock;
private buffer: Array<{
msg: unknown;
senderVc: VectorTimestamp;
senderId: string;
}> = [];
private readonly deliver: (msg: unknown) => void;
constructor(
processId: string,
deliverCallback: (msg: unknown) => void
) {
this.vc = new VectorClock(processId);
this.deliver = deliverCallback;
}
/**
* Broadcast a message to all processes.
* In practice, this uses the network layer to send to all peers.
*/
broadcast(msg: unknown): VectorTimestamp {
return this.vc.send();
// Network layer sends (msg, vc.snapshot()) to all peers
}
/**
* Called when a message arrives from the network.
* Buffers the message if causal dependencies are not yet satisfied.
*/
onReceive(
msg: unknown,
senderVc: VectorTimestamp,
senderId: string
): void {
this.buffer.push({ msg, senderVc, senderId });
this.tryDeliver();
}
/**
* Attempt to deliver buffered messages whose causal
* dependencies have been satisfied.
*/
private tryDeliver(): void {
let delivered = true;
while (delivered) {
delivered = false;
for (let idx = 0; idx < this.buffer.length; idx++) {
const entry = this.buffer[idx];
if (this.canDeliver(entry.senderVc, entry.senderId)) {
// Remove from buffer
this.buffer.splice(idx, 1);
// Update our vector clock
this.vc.receive(entry.senderVc);
// Deliver to application
this.deliver(entry.msg);
delivered = true;
break; // Restart scan — new deliveries may be unblocked
}
}
}
}
/**
* A message from sender S with vector V can be delivered if:
* 1. V[S] == our_vc[S] + 1 (this is the next expected message from S)
* 2. For all other processes P: V[P] <= our_vc[P]
* (we have already received everything the sender had received before
* sending this message)
*/
private canDeliver(
senderVc: VectorTimestamp,
senderId: string
): boolean {
const ourSnapshot = this.vc.snapshot();
for (const [pid, senderVal] of senderVc) {
const ourVal = ourSnapshot.get(pid) ?? 0;
if (pid === senderId) {
// We expect exactly the next message from the sender
if (senderVal !== ourVal + 1) return false;
} else {
// We must have seen everything the sender saw
if (senderVal > ourVal) return false;
}
}
return true;
}
}This guarantees causal delivery ordering without requiring a central sequencer. Messages that arrive "too early" (before their causal dependencies) are buffered until the dependencies are satisfied.
Matrix Clocks
A matrix clock extends vector clocks by tracking not just "what I know" but "what I know that everyone else knows." Process
- Row
= 's own vector clock (what knows) - Row
= 's estimate of 's vector clock (what thinks knows)
The key use case: garbage collection. If every entry in column
Matrix clocks have
Relationship to CRDTs
Conflict-free Replicated Data Types (CRDTs) are built on the mathematical foundations of vector clocks. The join semilattice structure of vector clocks (
A state-based CRDT needs:
- A partial order on states (vector clocks provide this via
) - A join operation that is commutative, associative, and idempotent (component-wise
satisfies all three) - The state space forms a join-semilattice (vector timestamps do)
This is not a coincidence. CRDTs were designed by the same research community that developed vector clocks, and the lattice structure was a deliberate choice to enable automatic conflict resolution with the same causal guarantees that vector clocks provide.
The CAP Connection
Logical clocks are central to the CAP theorem trade-offs:
- CP systems use logical clocks (often Lamport timestamps) within consensus protocols (Raft, Paxos) to totally order operations. The Raft log index is essentially a Lamport timestamp for the replicated log.
- AP systems use vector clocks to detect conflicts that arise from concurrent writes during partitions. The vector clock comparison tells you exactly which writes conflict and which are causally ordered.
- HLC-based systems (like CockroachDB) use the combination of physical and logical time to navigate between CP and AP depending on whether a partition is occurring.
Understanding logical clocks is therefore not just an academic exercise — it is the mechanism by which CAP trade-offs are implemented in practice.
Further Reading
- Lamport's original paper: "Time, Clocks, and the Ordering of Events in a Distributed System" (1978) — still one of the most readable papers in computer science
- Fidge/Mattern: Independent co-inventors of vector clocks (1988) — Fidge's paper is shorter, Mattern's is more thorough
- Kulkarni et al.: "Logical Physical Clocks and Consistent Snapshots in Globally Distributed Databases" (2014) — the HLC paper
- Almeida et al.: "Interval Tree Clocks" (2008) — elegant solution to dynamic participants
- Preguiça et al.: "Dotted Version Vectors" (2012) — solving the sibling explosion problem
- Corbett et al.: "Spanner: Google's Globally Distributed Database" (2012) — the TrueTime approach
- Amazon's Dynamo paper: "Dynamo: Amazon's Highly Available Key-value Store" (2007) — vector clocks in production
- Previous: CAP Theorem — the fundamental trade-off that logical clocks help navigate
- Next: Consistency Models — the spectrum of consistency guarantees that logical clocks enforce