Distributed Databases and Time - Causal Order and Logical Clocks

Why Global Time Is Impossible

On a single server, every event can have a unique timestamp and the order of events is fully determined. Distributed systems give up that luxury. Multiple nodes hold their own clocks, and network latency is unpredictable, so deciding which of two events happened first is fundamentally hard. The problem is not engineering laziness; it is a property of physically separated systems.

Even with NTP synchronization, milliseconds to tens of milliseconds of skew remain. If event A occurred at 12:00:00.005 on node A and event B occurred at 12:00:00.008 on node B, the actual order cannot be determined within NTP's error bounds. This uncertainty fundamentally challenges transactional consistency in distributed databases, and the field has spent decades developing workarounds.

Lamport Clocks - Partial Order via Causality

Leslie Lamport's 1978 logical clock tracks causality (the happened-before relation) between events without depending on physical time. Each node holds a counter, increments it on local events, attaches it to outgoing messages, and updates to max(local, received) + 1 on receipt. The result is a timestamp that respects causal order even with no clock synchronization.

Lamport clocks guarantee that if A causes B, then A's timestamp is smaller than B's. The reverse, however, does not hold: a smaller timestamp does not imply causality. The system cannot distinguish concurrent events from causally related ones. Vector clocks were invented to overcome this limitation by carrying more information.

Vector Clocks - Detecting Concurrency

A vector clock in an N-node system represents time as an N-dimensional vector, one counter per node. Comparing two events' vectors lets you tell whether one happened before the other (one vector dominates the other elementwise) or whether they are concurrent (neither dominates). Amazon's Dynamo (2007 paper) used vector clocks to detect write conflicts.

The downside is that the vector grows with the cluster. Systems with thousands of nodes attach thousands of counters to every message, and bandwidth and storage overhead become substantial. Practical systems prune the vectors or switch to alternatives like the Hybrid Logical Clock to avoid this scaling problem.

Google Spanner's TrueTime - Embracing Uncertainty

Google's Spanner (2012) takes a unique approach by exposing physical clock uncertainty in its API. The TrueTime API returns the current time as an interval [earliest, latest], not a single value. By installing GPS receivers and atomic clocks in every data center, Spanner keeps the uncertainty under about 7 milliseconds typically.

Spanner's transactions wait for the TrueTime uncertainty interval to elapse before committing (commit-wait), guaranteeing external consistency (linearizability) as a result. With 7 ms uncertainty, each transaction adds 7 ms of waiting. The design trades dedicated hardware (GPS plus atomic clocks) for strong distributed transaction guarantees that other systems struggle to match.

Hybrid Logical Clock - The Practical Compromise

The Hybrid Logical Clock (HLC) combines physical and logical clocks into a single timestamp pair (physical_time, logical_counter). Physical time dominates normally, but events within the same physical millisecond are ordered by the logical counter. CockroachDB adopts HLC to deliver strong consistency without requiring specialized hardware.

HLC's appeal is that it works on ordinary NTP-synchronized servers and produces timestamps that are close to human-readable physical time. It does not match Spanner's strictness, but within NTP error bounds it provides practical consistency, resolving conflicts within the uncertainty window via retries. For most distributed databases that cannot install GPS in their data centers, HLC is the most reasonable middle ground.