
Why Unreliable Clocks Threaten Distributed Systems—and How to Fix Them

This article examines how unreliable physical clocks—both wall and monotonic—affect distributed systems, compares synchronous and asynchronous network timing, illustrates conflicts caused by timestamp drift, and presents logical clocks and Google’s TrueTime as robust solutions for achieving consistent ordering and data reliability.

Xiaokun's Architecture Exploration Notes

Comparison of Synchronous and Asynchronous Network Clocks

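A synchronous network assumes that message delay and clock error are bounded, with node clocks coordinated through a service such as NTP. In an asynchronous network no such bounds hold: each node runs its own independent clock, so message latency is unpredictable and the clocks drift apart.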

Synchronous vs Asynchronous Clock Diagram

Unreliable Monotonic and Wall Clocks

Wall clocks (e.g., Java System.currentTimeMillis()) report an absolute point in time and are periodically adjusted by NTP. Because those adjustments can move the clock forward or backward, wall clocks are unsuitable for measuring durations. Monotonic clocks (e.g., Java System.nanoTime()) only move forward, which makes them appropriate for measuring elapsed time within a single process, but their absolute values carry no meaning and cannot be compared across machines.
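The difference between the two clocks can be sketched as follows. This is a minimal illustration, and the class name ElapsedTimer and its method are hypothetical helpers, not part of any library.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration only.
final class ElapsedTimer {
    // Captured once; only differences between nanoTime() readings are meaningful.
    private final long startNanos = System.nanoTime();

    /** Milliseconds elapsed since construction, measured on the monotonic clock. */
    long elapsedMillis() {
        return TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - startNanos);
    }

    public static void main(String[] args) throws InterruptedException {
        long wallStart = System.currentTimeMillis();
        ElapsedTimer timer = new ElapsedTimer();

        Thread.sleep(50);

        // May be negative or far too large if NTP steps the clock during the sleep.
        long wallElapsed = System.currentTimeMillis() - wallStart;
        // Unaffected by wall-clock adjustments made while the process runs.
        long monoElapsed = timer.elapsedMillis();

        System.out.println("wall clock elapsed ms: " + wallElapsed);
        System.out.println("monotonic elapsed ms:  " + monoElapsed);
    }
}
```

On a healthy machine both numbers are usually close; they diverge only when the wall clock is adjusted mid-measurement, which is exactly the case a timeout or latency metric needs to survive.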

Monotonic vs Wall Clock Accuracy

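Quartz crystal drift makes a local clock run fast or slow; when NTP later corrects it, the wall clock can jump forward or backward.

A large offset between a local clock and NTP time can cause false expirations, such as a lease or cache entry that appears to have expired when it has not.

Misconfigured NTP services, or network latency between a node and its NTP server, further degrade synchronization accuracy.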
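A false expiration can be reproduced with java.time.Clock by simulating a node whose clock runs ahead. The sketch below is illustrative only: LeaseCheck, the 10-second lease, and the 15-second offset are assumptions chosen for the example.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical example: the same lease judged by two different clocks.
final class LeaseCheck {
    /** True if the lease looks expired according to the given clock. */
    static boolean isExpired(Instant expiresAt, Clock clock) {
        return !clock.instant().isBefore(expiresAt);
    }

    public static void main(String[] args) {
        // Stand-in for "true" time, frozen so the example is deterministic.
        Clock accurate = Clock.fixed(Instant.parse("2024-01-01T00:00:00Z"), ZoneOffset.UTC);
        // A node whose clock is 15 seconds ahead (assumed offset).
        Clock fastNode = Clock.offset(accurate, Duration.ofSeconds(15));

        // Lease granted now, valid for 10 seconds.
        Instant expiresAt = accurate.instant().plusSeconds(10);

        System.out.println(isExpired(expiresAt, accurate)); // false: lease is still valid
        System.out.println(isExpired(expiresAt, fastNode)); // true: falsely expired
    }
}
```

Passing a Clock into the check, rather than calling the system clock directly, also makes this kind of failure easy to exercise in tests.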

Problems Caused by Clock Dependence in Distributed Systems

Using timestamps for ordering can lead to anomalies such as write‑loss under the “last‑write‑wins” (LWW) rule when nodes have unsynchronized clocks.

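Example: two clients write conflicting values to different nodes, and the writes are stamped 42.004 s and 42.003 s by those nodes' local clocks. A replica applying LWW keeps the write with the larger timestamp and silently discards the other. If the node that stamped 42.003 s simply has a slow clock, the discarded write may be the one that actually happened later, and that data is lost.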
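The scenario above can be sketched with a small last-write-wins register. LwwRegister is a hypothetical class written for this example, and the timestamps are the article's values expressed in milliseconds.

```java
// Hypothetical last-write-wins register for illustration only.
final class LwwRegister {
    private String value;
    private long timestampMillis = Long.MIN_VALUE;

    /** Applies the write only if its timestamp is newer; returns true if applied. */
    boolean write(String newValue, long writeTimestampMillis) {
        if (writeTimestampMillis <= timestampMillis) {
            return false; // dropped without any error reaching the client
        }
        value = newValue;
        timestampMillis = writeTimestampMillis;
        return true;
    }

    String value() {
        return value;
    }

    public static void main(String[] args) {
        LwwRegister replica = new LwwRegister();

        // Client A's write, stamped 42.004 s by its node.
        replica.write("x=1", 42_004L);
        // Client B's write arrives afterwards, but was stamped 42.003 s by a lagging clock.
        boolean applied = replica.write("x=2", 42_003L);

        System.out.println(applied);         // false
        System.out.println(replica.value()); // x=1
    }
}
```

Nothing in the register can tell a genuinely stale write from a recent write stamped by a slow clock, which is why LWW loses data when clocks disagree.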

LWW Conflict Example

Logical Clocks as a Solution

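Logical clocks replace physical time with counters: each node increments its counter on every event and, when it receives a message, advances the counter past the timestamp the message carries. The resulting timestamps respect the causal order of events without relying on physical time, making them safer for conflict resolution.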
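One common form of logical clock is the Lamport clock. The article does not say which variant it has in mind, so the sketch below is an assumption, and the class and method names are illustrative.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal Lamport clock sketch; names are illustrative, not from the article.
final class LamportClock {
    private final AtomicLong counter = new AtomicLong();

    /** Local event or message send: advance the counter and return the new timestamp. */
    long tick() {
        return counter.incrementAndGet();
    }

    /** Message receipt: move past the sender's timestamp, counting the receive as an event. */
    long receive(long remoteTimestamp) {
        return counter.updateAndGet(local -> Math.max(local, remoteTimestamp) + 1);
    }

    long current() {
        return counter.get();
    }

    public static void main(String[] args) {
        LamportClock nodeA = new LamportClock();
        LamportClock nodeB = new LamportClock();

        nodeA.tick();                 // local event on A -> 1
        nodeA.tick();                 // local event on A -> 2
        long sent = nodeA.tick();     // A sends a message stamped 3

        long received = nodeB.receive(sent); // B was at 0, jumps to 4

        System.out.println(sent);     // 3
        System.out.println(received); // 4
    }
}
```

Because the receive timestamp is always greater than the send timestamp, an event that causally follows another always carries the larger number, regardless of what either machine's physical clock says.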

TrueTime and Global Snapshot Clocks

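Google Spanner uses the TrueTime API, which reports the current time as a confidence interval, an earliest and a latest possible value, instead of a single timestamp. This lets the system wait out the uncertainty and achieve globally consistent snapshot isolation.

If the intervals of two transactions do not overlap, their order is unambiguous.

If they overlap, the order cannot be determined from the timestamps alone. To rule this out, Spanner delays each commit for the length of the confidence interval. To keep that wait short, Google relies on GPS receivers and atomic clocks, which hold the uncertainty to about 7 ms.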
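The interval comparison can be sketched as below. This is not the TrueTime API itself: TimeInterval and its methods are hypothetical names, and treating the commit wait as the full interval width is a simplification of what Spanner does.

```java
// Hypothetical interval type for illustration; not Google's TrueTime API.
final class TimeInterval {
    final long earliestMillis;
    final long latestMillis;

    TimeInterval(long earliestMillis, long latestMillis) {
        if (earliestMillis > latestMillis) {
            throw new IllegalArgumentException("earliest must not exceed latest");
        }
        this.earliestMillis = earliestMillis;
        this.latestMillis = latestMillis;
    }

    /** True only if this interval ends before the other begins, so the order is unambiguous. */
    boolean definitelyBefore(TimeInterval other) {
        return latestMillis < other.earliestMillis;
    }

    /** True if neither interval is definitely before the other. */
    boolean overlaps(TimeInterval other) {
        return !definitelyBefore(other) && !other.definitelyBefore(this);
    }

    /** Simplified commit wait: the width of the uncertainty interval. */
    long commitWaitMillis() {
        return latestMillis - earliestMillis;
    }

    public static void main(String[] args) {
        TimeInterval a = new TimeInterval(100, 107);
        TimeInterval b = new TimeInterval(110, 117);
        TimeInterval c = new TimeInterval(105, 112);

        System.out.println(a.definitelyBefore(b)); // true: a clearly precedes b
        System.out.println(a.overlaps(c));         // true: order is uncertain
        System.out.println(a.commitWaitMillis());  // 7
    }
}
```

The narrower the interval, the shorter the wait, which is why the accuracy of the underlying time sources translates directly into commit latency.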

Logical Clock vs TrueTime Comparison

Summary of Distributed‑System Clock Issues

When designing distributed systems, one must assume that any node may pause, clocks may drift, and network partitions may occur; robust designs must account for these factors to ensure data reliability.

Distributed System Issues Summary