Linearizability, Serializability, and TrueTime in Google Spanner
This article explains the concept of linearizability, contrasts it with serializability, and describes how Google Spanner uses the TrueTime API and commit‑wait mechanisms to provide external consistency and reliable snapshot reads in a globally distributed database system.
As data volumes and computational demands exceed the capacity of a single machine, distributed systems become necessary, yet they must preserve correctness as if all operations executed on a single node. Linearizability guarantees that concurrent operations appear to take effect instantaneously, in an order consistent with real time, while serializability concerns the ordering of whole transactions under concurrency control.
The article illustrates linearizability with two scenarios: a history where operations can be reordered into a sequential history that respects real‑time order (linearizable), and a history that cannot be reordered without violating that order (non‑linearizable). Example histories are shown as code blocks:
start[W(1)]A
start[R(0)]B
end[W(1)]A
end[R(0)]B
start[R(1)]B
end[R(1)]B

It then discusses how a globally deployed system must assign a single, monotonically increasing timestamp to each transaction, a task complicated by clock skew across datacenters. Simple local‑clock approaches can invert the real‑world order of timestamps, leading to inconsistency.
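The inversion problem can be seen with a small sketch: two datacenters stamp transactions with their local clocks, and one clock runs slightly behind. All skew and timestamp values below are hypothetical.

```python
# Two transactions stamped with local datacenter clocks. DC2's clock
# runs 10 ms behind true time, so a transaction that commits *later*
# in real time can receive an *earlier* timestamp.

def local_timestamp(true_time, skew):
    # A datacenter's local clock reading is true time plus its skew.
    return true_time + skew

t1 = local_timestamp(1000.000, 0.0)     # T1 commits first, at DC1
t2 = local_timestamp(1000.005, -0.010)  # T2 commits 5 ms later, at DC2 (skewed)

# Real-time order is T1 before T2, but the timestamps say the opposite.
inverted = t2 < t1
```

Any observer comparing these timestamps would conclude T2 happened first, contradicting what actually occurred; this is exactly the inconsistency TrueTime is designed to rule out.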
Google Spanner solves this problem with the TrueTime API, which combines GPS and atomic clocks to provide a bounded time interval TT.now() = [earliest, latest]. Spanner assigns each transaction a timestamp from the upper bound of this interval and enforces a commit‑wait until that timestamp is guaranteed to be in the past, ensuring external consistency (linearizability).
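The commit‑wait rule can be sketched in a few lines. The `TrueTime` class below is a stand‑in for the real API, simulating the uncertainty interval around a local monotonic clock; the 4 ms default uncertainty is an assumption for illustration.

```python
import time

class TrueTime:
    """Hypothetical stand-in for the TrueTime API: now() returns a
    bounded interval (earliest, latest) around the unknown true time."""

    def __init__(self, epsilon=0.004):  # assumed ~4 ms clock uncertainty
        self.epsilon = epsilon

    def now(self):
        t = time.monotonic()
        return (t - self.epsilon, t + self.epsilon)

def commit(tt):
    # Take the commit timestamp from the upper bound of the interval...
    _, s = tt.now()                 # s = TT.now().latest
    # ...then commit-wait until s is certainly in the past, i.e. until
    # TT.now().earliest has moved beyond s. This costs roughly 2*epsilon.
    while tt.now()[0] <= s:
        time.sleep(0.001)
    return s
```

After `commit` returns, any later transaction anywhere will be assigned a strictly larger timestamp, which is what makes the timestamps externally consistent.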
TrueTime’s architecture includes multiple time masters per datacenter and time‑slave processes that synchronize every 30 seconds, using Marzullo’s algorithm to discard outliers. The clock uncertainty ε typically varies between 1 ms and 7 ms, and Spanner’s commit‑wait lasts roughly twice the average error (about 8 ms).
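Marzullo’s algorithm finds the smallest interval consistent with the largest number of sources, which lets a slave discard masters whose intervals disagree with the majority. A minimal sketch, using a sweep over interval endpoints:

```python
def marzullo(intervals):
    """Return (count, interval): the largest number of sources whose
    intervals mutually overlap, and the tightest interval they agree on.
    Sources outside that interval are treated as outliers."""
    # Encode each interval as two events: -1 at its start, +1 at its end.
    # Sorting puts starts before ends at equal offsets.
    events = sorted([(lo, -1) for lo, hi in intervals] +
                    [(hi, +1) for lo, hi in intervals])
    best, count, best_interval = 0, 0, None
    for i, (t, typ) in enumerate(events):
        count -= typ  # entering an interval raises count, leaving lowers it
        if count > best and i + 1 < len(events):
            best = count
            # The agreed region runs from this event to the next one.
            best_interval = (t, events[i + 1][0])
    return best, best_interval
```

For example, given three masters reporting `(8, 12)`, `(11, 13)`, and `(10, 12)`, all three agree on `(11, 12)`; replace the third with a faulty `(14, 15)` and only two sources agree, with the outlier discarded.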
Spanner’s storage layer consists of zones, spanservers, tablets, and Paxos groups. Each key is stored as `(key, timestamp) → value`, enabling multi‑version storage. Transactions spanning multiple Paxos groups use two‑phase commit, while single‑group transactions rely on lock tables for serializability.
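The multi‑version mapping can be sketched with a tiny in‑memory store: each write appends a `(timestamp, value)` version, and a read at timestamp `t` returns the value with the largest timestamp not exceeding `t`. This is an illustrative model, not Spanner’s actual tablet format.

```python
import bisect

class MVStore:
    """Minimal multi-version store: (key, timestamp) -> value."""

    def __init__(self):
        self.versions = {}  # key -> sorted list of (timestamp, value)

    def write(self, key, ts, value):
        # Keep each key's versions sorted by timestamp.
        bisect.insort(self.versions.setdefault(key, []), (ts, value))

    def read(self, key, ts):
        # Return the value of the latest version with timestamp <= ts.
        rows = self.versions.get(key, [])
        i = bisect.bisect_right([t for t, _ in rows], ts)
        return rows[i - 1][1] if i > 0 else None
```

Because old versions are retained, a read at any past timestamp sees a stable value regardless of later writes, which is the property snapshot reads build on.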
Snapshot reads in Spanner allow clients to read a consistent view of the database at a chosen timestamp `t`, provided `t ≤ t_safe`. If `t` exceeds `t_safe`, the system waits until the timestamp becomes safe.
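The safe‑time gate reduces to a simple wait loop. In this sketch, `read_at` and `t_safe_fn` are hypothetical callables standing in for the replica’s versioned read path and its current safe time:

```python
import time

def snapshot_read(read_at, key, t, t_safe_fn, poll=0.001):
    # A replica may only serve the read once t <= t_safe; until then,
    # block and re-check as the replica's safe time advances.
    while t > t_safe_fn():
        time.sleep(poll)
    return read_at(key, t)
```

When `t` is already at or below the safe time, the read is served immediately; otherwise the client simply observes extra latency rather than an inconsistent snapshot.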
In summary, Spanner’s TrueTime API and commit‑wait mechanism give the system a globally comparable timestamp space, making linearizability achievable across continents while supporting external consistency, snapshot reads, and high availability.
High Availability Architecture
Official account for High Availability Architecture.