How Uber Built Schemaless: A Scalable MySQL‑Based NoSQL Datastore
Uber’s Schemaless datastore, described in the Mezzanine project, replaces a single PostgreSQL node with a highly available, horizontally scalable MySQL‑backed key‑value store, detailing its architecture, data model, triggers, indexing, fault‑tolerance, and operational considerations for massive trip data.
Overall Design of Schemaless
Uber needed a new datastore in early 2014 because rapid growth of its ride‑hailing service exhausted PostgreSQL storage capacity. Five critical requirements guided the design: linear horizontal scalability, high write throughput with immediate read‑after‑write, reliable downstream notifications, secondary indexing compatible with existing PostgreSQL queries, and strong operational reliability.
Evaluation of Existing Solutions
Common NoSQL options such as Cassandra, Riak, and MongoDB were compared. All offered linear scaling, but none provided loss‑less downstream change notifications or met Uber’s operational expertise criteria. Consequently Uber decided to build a custom solution.
Design Inspiration and Core Features
The design draws inspiration from Friendfeed’s schemaless storage model and Pinterest’s operational practices. Schemaless is a MySQL‑backed, sharded key‑value store that holds arbitrary JSON blobs without a fixed schema. Each write creates an immutable cell identified by a UUID row key , a column name , and a monotonically increasing ref key . Updates are performed by inserting a new cell with a larger ref key.
Data Model
A cell consists of:
row_key – a UUID serving as the primary identifier.
column_name – an application‑defined string.
ref_key – an integer version marker.
body – a compressed JSON blob.
Uber’s trip data is stored using four columns: BASE (static trip metadata), STATUS (payment state, may have multiple versions), NOTES (driver or rider comments), and FARE_ADJUSTMENT (price changes). The table below (image) illustrates two example trips with their cells.
Triggers
Schemaless supports triggers that fire when a new cell is written to a specified column. Triggers enable asynchronous processing such as billing: when a BASE cell is written, a trigger reads the trip’s STATUS column, attempts credit‑card payment, and writes the result back to STATUS . Triggers are idempotent and can be retried safely.
Indexing
Schemaless provides secondary indexes on fields inside the JSON blob. An example driver_partner_index indexes city_uuid and trip_created_at to enable fast lookup of all trips for a driver in a city over a time range. The index definition (YAML) is shown in the image.
Architecture
Schemaless consists of two node types:
Work nodes receive client HTTP requests, route them to storage nodes, and aggregate results.
Storage nodes hold the actual MySQL shards. Data is partitioned into a fixed number of shards (typically 4096) and each shard is replicated across multiple storage nodes for redundancy.
Read and Write Paths
Read requests may be served by any replica of the shard; the default is the primary to guarantee read‑after‑write consistency. Write requests are directed to the shard’s primary; the write is then asynchronously replicated to replicas.
Buffered Writes
To mitigate data loss during primary failure, Schemaless employs a buffered‑write technique: a write is first sent to a secondary cluster (the buffer) and then to the primary cluster. Only when both succeed does the client receive an acknowledgment. If the primary crashes before replication, the buffered cluster holds the data until the primary recovers.
MySQL Backend Schema
Each shard is a MySQL database containing an entity table with columns: added_id – auto‑increment primary key for linear disk writes. row_key, column_name, ref_key – together form a composite index for fast cell lookup. body – compressed JSON blob (MessagePack + ZLib). created_at – insertion timestamp used by triggers.
Trigger Implementation Details
Triggers are registered via annotations (e.g., @trigger in Python). When a cell is written, the framework invokes the associated function, passing the row key. The function must be idempotent because failures may cause multiple invocations. Leader election (via Zookeeper or Schemaless itself) assigns shards to worker processes; if a worker fails, the leader redistributes its shards.
Scalability and Fault Tolerance
The trigger framework can run up to the number of shards (e.g., 4096) and dynamically adds or removes workers based on load. Failures of a worker, the leader, or the shared coordination store do not halt processing; shards are reassigned, and in‑flight cells continue to be processed from replicas.
Conclusion
From an operations perspective, Schemaless offers a highly available, horizontally scalable datastore that serves as both a random‑access key‑value store and a change‑log source. Its MySQL foundation, immutable cell model, trigger system, and efficient secondary indexes make it well‑suited for Uber’s massive trip‑data workloads and for building fault‑tolerant, decoupled processing pipelines.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
