Databases 32 min read

How Uber Built Schemaless: A Scalable MySQL‑Based NoSQL Datastore

Uber’s Schemaless datastore, described in the Mezzanine project, replaces a single PostgreSQL node with a highly available, horizontally scalable MySQL‑backed key‑value store, detailing its architecture, data model, triggers, indexing, fault‑tolerance, and operational considerations for massive trip data.

21CTO
21CTO
21CTO
How Uber Built Schemaless: A Scalable MySQL‑Based NoSQL Datastore

Overall Design of Schemaless

Uber needed a new datastore in early 2014 because rapid growth of its ride‑hailing service exhausted PostgreSQL storage capacity. Five critical requirements guided the design: linear horizontal scalability, high write throughput with immediate read‑after‑write, reliable downstream notifications, secondary indexing compatible with existing PostgreSQL queries, and strong operational reliability.

Evaluation of Existing Solutions

Common NoSQL options such as Cassandra, Riak, and MongoDB were compared. All offered linear scaling, but none provided loss‑less downstream change notifications or met Uber’s operational expertise criteria. Consequently Uber decided to build a custom solution.

Design Inspiration and Core Features

The design draws inspiration from Friendfeed’s schemaless storage model and Pinterest’s operational practices. Schemaless is a MySQL‑backed, sharded key‑value store that holds arbitrary JSON blobs without a fixed schema. Each write creates an immutable cell identified by a UUID row key , a column name , and a monotonically increasing ref key . Updates are performed by inserting a new cell with a larger ref key.

Data Model

A cell consists of:

row_key – a UUID serving as the primary identifier.

column_name – an application‑defined string.

ref_key – an integer version marker.

body – a compressed JSON blob.

Uber’s trip data is stored using four columns: BASE (static trip metadata), STATUS (payment state, may have multiple versions), NOTES (driver or rider comments), and FARE_ADJUSTMENT (price changes). The table below (image) illustrates two example trips with their cells.

Trip data model
Trip data model

Triggers

Schemaless supports triggers that fire when a new cell is written to a specified column. Triggers enable asynchronous processing such as billing: when a BASE cell is written, a trigger reads the trip’s STATUS column, attempts credit‑card payment, and writes the result back to STATUS . Triggers are idempotent and can be retried safely.

Trigger flow
Trigger flow

Indexing

Schemaless provides secondary indexes on fields inside the JSON blob. An example driver_partner_index indexes city_uuid and trip_created_at to enable fast lookup of all trips for a driver in a city over a time range. The index definition (YAML) is shown in the image.

Index definition
Index definition

Architecture

Schemaless consists of two node types:

Work nodes receive client HTTP requests, route them to storage nodes, and aggregate results.

Storage nodes hold the actual MySQL shards. Data is partitioned into a fixed number of shards (typically 4096) and each shard is replicated across multiple storage nodes for redundancy.

Architecture diagram
Architecture diagram

Read and Write Paths

Read requests may be served by any replica of the shard; the default is the primary to guarantee read‑after‑write consistency. Write requests are directed to the shard’s primary; the write is then asynchronously replicated to replicas.

Buffered Writes

To mitigate data loss during primary failure, Schemaless employs a buffered‑write technique: a write is first sent to a secondary cluster (the buffer) and then to the primary cluster. Only when both succeed does the client receive an acknowledgment. If the primary crashes before replication, the buffered cluster holds the data until the primary recovers.

Buffered write diagram
Buffered write diagram

MySQL Backend Schema

Each shard is a MySQL database containing an entity table with columns: added_id – auto‑increment primary key for linear disk writes. row_key, column_name, ref_key – together form a composite index for fast cell lookup. body – compressed JSON blob (MessagePack + ZLib). created_at – insertion timestamp used by triggers.

Trigger Implementation Details

Triggers are registered via annotations (e.g., @trigger in Python). When a cell is written, the framework invokes the associated function, passing the row key. The function must be idempotent because failures may cause multiple invocations. Leader election (via Zookeeper or Schemaless itself) assigns shards to worker processes; if a worker fails, the leader redistributes its shards.

Trigger scaling diagram
Trigger scaling diagram

Scalability and Fault Tolerance

The trigger framework can run up to the number of shards (e.g., 4096) and dynamically adds or removes workers based on load. Failures of a worker, the leader, or the shared coordination store do not halt processing; shards are reassigned, and in‑flight cells continue to be processed from replicas.

Conclusion

From an operations perspective, Schemaless offers a highly available, horizontally scalable datastore that serves as both a random‑access key‑value store and a change‑log source. Its MySQL foundation, immutable cell model, trigger system, and efficient secondary indexes make it well‑suited for Uber’s massive trip‑data workloads and for building fault‑tolerant, decoupled processing pipelines.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

mysqlUberTriggersSchemalessDataStore
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.