
Inside Uber’s Schemaless: Designing a Scalable MySQL‑Based Datastore

Uber built Schemaless, a MySQL‑backed, sharded JSON datastore with immutable cells, triggers, and secondary indexes, to overcome PostgreSQL limits and achieve linear scalability, high write throughput, reliable change notifications, and operational resilience for its ride‑hailing platform.

21CTO

Jul 17, 2017


Uber’s Need for a New Database

In early 2014 Uber’s rapid growth exhausted PostgreSQL storage, prompting a multi‑month effort to design a next‑generation database that could scale linearly by adding servers.

Key Requirements

Linear horizontal scalability with reduced response time.

High write throughput and immediate read‑after‑write capability.

Reliable downstream change notification.

Support for secondary indexes compatible with existing PostgreSQL queries.

Operational reliability for critical ride‑hailing workloads.

After evaluating Cassandra, Riak, MongoDB, and others, Uber chose to build its own solution, inspired by the MySQL-based datastores at FriendFeed and Pinterest.

Schemaless Overview

Schemaless is a MySQL‑backed, sharded, sparse three‑dimensional persistent hash table similar to Google’s Bigtable. The immutable unit is a cell identified by a UUID row_key, a column_name, and a monotonically increasing ref_key. Cells store JSON blobs and can be versioned by writing a new cell with a larger ref_key.
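To make the cell model concrete, here is a minimal in-memory sketch of an append-only, three-dimensional hash table keyed by (row_key, column_name, ref_key). It is an illustration only, not Uber's implementation: the class name CellStore and the method names put_cell, get_cell, and get_cell_latest are hypothetical, and nothing is sharded or persisted.

```python
import uuid
from dataclasses import dataclass
from typing import Any, Dict, Optional, Tuple


@dataclass(frozen=True)
class Cell:
    """One immutable cell: a JSON-like blob addressed by three keys."""
    row_key: uuid.UUID
    column_name: str
    ref_key: int
    body: Dict[str, Any]


class CellStore:
    """Toy append-only store (hypothetical names, in-memory only)."""

    def __init__(self) -> None:
        self._cells: Dict[Tuple[uuid.UUID, str, int], Cell] = {}

    def put_cell(self, row_key: uuid.UUID, column_name: str,
                 ref_key: int, body: Dict[str, Any]) -> Cell:
        key = (row_key, column_name, ref_key)
        if key in self._cells:
            # Cells are never overwritten; a new version needs a larger ref_key.
            raise ValueError("cell already exists; cells are immutable")
        cell = Cell(row_key, column_name, ref_key, dict(body))
        self._cells[key] = cell
        return cell

    def get_cell(self, row_key: uuid.UUID, column_name: str,
                 ref_key: int) -> Optional[Cell]:
        return self._cells.get((row_key, column_name, ref_key))

    def get_cell_latest(self, row_key: uuid.UUID,
                        column_name: str) -> Optional[Cell]:
        # The newest version is the cell with the largest ref_key.
        versions = [c for (r, col, _), c in self._cells.items()
                    if r == row_key and col == column_name]
        return max(versions, key=lambda c: c.ref_key, default=None)


store = CellStore()
trip = uuid.uuid4()
store.put_cell(trip, "STATUS", 1, {"status": "requested"})
store.put_cell(trip, "STATUS", 2, {"status": "completed"})
latest = store.get_cell_latest(trip, "STATUS")
```

Writing version 2 does not touch version 1: both cells remain readable, and "latest" is simply the highest ref_key for that row and column.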

Data Model Example

For Uber trips the model uses columns such as BASE, STATUS, NOTES, and FARE_ADJUSTMENT. Each trip (identified by a UUID) has cells in these columns; multiple versions of a cell are distinguished by ref_key. The diagram below illustrates two trips and their cells.

Triggers

Schemaless provides a publish‑subscribe trigger mechanism. When a cell is written, registered trigger functions (e.g., bill_rider) are invoked asynchronously, which suits follow‑up work such as payment handling. Trigger functions are expected to be idempotent, because a function may be invoked more than once for the same cell and must be safe to retry.
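The retry-plus-idempotency contract can be sketched as follows. This is a simplified, synchronous stand-in rather than the Schemaless trigger framework: TriggerBus, on_cell_written, and the max_attempts parameter are hypothetical names, and the "charge" is just a dictionary entry. The trigger function deliberately fails once after recording the charge, to show that a retry does not bill the rider twice.

```python
from collections import defaultdict
from typing import Callable, Dict, List

TriggerFn = Callable[[str, dict], None]


class TriggerBus:
    """Toy publish-subscribe bus keyed by column name (hypothetical)."""

    def __init__(self) -> None:
        self._subscribers: Dict[str, List[TriggerFn]] = defaultdict(list)

    def trigger(self, column: str) -> Callable[[TriggerFn], TriggerFn]:
        # Decorator that registers a function for writes to one column.
        def register(fn: TriggerFn) -> TriggerFn:
            self._subscribers[column].append(fn)
            return fn
        return register

    def on_cell_written(self, column: str, row_key: str, body: dict,
                        max_attempts: int = 3) -> None:
        # Retry each subscriber on failure; safe only if it is idempotent.
        for fn in self._subscribers[column]:
            for attempt in range(1, max_attempts + 1):
                try:
                    fn(row_key, body)
                    break
                except Exception:
                    if attempt == max_attempts:
                        raise


bus = TriggerBus()
charges: Dict[str, float] = {}
calls = {"count": 0}


@bus.trigger(column="BASE")
def bill_rider(row_key: str, body: dict) -> None:
    calls["count"] += 1
    if row_key not in charges:  # idempotency guard: charge at most once
        charges[row_key] = body["fare"]
    if calls["count"] == 1:
        raise RuntimeError("simulated transient failure after charging")


bus.on_cell_written("BASE", "trip-1", {"fare": 12.5})
```

After the call, bill_rider has run twice but charges holds a single entry for trip-1, which is the property that makes retries safe.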

Indexing

Secondary indexes can be defined on fields inside the JSON blob. Index queries are fast because they target a single shard. An example driver‑partner index in YAML is shown below.

table: driver_partner_index
datastore: trips
column_defs:
  - column_key: BASE
    fields:
      - { field: driver_partner_uuid, type: UUID}
      - { field: city_uuid, type: UUID}
      - { field: trip_created_at, type: datetime}
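As a rough illustration of what such a definition implies, the sketch below derives an index entry from a cell body by copying only the fields the definition names. The function index_entry and the dictionary form of the definition are hypothetical; how Schemaless actually stores, shards, and updates index rows is not described here.

```python
from typing import Any, Dict, Optional

# The YAML definition above, written out as a Python dict for this sketch.
INDEX_DEF: Dict[str, Any] = {
    "table": "driver_partner_index",
    "datastore": "trips",
    "column_defs": [
        {
            "column_key": "BASE",
            "fields": [
                {"field": "driver_partner_uuid", "type": "UUID"},
                {"field": "city_uuid", "type": "UUID"},
                {"field": "trip_created_at", "type": "datetime"},
            ],
        },
    ],
}


def index_entry(index_def: Dict[str, Any], column_name: str,
                row_key: str, body: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the index row for a written cell, or None if not indexed."""
    for col_def in index_def["column_defs"]:
        if col_def["column_key"] != column_name:
            continue
        entry: Dict[str, Any] = {"row_key": row_key}
        for field_def in col_def["fields"]:
            name = field_def["field"]
            entry[name] = body.get(name)  # missing fields become None
        return entry
    return None


base_body = {
    "driver_partner_uuid": "8d5c2f1e-0000-4000-8000-000000000001",
    "city_uuid": "1b9d6bcd-0000-4000-8000-000000000002",
    "trip_created_at": "2017-07-17T10:00:00",
    "fare": 12.5,  # not named in the index definition, so not indexed
}
entry = index_entry(INDEX_DEF, "BASE", "trip-1", base_body)
skipped = index_entry(INDEX_DEF, "STATUS", "trip-1", {"status": "completed"})
```

Only writes to the BASE column produce an entry, and the entry keeps a pointer back to the row through row_key.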

Architecture

The system consists of stateless worker nodes that route client HTTP requests to storage nodes. Data is sharded (4096 shards by default) and each shard is replicated across multiple MySQL instances (one master, two slaves). Reads may hit any replica; writes go to the master.
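The routing a worker node performs can be sketched in a few lines. Everything specific here is an assumption made for illustration: the modulo mapping from row_key to shard, the round-robin mapping from shard to storage node, and the names StorageNode, shard_for, node_for, and pick_host. Only the shard count, the one-master-two-slaves layout, and the read/write rule come from the text.

```python
import random
import uuid
from dataclasses import dataclass
from typing import List

NUM_SHARDS = 4096  # default shard count stated above


@dataclass
class StorageNode:
    master: str
    replicas: List[str]  # the two slaves


def shard_for(row_key: uuid.UUID, num_shards: int = NUM_SHARDS) -> int:
    # Assumed scheme: the UUID's 128-bit integer value modulo the shard count.
    return row_key.int % num_shards


def node_for(shard: int, nodes: List[StorageNode]) -> StorageNode:
    # Assumed placement: shards are spread round-robin over storage nodes.
    return nodes[shard % len(nodes)]


def pick_host(node: StorageNode, is_write: bool) -> str:
    if is_write:
        return node.master  # writes always go to the master
    return random.choice([node.master] + node.replicas)  # reads: any replica


nodes = [
    StorageNode("db0-master", ["db0-slave1", "db0-slave2"]),
    StorageNode("db1-master", ["db1-slave1", "db1-slave2"]),
]
row_key = uuid.UUID(int=4097)
shard = shard_for(row_key)
write_host = pick_host(node_for(shard, nodes), is_write=True)
read_host = pick_host(node_for(shard, nodes), is_write=False)
```

Because the mapping depends only on the row_key, any stateless worker can compute the same shard and storage node for a given request.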

Buffered Writes

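To tolerate master failures, each write is sent to two clusters: first to a secondary “buffer” cluster, then to the primary cluster. The client is notified of success only when both writes succeed. If the primary master fails before the write has been replicated to its slaves, the buffered copy on the secondary still holds the data, which reduces the chance of data loss.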
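The acknowledgement rule can be sketched as below. This is a toy model under stated assumptions: Cluster and buffered_write are hypothetical names, a cluster is just a dictionary with a health flag, and the later cleanup of buffered cells is not modeled.

```python
from typing import Dict, Tuple

CellKey = Tuple[str, str, int]  # (row_key, column_name, ref_key)


class Cluster:
    """Toy cluster: an in-memory dict that can be marked unavailable."""

    def __init__(self, name: str, healthy: bool = True) -> None:
        self.name = name
        self.healthy = healthy
        self.cells: Dict[CellKey, dict] = {}

    def write(self, key: CellKey, body: dict) -> None:
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        self.cells[key] = body


def buffered_write(primary: Cluster, secondary: Cluster,
                   key: CellKey, body: dict) -> bool:
    """Write to the buffer cluster, then the primary; ack only if both succeed."""
    try:
        secondary.write(key, body)  # buffered copy first
        primary.write(key, body)    # then the cell's home cluster
    except ConnectionError:
        return False  # the client is not told the write succeeded
    return True


primary = Cluster("primary")
secondary = Cluster("secondary")
key = ("trip-1", "BASE", 1)
acked = buffered_write(primary, secondary, key, {"fare": 12.5})

failing_primary = Cluster("primary-down", healthy=False)
key2 = ("trip-2", "BASE", 1)
acked_when_down = buffered_write(failing_primary, secondary, key2, {"fare": 8.0})
```

In the second call the client receives no acknowledgement because the primary write failed, yet the secondary already holds the buffered cell, so the data itself is not lost.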

MySQL Backend

Each shard is a separate MySQL database containing an entity table with columns added_id (auto‑increment primary key), row_key, column_name, ref_key, body (the JSON blob, serialized with MessagePack and compressed), and created_at. A composite index on (row_key, column_name, ref_key) enables efficient look‑ups of a specific cell or of the latest version of a cell.
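The table layout and the look-up it supports can be tried out with the sketch below. Several substitutions are made so that it runs with only the Python standard library, and each is an assumption rather than Uber's setup: SQLite stands in for MySQL, JSON plus zlib stands in for the MessagePack encoding of body, the column types are guesses, and the composite index is declared UNIQUE to mirror cell immutability.

```python
import json
import sqlite3
import uuid
import zlib

conn = sqlite3.connect(":memory:")  # SQLite as a stand-in for one MySQL shard
conn.execute("""
    CREATE TABLE entity (
        added_id    INTEGER PRIMARY KEY AUTOINCREMENT,
        row_key     TEXT    NOT NULL,
        column_name TEXT    NOT NULL,
        ref_key     INTEGER NOT NULL,
        body        BLOB    NOT NULL,
        created_at  TEXT    NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
""")
# Composite index over the three cell keys; UNIQUE is an assumption.
conn.execute(
    "CREATE UNIQUE INDEX idx_cell ON entity (row_key, column_name, ref_key)"
)


def encode_body(body: dict) -> bytes:
    # Stand-in encoding: JSON text compressed with zlib.
    return zlib.compress(json.dumps(body).encode("utf-8"))


def decode_body(blob: bytes) -> dict:
    return json.loads(zlib.decompress(blob).decode("utf-8"))


def put_cell(row_key: uuid.UUID, column_name: str, ref_key: int, body: dict) -> None:
    conn.execute(
        "INSERT INTO entity (row_key, column_name, ref_key, body) VALUES (?, ?, ?, ?)",
        (str(row_key), column_name, ref_key, encode_body(body)),
    )


def get_cell_latest(row_key: uuid.UUID, column_name: str):
    # The composite index lets the database jump straight to the newest ref_key.
    row = conn.execute(
        "SELECT ref_key, body FROM entity "
        "WHERE row_key = ? AND column_name = ? "
        "ORDER BY ref_key DESC LIMIT 1",
        (str(row_key), column_name),
    ).fetchone()
    return None if row is None else (row[0], decode_body(row[1]))


trip = uuid.uuid4()
put_cell(trip, "BASE", 1, {"fare": 10.0})
put_cell(trip, "BASE", 2, {"fare": 12.5})
latest_cell = get_cell_latest(trip, "BASE")
```

The added_id column records insertion order within the shard, while the composite index serves point look-ups and latest-version queries.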

Summary

Schemaless now powers many Uber services, offering high availability, linear scalability, and a flexible JSON‑centric data model built on MySQL.
