Design and Implementation of Fusion-NewSQL: A NewSQL System Built on Distributed NoSQL Storage

Fusion‑NewSQL is a NewSQL layer built on Fusion, Didi’s distributed KV store. It translates MySQL statements into Redis‑style hashmap operations, maintains secondary indexes asynchronously, and supports fast Hive‑to‑Fusion loads and Elasticsearch integration. It delivers over 2 million QPS and 600 TB of storage with flexible schema evolution for dozens of services.

Didi Tech

Oct 8, 2019

Background : Didi’s rapid business growth caused a sharp increase in data volume and request load, exposing the limitations of traditional sharding and schema‑change processes. A NewSQL solution was needed to handle high throughput, low latency, and flexible schema evolution.

Problem Statement : Frequent schema changes across multiple business lines (e.g., Express and Premier rides) made sharded MySQL cumbersome to operate, increased DBA workload, and made secondary indexes hard to support.

Open‑Source Survey : TiDB was evaluated but rejected for four reasons: high latency caused by two‑phase commit (2PC) transaction overhead, distributed transaction support that the target workloads did not need, high storage cost, and difficulty integrating with existing offline‑to‑online pipelines.

Foundation : Fusion‑NewSQL is built on Didi’s in‑house distributed KV store Fusion, which uses a Codis‑style architecture, Redis‑compatible protocol, and RocksDB as the storage engine. Fusion already serves hundreds of services internally.

Architecture Overview : The system consists of:

DiseServer – parses MySQL protocol.

Fusion Data Cluster – stores raw data.

Fusion Index Cluster – stores secondary index data.

ConfigServer – manages schema configuration.

Consumer – asynchronously builds indexes from MySQL‑style binlog data.

MQ and Zookeeper – provide messaging and coordination.

Data Write Flow :

Client sends MySQL request to DiseServer.

DiseServer validates SQL against the schema.

SQL is transformed into a Redis HashMap and sent to the Data cluster.

Data nodes write to WAL and RocksDB.

WAL is consumed, converted to MySQL‑Binlog, and published to MQ.

Consumer reads the binlog, builds index entries, and writes them to the Index cluster.
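
The six steps above can be sketched with in‑memory stand‑ins. This is only an illustration of the asynchronous ordering, not Fusion‑NewSQL code: the schema, the table and column names, the `:` separator in keys, and the functions `handle_insert`, `ship_wal`, and `run_consumer` are all assumptions made for the example.

```python
from collections import deque

# Hypothetical schema registry; table and column names are invented.
SCHEMA = {
    "orders": {
        "columns": {"id", "passenger", "city"},
        "primary_key": "id",
        "indexes": ["passenger"],
    }
}

data_cluster = {}   # stand-in for the Data cluster: row key -> hashmap
index_cluster = {}  # stand-in for the Index cluster: index key -> value
wal = deque()       # stand-in for a data node's write-ahead log
mq = deque()        # stand-in for the message queue


def handle_insert(table, row):
    """Steps 2-4: validate against the schema, store the hashmap, log the write."""
    schema = SCHEMA[table]
    unknown = set(row) - schema["columns"]
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    pk = row[schema["primary_key"]]
    data_cluster[f"{table}:{pk}"] = dict(row)
    wal.append({"table": table, "pk": pk, "row": dict(row)})


def ship_wal():
    """Step 5: drain the WAL and publish change events to the queue."""
    while wal:
        mq.append(wal.popleft())


def run_consumer():
    """Step 6: turn change events into index entries."""
    while mq:
        event = mq.popleft()
        for column in SCHEMA[event["table"]]["indexes"]:
            if column in event["row"]:
                key = f"{event['table']}:{column}:{event['row'][column]}:{event['pk']}"
                index_cluster[key] = ""


handle_insert("orders", {"id": 1, "passenger": "alice", "city": "Beijing"})
index_visible_before = bool(index_cluster)  # the row is stored, the index is not yet built
ship_wal()
run_consumer()
index_visible_after = "orders:passenger:alice:1" in index_cluster
```

The gap between `index_visible_before` and `index_visible_after` is the window in which a row exists but cannot yet be found through its secondary index, which is the trade‑off of building indexes asynchronously.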

Data Read Flow : Queries are parsed by DiseServer, which selects the appropriate index. For primary‑key lookups, a direct GET on the Data cluster is performed. For secondary‑index queries, a SCAN over the Index cluster retrieves matching primary keys, which are then fetched from the Data cluster.
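
A minimal sketch of the two read paths, assuming index keys are laid out as `table:column:value:primary_key` and kept in sorted order so that a prefix scan finds all matches. The sample data and the names `scan_prefix` and `query` are hypothetical; a sorted Python list stands in for the Index cluster.

```python
import bisect

# Stand-in for the Data cluster: row key -> hashmap of column values.
data_rows = {
    "orders:1": {"id": "1", "passenger": "alice", "city": "Beijing"},
    "orders:2": {"id": "2", "passenger": "bob", "city": "Shanghai"},
    "orders:3": {"id": "3", "passenger": "alice", "city": "Hangzhou"},
}

# Stand-in for the Index cluster: sorted keys of the form table:column:value:pk.
index_keys = sorted([
    "orders:passenger:alice:1",
    "orders:passenger:alice:3",
    "orders:passenger:bob:2",
])


def scan_prefix(sorted_keys, prefix):
    """Yield every key that starts with prefix, using the sort order to seek."""
    start = bisect.bisect_left(sorted_keys, prefix)
    for key in sorted_keys[start:]:
        if not key.startswith(prefix):
            break
        yield key


def query(table, where, pk_column="id"):
    """Answer a single-condition equality query via the primary key or an index."""
    if pk_column in where:
        # Primary-key path: one direct lookup on the data store.
        row = data_rows.get(f"{table}:{where[pk_column]}")
        return [row] if row is not None else []
    # Secondary-index path: scan for primary keys, then fetch each row.
    column, value = next(iter(where.items()))
    prefix = f"{table}:{column}:{value}:"
    pks = [key[len(prefix):] for key in scan_prefix(index_keys, prefix)]
    rows = (data_rows.get(f"{table}:{pk}") for pk in pks)
    # Skip index entries whose row is missing from the data store.
    return [row for row in rows if row is not None]


alice_orders = [row["id"] for row in query("orders", {"passenger": "alice"})]
```

The secondary‑index path costs one scan plus one fetch per matching row, which is why primary‑key lookups stay the cheapest query shape in this design.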

Storage Structure : MySQL rows are stored as Redis hashmaps; the key is table_name + primary_key. Unique indexes store the rowkey as the value. Non‑unique indexes store the rowkey as part of the key, so that each index key stays distinct even when many rows share the same indexed value.
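
The difference between the two index layouts can be shown with a small sketch. The `:` separator and the function names are assumptions for illustration; the source only states that the row key is table_name + primary_key and where the rowkey is placed in each index type.

```python
def row_key(table, pk):
    """Row key: table name plus primary key (separator is an assumption)."""
    return f"{table}:{pk}"


def unique_index_entry(table, column, value, pk):
    """Unique index: the indexed value identifies the entry; the rowkey is the value."""
    return f"{table}:{column}:{value}", row_key(table, pk)


def non_unique_index_entry(table, column, value, pk):
    """Non-unique index: the primary key is appended so entries never collide."""
    return f"{table}:{column}:{value}:{pk}", ""


# Two rows sharing a city produce two distinct non-unique index keys.
key_a, _ = non_unique_index_entry("orders", "city", "Beijing", 1)
key_b, _ = non_unique_index_entry("orders", "city", "Beijing", 2)

# A unique index maps one value to exactly one rowkey.
phone_key, phone_target = unique_index_entry("users", "phone", "13800000000", 7)
```

With the unique layout, a second row carrying the same indexed value would map to the same index key, which is how a uniqueness violation becomes detectable.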

Schema Changes : Schema updates are submitted via a ticketing system, approved, and pushed to ConfigServer. Nodes receive the new schema and apply it lazily: missing fields get default values, extra fields are ignored during reads. Index addition is handled in two steps – immediate indexing for new data and offline back‑fill for historical data.
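
Lazy schema application on the read path might look like the following sketch. The schema representation (a mapping from column name to default value) and the function name `apply_schema` are assumptions; the point is that stored rows are never rewritten when the schema changes.

```python
# Hypothetical current schema: column name -> default value.
SCHEMA_V2 = {"id": None, "passenger": "", "status": "created"}


def apply_schema(stored_row, schema):
    """Project a stored hashmap onto the current schema at read time.

    Columns missing from the stored row take the schema default;
    stored fields that are no longer in the schema are dropped.
    """
    return {column: stored_row.get(column, default) for column, default in schema.items()}


# A row written before "status" was added and before "legacy_flag" was removed.
old_row = {"id": "1", "passenger": "alice", "legacy_flag": "y"}
current_view = apply_schema(old_row, SCHEMA_V2)
```

Because the projection happens per read, a schema change takes effect as soon as nodes receive the new configuration, without a table rewrite.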

FastLoad (Hive → Fusion‑NewSQL) : A dedicated FastLoad platform converts Hive tables into RocksDB SST files containing pre‑formatted KV pairs, bypassing the full write path and minimizing impact on online services.
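
The preparation step before writing an SST file can be sketched as below. RocksDB's SST file writer expects keys in ascending order, so the rows are encoded and sorted first. The JSON value encoding, the key separator, and the function name are assumptions for illustration; the source does not describe Fusion's on‑disk value format, and actually writing the SST file is outside this sketch.

```python
import json


def hive_rows_to_sorted_kv(table, pk_column, rows):
    """Encode rows as (key, value) byte pairs, sorted by key for SST generation.

    If two rows share a primary key, the later one wins.
    """
    pairs = {}
    for row in rows:
        key = f"{table}:{row[pk_column]}".encode("utf-8")
        pairs[key] = json.dumps(row, sort_keys=True).encode("utf-8")
    return sorted(pairs.items())


hive_rows = [
    {"id": 3, "city": "Hangzhou"},
    {"id": 1, "city": "Beijing"},
    {"id": 3, "city": "Shenzhen"},  # duplicate primary key, replaces the first id=3 row
]
kv_pairs = hive_rows_to_sorted_kv("orders", "id", hive_rows)
```

Sorting and deduplicating offline is what lets the finished files be handed to the storage engine directly instead of replaying every row through the online write path.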

Elasticsearch Integration : For complex queries that require multi‑field range scans, an ES index type is introduced. Selected fields are indexed in Elasticsearch; a query is translated into ES DSL, the matching primary keys are retrieved from ES, and the full rows are then fetched from Fusion‑NewSQL. KV indexes serve simple, low‑latency queries, while ES indexes handle heavy analytical workloads.
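
A sketch of the ES path under stated assumptions: the query body uses the standard Elasticsearch `bool`/`filter`/`range` DSL, each document's `_id` is assumed to be the row's primary key, and the response below is a hand‑written sample in the usual `hits.hits` shape rather than output from a real cluster. No ES client is called; the function names are hypothetical.

```python
def build_es_query(ranges, size=100):
    """Build an ES DSL body that filters on several field ranges at once."""
    filters = [
        {"range": {field: {"gte": low, "lte": high}}}
        for field, (low, high) in ranges.items()
    ]
    # _source is disabled because only the document IDs are needed.
    return {"query": {"bool": {"filter": filters}}, "_source": False, "size": size}


def primary_keys_from_hits(response):
    """Extract primary keys, assuming each document _id is the row's primary key."""
    return [hit["_id"] for hit in response["hits"]["hits"]]


def fetch_rows(table, pks, data_rows):
    """Fetch the full rows from the data store, skipping keys that are absent."""
    keys = (f"{table}:{pk}" for pk in pks)
    return [data_rows[key] for key in keys if key in data_rows]


body = build_es_query({"fare": (20, 50), "distance_km": (5, 15)})

# Hand-written sample response, not real cluster output.
sample_response = {"hits": {"hits": [{"_id": "1"}, {"_id": "3"}]}}
data_rows = {
    "orders:1": {"id": "1", "fare": "32"},
    "orders:3": {"id": "3", "fare": "45"},
}
rows = fetch_rows("orders", primary_keys_from_hits(sample_response), data_rows)
```

Keeping only IDs in the ES response means Fusion‑NewSQL remains the single source of row data, and ES is used purely to find which rows match.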

Summary : Fusion‑NewSQL now serves over 70 core Didi services, handling >2 million QPS and >600 TB of data. It demonstrates that a NewSQL layer built atop an existing NoSQL store can achieve high performance and flexibility with modest development effort.

Future Work : Support limited cross‑node transactions, implement real‑time indexing to replace asynchronous pipelines, and extend MySQL protocol compatibility.
