How Didi Built Fusion: From NoSQL to NewSQL and Distributed Database Design
This article traces Didi's path from Fusion, its self-built NoSQL store, to a NewSQL layer built on top of it. It covers the main challenges, architectural highlights, data-migration mechanisms, and the design of a future distributed database, with concrete metrics and practical lessons learned.
Fusion – Distributed NoSQL Storage
Fusion is a distributed NoSQL database written in C++ that speaks the Redis protocol and persists data with RocksDB. It serves more than 400 business services across more than 300 clusters, stores about 1,500 TB of data, and handles a peak of more than 14 million QPS, all with fully automated operations.
Architecture
The system is composed of four layers: an access (proxy) layer, a cluster‑management layer, a persistence layer (RocksDB), and a high‑availability layer.
Data‑flow integration (FastLoad DTS)
FastLoad is a one‑stop DTS platform that bridges Hive and Fusion. Users submit jobs via a web console or open API. The platform extracts data from Hive, shuffles and sorts it, writes the result as SST files, and uses RocksDB’s ingest feature to load the files directly into Fusion, bypassing the proxy so that the data becomes immediately readable via the Redis protocol.
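The shuffle-and-sort step can be illustrated with a minimal Python sketch. It is not FastLoad's code: `shard_of`, `plan_bulk_load`, and the CRC32-modulo placement rule are assumptions made for illustration. The sketch only shows why sorting matters: RocksDB's SST file writer expects keys in ascending order, so each shard's data is prepared as one sorted run before ingestion.

```python
import zlib
from collections import defaultdict


def shard_of(key: bytes, num_shards: int) -> int:
    # Hypothetical placement rule: CRC32 of the key modulo the shard count.
    return zlib.crc32(key) % num_shards


def plan_bulk_load(rows, num_shards):
    """Group (key, value) pairs by shard and sort each group by key.

    Each returned list is a sorted, duplicate-free run that could be
    handed to an SST writer for that shard.
    """
    buckets = defaultdict(dict)
    for key, value in rows:
        # A later row for the same key overwrites an earlier one.
        buckets[shard_of(key, num_shards)][key] = value
    return {shard: sorted(kv.items()) for shard, kv in buckets.items()}
```

Each sorted run would then be written to its own file and ingested by the node that owns the shard, which is what lets the load bypass the proxy.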
Online hot migration between clusters
Fusion supports online hot migration (full plus incremental sync) without interrupting service. The source node first starts retaining its incremental logs, then takes a full snapshot and writes it out as temporary SST files. It streams these files to the destination master, which forwards them to its slaves. After the full sync completes, incremental log replication resumes from the retained logs.
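The ordering of the two phases can be sketched in Python as follows. This is a toy model under stated assumptions, not Fusion's implementation: the `Source` and `Destination` classes, the sequence-number scheme, and the in-memory dictionaries standing in for SST files are all hypothetical.

```python
class Source:
    def __init__(self):
        self.data = {}
        self.log = []  # retained incremental log: (seq, key, value)

    def write(self, key, value):
        seq = len(self.log) + 1
        self.log.append((seq, key, value))
        if value is None:          # None stands for a delete
            self.data.pop(key, None)
        else:
            self.data[key] = value

    def snapshot(self):
        # Returns the sequence number the snapshot is consistent with.
        return len(self.log), dict(self.data)

    def log_since(self, seq):
        return [entry for entry in self.log if entry[0] > seq]


class Destination:
    def __init__(self):
        self.data = {}
        self.applied_seq = 0

    def load_snapshot(self, seq, data):
        self.data = dict(data)
        self.applied_seq = seq

    def apply(self, entries):
        for seq, key, value in entries:
            if seq <= self.applied_seq:
                continue           # already covered by the snapshot
            if value is None:
                self.data.pop(key, None)
            else:
                self.data[key] = value
            self.applied_seq = seq
```

Because the log is retained before the snapshot is taken, writes that arrive during the full sync are not lost: the destination replays every entry newer than the snapshot's sequence number.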
NewSQL Layer on Fusion
The NewSQL effort adds SQL‑compatible schema, binlog, secondary indexes and transaction support on top of Fusion, aiming for easy schema changes, unlimited storage and higher cost‑effectiveness.
Key challenges
Detecting user schemas on a KV store.
Generating MySQL‑compatible binlogs from KV operations.
Implementing secondary indexes.
Providing transaction support and a way for clients to interact with transactions.
Architecture overview
DDL operations are routed to a control console and stored in a configuration center.
Proxies parse SQL, translate it to KV, and write to Fusion.
Fusion emits MySQL‑format binlogs to a message queue.
An index service consumes the binlogs and builds secondary indexes asynchronously; a Redis hash‑tag is added to index keys so that all entries for a column reside on the same node.
Schema management
DDL is decoupled from the data path via the configuration center. The proxy supports INSERT/REPLACE/DELETE/SELECT/UPDATE, focusing on single‑table large‑scale storage as a complement to MySQL.
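To make the SQL-to-KV translation concrete, here is a minimal Python sketch of how a proxy might turn one parsed INSERT row into a Redis-style command. The article does not describe Fusion's actual row encoding, so everything here is an assumption: the `"<table>:<primary key>"` key layout, storing a row as a hash via `HSET`, and the `schema` dictionary standing in for what the configuration center would supply.

```python
def row_key(table: str, primary_key) -> str:
    # Hypothetical key layout: "<table>:<primary key>".
    return f"{table}:{primary_key}"


def insert_to_kv(table, schema, row):
    """Translate one parsed INSERT row into a Redis-style HSET command."""
    unknown = set(row) - set(schema["columns"])
    if unknown:
        raise ValueError(f"unknown columns: {sorted(unknown)}")
    pk = schema["primary_key"]
    if pk not in row:
        raise ValueError("primary key missing")
    command = ["HSET", row_key(table, row[pk])]
    # Emit fields in schema order so the output is deterministic.
    for column in schema["columns"]:
        if column in row and column != pk:
            command += [column, str(row[column])]
    return command
```

Since the schema lives outside the data path, a proxy of this kind only needs a cached copy of the table definition to validate and translate each statement.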
MySQL‑compatible binlog generation
When generating a binlog entry, Fusion records both the original and the new value for UPDATE operations (before-value plus after-value) and only the new value for INSERT, so that consumers can fully reconstruct the data from the stream.
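The before/after rule can be sketched in Python. The event dictionaries below are a simplified stand-in, not the real MySQL binlog wire format, and `make_row_event`, `put_with_binlog`, and `replay` are hypothetical names introduced for illustration.

```python
def make_row_event(table, key, old_row, new_row):
    """Build a row-change event from a KV write."""
    if old_row is None:
        # Key did not exist: an INSERT carries only the new value.
        return {"type": "INSERT", "table": table, "key": key,
                "after": dict(new_row)}
    # Key existed: an UPDATE carries both the old and the new value.
    return {"type": "UPDATE", "table": table, "key": key,
            "before": dict(old_row), "after": dict(new_row)}


def put_with_binlog(store, binlog, table, key, new_row):
    # Read the old value before overwriting so it can go into the event.
    old_row = store.get((table, key))
    binlog.append(make_row_event(table, key, old_row, new_row))
    store[(table, key)] = dict(new_row)


def replay(binlog):
    """Rebuild the table state from the event stream alone."""
    replica = {}
    for event in binlog:
        replica[(event["table"], event["key"])] = dict(event["after"])
    return replica
```

The point of the sketch is that the write path has to read the existing value before overwriting it, which a plain KV put would not otherwise need to do.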
Secondary indexes (unique & non‑unique)
Indexes are built asynchronously from the binlog stream. Because Fusion shards data by hash, a Redis hash‑tag is added to the index key to guarantee that all index entries for a given column are stored on the same node, enabling efficient scans.
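The effect of a hash tag on placement can be shown with a short Python sketch. It follows the rule documented for Redis Cluster (hash only the text between the first `{` and the next `}`, then take CRC16 modulo 16384); whether Fusion uses exactly this slot function is an assumption, and the `idx:{table.column}:value:pk` key layout is hypothetical.

```python
def crc16_xmodem(data: bytes) -> int:
    # CRC-16/XMODEM: polynomial 0x1021, initial value 0, no reflection.
    crc = 0
    for byte in data:
        crc ^= byte << 8
        for _ in range(8):
            if crc & 0x8000:
                crc = ((crc << 1) ^ 0x1021) & 0xFFFF
            else:
                crc = (crc << 1) & 0xFFFF
    return crc


def hash_tag(key: bytes) -> bytes:
    # Use the substring between the first "{" and the next "}" if it
    # is non-empty; otherwise hash the whole key.
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        if end != -1 and end != start + 1:
            return key[start + 1:end]
    return key


def slot_of(key: bytes) -> int:
    return crc16_xmodem(hash_tag(key)) % 16384


def index_key(table, column, value, pk) -> bytes:
    # Hypothetical layout; only "{table.column}" takes part in hashing.
    return f"idx:{{{table}.{column}}}:{value}:{pk}".encode()
```

With this layout, `index_key("orders", "city", "Beijing", 42)` and `index_key("orders", "city", "Shanghai", 7)` land in the same slot, so a range scan over one column's index never has to leave the node. The trade-off is that a whole column's index is confined to a single node.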
Transaction support
Distributed transactions are avoided by requiring users to place rows that must change atomically under the same hash tag, which puts them on one node and turns a distributed transaction into a single-node transaction. Single-node transactions use RocksDB's built-in transaction engine. Lua scripts provide the transaction interaction, letting users embed custom logic directly in Fusion.
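The same-hash-tag constraint can be sketched in Python as a guard in front of an all-or-nothing write batch. This is an illustration only: it does not use RocksDB's transaction API or Lua, and `tag_of`, `run_single_node_txn`, and the optional `check` callback (standing in for user-supplied logic) are hypothetical.

```python
def tag_of(key: str) -> str:
    # Text between the first "{" and the next "}", or the whole key.
    start = key.find("{")
    end = key.find("}", start + 1) if start != -1 else -1
    return key[start + 1:end] if end > start + 1 else key


def run_single_node_txn(store, writes, check=None):
    """Apply all writes or none, and only if they share one hash tag."""
    tags = {tag_of(key) for key in writes}
    if len(tags) != 1:
        raise ValueError(f"keys span several hash tags: {sorted(tags)}")
    staged = dict(store)          # work on a copy, commit at the end
    staged.update(writes)
    if check is not None and not check(staged):
        raise ValueError("transaction check failed, nothing applied")
    store.clear()
    store.update(staged)
```

For example, moving 10 units from `{user:1}:wallet` to `{user:1}:coupon` succeeds because both keys carry the tag `user:1`, while a batch touching `{user:1}:wallet` and `{user:2}:wallet` is rejected before any write happens.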
Limitations of the pseudo‑distributed NewSQL
Only single‑node transactions.
Only single‑node indexes; hash‑based sharding prevents cross‑node scans.
Indexes are asynchronous; no distributed‑transaction guarantee.
No join support.
Lacks elastic scale‑out.
Future Distributed Database Design
To overcome the above limits, a purpose‑built distributed database is planned with the following goals:
True distributed transactions.
Unlimited data and index scalability.
Real‑time indexing.
Elastic scale‑out.
Strong consistency with multiple replicas.
Full SQL compatibility.
Planned architecture
The design follows a classic distributed KV architecture: range partitioning, Raft‑based strong consistency, automatic region splitting, and global scan capability. The implementation roadmap is:
Build a robust KV engine with Raft consensus, auto‑split and global scan.
Add a SQL parser on top of the KV core.
Replace the current NewSQL layer with a fully distributed transaction and indexing subsystem.
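Range partitioning with automatic splitting, the first roadmap item, can be sketched in Python as below. This is a single-process toy under stated assumptions: the `RangeMap` class, the key-count split threshold, and the median split point are hypothetical, and Raft replication is left out entirely.

```python
import bisect


class RangeMap:
    def __init__(self, max_keys=4):
        self.starts = [""]      # region i covers [starts[i], starts[i+1])
        self.regions = [{}]
        self.max_keys = max_keys

    def _index(self, key):
        # Last region whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def put(self, key, value):
        i = self._index(key)
        self.regions[i][key] = value
        if len(self.regions[i]) > self.max_keys:
            self._split(i)

    def get(self, key):
        return self.regions[self._index(key)].get(key)

    def _split(self, i):
        keys = sorted(self.regions[i])
        mid = keys[len(keys) // 2]          # split at the median key
        items = self.regions[i].items()
        left = {k: v for k, v in items if k < mid}
        right = {k: v for k, v in items if k >= mid}
        self.regions[i] = left
        self.starts.insert(i + 1, mid)
        self.regions.insert(i + 1, right)

    def scan(self, start, end):
        """Return (key, value) pairs with start <= key < end, in order."""
        out = []
        i = self._index(start)
        while i < len(self.starts) and self.starts[i] < end:
            region = sorted(self.regions[i].items())
            out.extend((k, v) for k, v in region if start <= k < end)
            i += 1
        return out
```

Because regions are ordered by start key, a scan simply walks neighbouring regions, which is what hash-based sharding cannot offer and why range partitioning removes the single-node index limitation described earlier.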
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
