Inside Alibaba's Doris KV Store: Architecture, Routing & Failover Secrets
This article examines Alibaba's internal Doris KV storage system, detailing why large companies build proprietary data products, the project's kickoff criteria, the two‑layer architecture, virtual‑node routing, failover mechanisms, and cluster scaling strategies for massive KV workloads.
1. How Large Companies View Internal Technical Products
Internet giants like Alibaba run massive e‑commerce platforms that generate huge user traffic and data volumes. Although they could adopt open‑source solutions, they often build proprietary infrastructure such as distributed caches, message queues, service frameworks, and databases. The aims are to fit specific business scenarios, strengthen engineering capability, and create competitive barriers.
However, resources are limited and must be balanced between core business development and foundational technology projects. Engineers seeking technical growth prefer challenging foundational projects, creating a negotiation between engineers and management: engineers must demonstrate business value, innovation, and low risk to obtain support.
2. Project Kickoff – Key Technical Metrics
Current Situation
Numerous business services require massive KV storage and access.
Existing solutions (e.g., UDAS, multiple KV implementations) suffer from scaling difficulty, low write performance, and high operational cost.
Complex KV engines (e.g., Aspara) are hard to use and have poor performance.
These problems lead to development and operational difficulties, prompting the need for a unified, maintainable, and scalable KV system.
Product Positioning
The goal is a transparent, distributed KV storage engine that replaces fragmented solutions, supports Alibaba’s international and domestic sites, and delivers high availability, linear scalability, and low latency.
3. Overall Architecture
Two‑layer design: the Client layer, and the DataServer + Store layer.
Four core components: Client, DataServer, Store, Administration.
Doris focuses on the distributed routing and cluster management layers, delegating actual data persistence to Berkeley DB.
4. Main Access Model
KV Client contacts Administration to obtain cluster topology and routing algorithm.
Client hashes the key to compute target servers.
Client sends data via a custom protocol to the chosen DataServer, which writes to the local Berkeley DB.
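The three steps above can be sketched roughly as follows. This is an illustrative model only, not Doris code: the class names, the `get_topology` method, and the in-memory dictionary standing in for Berkeley DB are all assumptions, and the custom wire protocol is replaced by a direct method call.

```python
import hashlib


class Administration:
    """Hypothetical stand-in for the Administration component."""

    def __init__(self, virtual_node_count, routing_table):
        self.virtual_node_count = virtual_node_count
        self.routing_table = routing_table  # virtual node -> list of server names

    def get_topology(self):
        return self.virtual_node_count, dict(self.routing_table)


class DataServer:
    """In-memory stand-in; a real DataServer would persist to Berkeley DB."""

    def __init__(self, name):
        self.name = name
        self.store = {}

    def put(self, key, value):
        self.store[key] = value

    def get(self, key):
        return self.store.get(key)


class KVClient:
    def __init__(self, admin, servers):
        # Step 1: fetch the cluster topology and routing information.
        self.virtual_node_count, self.routing_table = admin.get_topology()
        self.servers = servers  # server name -> DataServer

    def _targets(self, key):
        # Step 2: hash the key to find the servers responsible for it.
        digest = hashlib.md5(key.encode("utf-8")).digest()
        vnode = int.from_bytes(digest, "big") % self.virtual_node_count
        return [self.servers[name] for name in self.routing_table[vnode]]

    def put(self, key, value):
        # Step 3: send the value to every chosen DataServer.
        for server in self._targets(key):
            server.put(key, value)

    def get(self, key):
        # Read from the first target only.
        return self._targets(key)[0].get(key)
```

In this sketch the client resolves routes locally after a single call to Administration, so no routing hop sits on the read or write path.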
5. Partition Routing Algorithm
Doris uses a virtual‑node based routing algorithm that improves balance and reduces data movement during scaling.
Balance: even data distribution.
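Fluctuation: when X nodes are added to a cluster of M nodes, the fraction of data that must move is X/(M+X), which the design rates as better than the X/M of traditional consistent hashing.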
5.1 Example
Two physical nodes vs. three physical nodes illustrate how virtual nodes map to servers.
The virtual‑node index is calculated as:
virtual_node_index = hash(md5(key)) mod virtual_node_count
During cluster expansion, the virtual‑node count remains constant; only the mapping between virtual nodes and physical servers is updated, minimizing data migration.
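A minimal sketch of this idea, under stated assumptions: the article does not say which function is applied on top of the MD5 digest, so the digest is simply read as an integer here, and the rebalancing rule in `expand` (take virtual nodes from the most loaded servers until the new server holds an even share) is a hypothetical choice, not the documented Doris algorithm.

```python
import hashlib


def virtual_node_index(key, virtual_node_count):
    """Map a key to a virtual node; depends only on the key and the fixed count."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % virtual_node_count


def initial_mapping(virtual_node_count, servers):
    """Spread virtual nodes over physical servers round-robin."""
    return {v: servers[v % len(servers)] for v in range(virtual_node_count)}


def expand(mapping, new_server):
    """Return a new mapping in which new_server takes over an even share.

    Only the virtual nodes handed to new_server change owner; every other
    virtual node stays where it was.
    """
    new_mapping = dict(mapping)
    owned = {}
    for vnode, server in mapping.items():
        owned.setdefault(server, []).append(vnode)
    target = len(mapping) // (len(owned) + 1)
    for _ in range(target):
        # Take one virtual node from whichever existing server holds the most.
        donor = max(owned, key=lambda s: len(owned[s]))
        new_mapping[owned[donor].pop()] = new_server
    return new_mapping
```

With 12 virtual nodes on two servers, adding a third server reassigns 4 virtual nodes, which is 1/3 of the data and matches X/(M+X) for M=2, X=1. The index computed for any given key is unaffected, because the virtual‑node count never changes.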
6. Failover Strategy
Doris targets 99.997% availability through data redundancy and failover mechanisms.
6.1 Redundant Storage
Dual‑write (W=2, R=1): every write must reach two nodes, one in each group, while a read needs only one of them.
Copy‑on‑write and update logs ensure consistency.
Data is written to both groups concurrently, providing multiple copies.
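The W=2, R=1 rule can be illustrated with a small sketch. Everything here is hypothetical (the `Replica` class, the `NodeDown` exception, the function names), and the writes are issued one after another for simplicity, whereas the article describes them as concurrent.

```python
class NodeDown(Exception):
    """Raised by a replica that is currently unreachable."""


class Replica:
    """In-memory stand-in for one DataServer in one group."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True

    def put(self, key, value):
        if not self.up:
            raise NodeDown(self.name)
        self.data[key] = value

    def get(self, key):
        if not self.up:
            raise NodeDown(self.name)
        return self.data.get(key)


W = 2  # acknowledgements required for a write
R = 1  # successful responses required for a read


def replicated_put(replicas, key, value):
    """Write to every replica; report whether at least W of them acknowledged."""
    acks = 0
    for replica in replicas:
        try:
            replica.put(key, value)
            acks += 1
        except NodeDown:
            pass
    return acks >= W


def replicated_get(replicas, key):
    """Return the value from the first reachable replica (R=1)."""
    for replica in replicas:
        try:
            return replica.get(key)
        except NodeDown:
            continue
    raise NodeDown("no replica reachable")
```

When one replica is down, `replicated_get` still succeeds from the other copy, while `replicated_put` returns `False` because only one acknowledgement arrived. What the system does with such a partial write is the subject of the failure strategies in section 6.2.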
6.2 Failure Scenarios
Transient failure – short‑term unavailability.
Temporary failure – e.g., server upgrade, network glitch, with recovery within hours.
Permanent failure – server goes offline.
Different strategies are applied based on the failure type, ranging from retry and temporary logging to adding new servers for permanent loss.
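One way to picture this is a lookup from failure type to response, plus a log of writes that a temporarily failed server missed. The pairing of strategies with failure types below follows the order of the sentence above, and the `MissedWriteLog` class is a hypothetical illustration; the article does not describe where Doris keeps such a log or how it replays it.

```python
from enum import Enum


class Failure(Enum):
    TRANSIENT = "transient"
    TEMPORARY = "temporary"
    PERMANENT = "permanent"


# Assumed pairing of failure type and response, read off the prose above.
STRATEGY = {
    Failure.TRANSIENT: "retry the request",
    Failure.TEMPORARY: "log missed writes and replay them on recovery",
    Failure.PERMANENT: "add a new server and copy the data to it",
}


class MissedWriteLog:
    """Records writes that could not reach a temporarily failed server."""

    def __init__(self):
        self.entries = []

    def record(self, key, value):
        self.entries.append((key, value))

    def replay(self, store):
        """Apply the logged writes in order, then clear the log."""
        for key, value in self.entries:
            store[key] = value
        replayed = len(self.entries)
        self.entries.clear()
        return replayed
```

Replaying in arrival order means the last write for a key wins, which keeps the recovered server consistent with the replica that stayed up.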
6.3 Health Check & Configuration Pull
ConfigServer heartbeats DataServer.
Clients report failures.
Clients periodically pull configuration.
If a server becomes unreachable, the control center marks it as temporarily failed and notifies all clients.
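A rough sketch of how these signals might combine, assuming a versioned configuration that clients poll. The `miss_limit` threshold, the version counter, and the choice to mark a server failed on a single client report are all assumptions made for the example, and push notification to clients is not modeled.

```python
class ConfigServer:
    """Hypothetical model of the health-check and configuration-pull cycle."""

    def __init__(self, servers, miss_limit=3):
        self.status = {s: "ok" for s in servers}
        self.missed = {s: 0 for s in servers}
        self.miss_limit = miss_limit
        self.version = 1

    def _mark_failed(self, server):
        # Bump the version only when the status actually changes.
        if self.status[server] == "ok":
            self.status[server] = "temp_failed"
            self.version += 1

    def heartbeat_result(self, server, alive):
        """Record one heartbeat outcome for a DataServer."""
        if alive:
            self.missed[server] = 0
            return
        self.missed[server] += 1
        if self.missed[server] >= self.miss_limit:
            self._mark_failed(server)

    def report_failure(self, server):
        """Called by a client that failed to reach a DataServer."""
        self._mark_failed(server)

    def pull(self, known_version):
        """Return (version, status) if the client is out of date, else None."""
        if known_version == self.version:
            return None
        return self.version, dict(self.status)
```

Comparing version numbers keeps the periodic pull cheap: a client that is already up to date gets nothing back.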
7. Cluster Scaling Design
Elastic scaling is essential for handling increased load. Adding a new node requires migrating only the data belonging to the newly assigned virtual nodes.
7.1 Data Migration During Expansion
Old routing: Route1(key) = {pn1, pn2}.
New routing after adding node X: Route2(key) = {pn1, pnx}.
Only the data on pn2 needs to move to pnx.
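The set of moves can be derived by comparing the old and new routing tables. In this sketch the table layout (virtual node mapped to a tuple with one server per group) and the function name `migration_plan` are assumptions made for illustration.

```python
def migration_plan(old_routes, new_routes):
    """List (virtual_node, source, destination) for every replica that moved.

    Each routing table maps a virtual node to a tuple of servers, one per
    group, so position i in the old and new tuples refers to the same group.
    """
    plan = []
    for vnode, old_nodes in old_routes.items():
        new_nodes = new_routes[vnode]
        for old, new in zip(old_nodes, new_nodes):
            if old != new:
                plan.append((vnode, old, new))
    return plan


old_routes = {0: ("pn1", "pn2"), 1: ("pn1", "pn2"), 2: ("pn1", "pn2")}
new_routes = {0: ("pn1", "pnx"), 1: ("pn1", "pn2"), 2: ("pn1", "pn2")}
plan = migration_plan(old_routes, new_routes)
```

Here `plan` holds a single move, virtual node 0 from pn2 to pnx. Virtual nodes whose route is unchanged generate no work at all.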
7.2 Expansion Procedure
Add a physical server to a group and start the Doris process.
Temporarily mark the entire group as failed.
Recompute virtual‑node distribution and copy the affected files to the new server.
Restore the group, allowing normal reads/writes to resume.
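Steps 2 to 4 of the procedure can be sketched as a guarded block that takes the group offline, copies data, and restores the group even if the copy fails. The `Group` class, the `copy_files` callback, and the precomputed list of virtual nodes to move are hypothetical; starting the Doris process on the new server (step 1) happens outside this code.

```python
from contextlib import contextmanager


class Group:
    """Hypothetical model of one replica group."""

    def __init__(self, name, assignment):
        self.name = name
        self.assignment = assignment  # virtual node -> server name
        self.available = True


@contextmanager
def temporarily_failed(group):
    """Mark the group as failed for the duration of the block."""
    group.available = False
    try:
        yield group
    finally:
        # Restore the group whether or not the copy succeeded.
        group.available = True


def expand_group(group, new_server, vnodes_to_move, copy_files):
    """Move the given virtual nodes to new_server while the group is offline."""
    with temporarily_failed(group):
        for vnode in vnodes_to_move:
            copy_files(vnode, group.assignment[vnode], new_server)
            group.assignment[vnode] = new_server
```

Because each key also lives in the other group, clients can keep reading and writing there while this group is marked as failed.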
Doris took about six months to develop. It has since served multiple Alibaba services for years with few operational incidents and has scaled to hundreds of nodes.
8. Patent Considerations
Large enterprises often encourage engineers to file patents to strengthen their IP portfolio and protect against “patent trolls.” Companies typically provide legal support and modest financial incentives for successful filings.
9. Conclusion
Designing a distributed KV storage system like Doris involves addressing data consistency, high availability, and seamless scaling—challenges that surpass those of typical distributed systems. Engaging in such technically demanding projects not only yields valuable products for the company but also accelerates personal skill growth and career prospects.
JavaEdge
First‑line development experience at multiple leading tech firms; now a software architect at a Shanghai state‑owned enterprise and founder of Programming Yanxuan. Nearly 300k followers online; expertise in distributed system design, AIGC application development, and quantitative finance investing.