Architecture Evolution and Scaling Solutions of Kuaidi Dache (Fast Taxi) Service
This article describes the rapid traffic growth that Kuaidi Dache faced in 2013 and 2014, the representative architectural bottlenecks it exposed, and the engineering solutions that kept the ride-hailing service stable, scalable, and fast: LBS optimization, a long-connection redesign, distributed refactoring, a wireless open platform, real-time monitoring, and a data layer transformation.
From the end of 2013 to the second half of 2014, traffic to the Kuaidi Dache system grew rapidly. The team had to solve complex problems without disrupting the online service. The sections below cover representative issues encountered during the architecture evolution and how each was solved.
LBS Bottleneck and Solutions
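The basic system model (Figure 1) works as follows: drivers report GPS coordinates, which are stored in MongoDB; a passenger's order is matched against nearby drivers; the order is pushed to those drivers over long connections; and a driver accepts it. High write and read loads (more than 40,000 writes and 10,000 reads per second) caused CPU spikes, higher query latency, falling throughput, and replication lag. The causes were the database-level lock in MongoDB 2.6.4, under which heavy writes block reads on the same database, and a very large number of sub-queries. The solution was to split the country into four regions, each served by an independent MongoDB cluster.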
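The regional split can be pictured as a small routing step in front of the LBS store. The following is a minimal sketch only: the article does not say where the boundaries lie, so the quadrant split point, the class name RegionRouter, and the cluster names are all assumptions made for illustration.

```java
/** Hypothetical router that maps a GPS fix to one of four regional clusters. */
final class RegionRouter {

    /** Cluster names are placeholders, not real hosts. */
    enum Region {
        NORTH_WEST("lbs-mongo-nw"),
        NORTH_EAST("lbs-mongo-ne"),
        SOUTH_WEST("lbs-mongo-sw"),
        SOUTH_EAST("lbs-mongo-se");

        final String cluster;

        Region(String cluster) {
            this.cluster = cluster;
        }
    }

    // Assumed split point; the article does not give the real boundaries.
    static final double SPLIT_LAT = 33.0;
    static final double SPLIT_LNG = 110.0;

    private RegionRouter() {}

    /** Picks the quadrant that contains the coordinate. */
    static Region regionOf(double lat, double lng) {
        boolean north = lat >= SPLIT_LAT;
        boolean east = lng >= SPLIT_LNG;
        if (north) {
            return east ? Region.NORTH_EAST : Region.NORTH_WEST;
        }
        return east ? Region.SOUTH_EAST : Region.SOUTH_WEST;
    }

    /** Returns the cluster that should store or query this coordinate. */
    static String clusterFor(double lat, double lng) {
        return regionOf(lat, lng).cluster;
    }
}
```

With this split, a driver reporting from (30.27, 120.15) is written to the south-east cluster only, so each cluster carries a fraction of the write load and its own lock. A real router would also need a rule for passengers close to a boundary, for example querying the neighboring region as well.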
Long‑Connection Service Stability
The original long-connection service, built on Mina, suffered from CPU saturation, memory pressure, and inefficient idle-connection checks. At the hardware level, a single-queue NIC forced one CPU core to handle all I/O interrupts, which led to packet loss; replacing it with a multi-queue NIC resolved the problem. The service itself was rewritten on AIO with custom features: ByteBuffer pooling, broadcast buffer reuse, TimeWheel-based idle detection, and priority-based sending.
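To illustrate why a time wheel makes idle checks cheaper than scanning every connection, here is a minimal single-threaded sketch. It is not the service's actual code: the class IdleTimeWheel, its method names, and the use of string connection ids are assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical time wheel for idle-connection detection (not thread-safe). */
final class IdleTimeWheel {
    private final List<Set<String>> slots = new ArrayList<>();
    private final Map<String, Integer> slotOf = new HashMap<>();
    private int cursor = 0;

    IdleTimeWheel(int slotCount) {
        for (int i = 0; i < slotCount; i++) {
            slots.add(new HashSet<>());
        }
    }

    /** Records activity by moving the connection to the slot the cursor reaches last. */
    void touch(String connId) {
        Integer old = slotOf.get(connId);
        if (old != null) {
            slots.get(old).remove(connId);
        }
        int target = (cursor + slots.size() - 1) % slots.size();
        slots.get(target).add(connId);
        slotOf.put(connId, target);
    }

    /** Advances one tick and returns the connections found in the new current slot. */
    List<String> tick() {
        cursor = (cursor + 1) % slots.size();
        Set<String> current = slots.get(cursor);
        List<String> expired = new ArrayList<>(current);
        current.clear();
        for (String id : expired) {
            slotOf.remove(id);
        }
        return expired;
    }
}
```

A connection is reported on the tick that reaches its slot, which is slotCount - 1 ticks after its last touch. Each tick inspects only one slot, so the cost depends on how many connections expire rather than on how many are open.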
Distributed System Refactoring
The monolithic Web system was split into three layers: business, service, and data. Strong dependencies, where the caller needs the result before it can continue, go through Dubbo RPC; weak dependencies, which can be handled asynchronously, go through RocketMQ. A unified development process, SQL standards, and service degradation mechanisms were introduced at the same time.
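The article does not describe how degradation was implemented. As one common shape, assumed here purely for illustration, a call to a downstream service can be wrapped so that an operator switch or a failure routes it to a fallback. DegradableCall and its methods are hypothetical names and do not come from Dubbo or RocketMQ.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;

/** Hypothetical wrapper that falls back when a dependency is degraded or failing. */
final class DegradableCall<T> {
    private final AtomicBoolean degraded = new AtomicBoolean(false);
    private final Supplier<T> primary;
    private final Supplier<T> fallback;

    DegradableCall(Supplier<T> primary, Supplier<T> fallback) {
        this.primary = primary;
        this.fallback = fallback;
    }

    /** Manual switch, for example flipped from a configuration center. */
    void setDegraded(boolean on) {
        degraded.set(on);
    }

    /** Uses the fallback when the switch is on or the primary call throws. */
    T invoke() {
        if (degraded.get()) {
            return fallback.get();
        }
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }
}
```

In practice the primary supplier would wrap a remote call and the fallback would return a cached or default value, so that a failing non-critical service does not take the order flow down with it.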
Wireless Open Platform (KOP)
KOP addresses issues like per‑project changes, inconsistent request/response formats, lack of traffic protection, scattered business logic, and undocumented protocols. Design principles include access control, traffic allocation and degradation, real‑time traffic analysis, instant API publishing, and monitoring with per‑API metrics and alerts.
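Traffic protection at a gateway is often built on a per-API rate limit. The sketch below uses a token bucket, which is an assumption on our part since the article does not name the algorithm KOP uses. ApiRateLimiter and the API names in the usage note are hypothetical, and the clock is passed in explicitly to keep the example deterministic.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical per-API token bucket limiter. */
final class ApiRateLimiter {
    private static final class Bucket {
        double tokens;
        long lastNanos;
    }

    private final double capacity;
    private final double refillPerSecond;
    private final Map<String, Bucket> buckets = new HashMap<>();

    ApiRateLimiter(double capacity, double refillPerSecond) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
    }

    /** Returns true if the call may proceed; nowNanos is the caller's clock reading. */
    synchronized boolean tryAcquire(String api, long nowNanos) {
        Bucket b = buckets.get(api);
        if (b == null) {
            b = new Bucket();
            b.tokens = capacity;
            b.lastNanos = nowNanos;
            buckets.put(api, b);
        }
        // Refill in proportion to the time elapsed since the last call, up to capacity.
        double elapsedSeconds = (nowNanos - b.lastNanos) / 1e9;
        b.tokens = Math.min(capacity, b.tokens + elapsedSeconds * refillPerSecond);
        b.lastNanos = nowNanos;
        if (b.tokens >= 1.0) {
            b.tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

With a capacity of 2 and a refill rate of 1 token per second, two back-to-back calls to "order.create" pass, a third is rejected, and one more passes a second later. Other APIs keep their own buckets, so a spike on one endpoint does not consume the budget of another.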
Real‑Time Computation and Monitoring
Based on Storm and HBase, a real‑time monitoring platform (Figure 2) aggregates logs, performs sum/average/group calculations, and stores results in RocketMQ, HBase, and MetaQ, ensuring stability under traffic spikes.
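The sum, average, and group calculations mentioned above can be reduced to a small in-memory aggregator. This sketch deliberately leaves out Storm and HBase and only shows the arithmetic a stream-processing step might perform per group key. MetricAggregator and the metric keys used in the usage note are assumptions.

```java
import java.util.DoubleSummaryStatistics;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical grouped aggregator for log-derived metrics (not thread-safe). */
final class MetricAggregator {
    private final Map<String, DoubleSummaryStatistics> byKey = new HashMap<>();

    /** Adds one observation, for example a response time, under a group key. */
    void record(String groupKey, double value) {
        byKey.computeIfAbsent(groupKey, k -> new DoubleSummaryStatistics()).accept(value);
    }

    long count(String groupKey) {
        DoubleSummaryStatistics s = byKey.get(groupKey);
        return s == null ? 0L : s.getCount();
    }

    double sum(String groupKey) {
        DoubleSummaryStatistics s = byKey.get(groupKey);
        return s == null ? 0.0 : s.getSum();
    }

    double average(String groupKey) {
        DoubleSummaryStatistics s = byKey.get(groupKey);
        return s == null ? 0.0 : s.getAverage();
    }
}
```

Recording 10 and 30 under "api.order" gives a count of 2, a sum of 40, and an average of 20. A monitoring pipeline would typically keep one such aggregator per time window and flush the results to storage when the window closes.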
Data Layer Refactoring
To handle massive data, a data synchronization platform (Figure 4) using Canal, MySQL binlog Row mode, and MQ was built, supporting global and local ordering, replay, monitoring, and automated deployment.
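The difference between global and local ordering can be sketched as a queue-selection rule. This is an assumed design, not one taken from the article: global ordering sends every change to a single queue, while local ordering hashes a key such as table plus primary key so that changes to the same row stay in sequence. BinlogQueueSelector is a hypothetical name.

```java
/** Hypothetical queue selector for binlog change events. */
final class BinlogQueueSelector {

    private BinlogQueueSelector() {}

    /**
     * Global ordering: everything goes to queue 0, so consumers see one total order.
     * Local ordering: the same table and primary key always map to the same queue.
     */
    static int queueFor(String table, String primaryKey, int queueCount, boolean globalOrder) {
        if (globalOrder) {
            return 0;
        }
        int hash = (table + ":" + primaryKey).hashCode();
        // floorMod keeps the index non-negative even when the hash is negative.
        return Math.floorMod(hash, queueCount);
    }
}
```

Local ordering trades a total order for throughput: different rows can be consumed in parallel, while updates to any single row are still replayed in the order they were written.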
Further, a real‑time data center (Figure 5) synchronizes front‑end MySQL shards to HBase, provides an SQL‑to‑HBase translation engine with secondary index support, and implements rowkey hashing, reverse ordering, and concatenation strategies for efficient queries.
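The three rowkey strategies can be combined in one key. The sketch below is an illustration under assumptions: the key layout, the bucket count, and the reading of "reverse ordering" as a reversed timestamp (so that newer rows sort first) are ours, not the article's. RowKeys is a hypothetical helper and does not use the HBase client API.

```java
import java.util.Locale;

/** Hypothetical rowkey builder showing hashing, reverse ordering, and concatenation. */
final class RowKeys {

    private RowKeys() {}

    /** Hashing: a bucket prefix spreads sequential ids across regions. */
    static String hashed(long userId, int buckets) {
        int bucket = Math.floorMod(Long.hashCode(userId), buckets);
        return String.format(Locale.ROOT, "%02d_%d", bucket, userId);
    }

    /** Reverse ordering: newer timestamps produce lexicographically smaller strings. */
    static String reverseTime(long epochMillis) {
        return String.format(Locale.ROOT, "%019d", Long.MAX_VALUE - epochMillis);
    }

    /** Concatenation: user prefix plus reversed time, so a prefix scan returns newest first. */
    static String orderKey(long userId, long epochMillis, int buckets) {
        return hashed(userId, buckets) + "_" + reverseTime(epochMillis);
    }
}
```

With 16 buckets, user 12345 gets the prefix "09_12345", and a scan over that prefix returns the user's most recent orders first because later timestamps sort earlier.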
Author: Wang Xiaoxue, architecture engineer at Didi Chuxing, originally Kuaidi Dache. Article originally from CSDN.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
