Evolution of Kuaidi Dache Architecture: Solving LBS Bottlenecks, Long‑Connection Stability, Distributed Refactoring, Open Platform, Real‑Time Monitoring, and Data‑Layer Transformation
This article details how Kuaidi Dache scaled from 2013 to 2015 by addressing LBS performance limits, redesigning long‑connection services, refactoring monolithic code into layered services with Dubbo and RocketMQ, building a secure open platform, implementing Storm‑based real‑time monitoring, and migrating data storage to sharded MySQL, Canal‑driven sync, and HBase for massive scalability.
Author Wang Xiaoxue, a Didi architecture engineer, describes the rapid growth of the Kuaidi Dache ride‑hailing system from late 2013 to mid‑2014 and the architectural challenges that arose.
LBS bottleneck and solution: Drivers report GPS coordinates every few seconds, and these were stored in a MongoDB replica set. Heavy read/write load caused CPU spikes, query latencies above 800 ms, and replication lag, aggravated by database‑level locking in MongoDB 2.6.4. The team partitioned the country into four regions and deployed an independent MongoDB cluster per region.
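The per‑region split can be sketched as a simple router that sends each GPS update to its regional cluster. This is a minimal illustration with invented region names and coordinate boundaries; the article does not say how Kuaidi actually drew the four regions.

```java
// Hypothetical sketch: route a driver's GPS update to one of four
// regional MongoDB clusters by coarse latitude/longitude bounds.
// Region names and thresholds are invented for illustration only.
public class RegionRouter {
    static String regionFor(double lat, double lng) {
        if (lat >= 35) return lng < 110 ? "northwest" : "northeast";
        return lng < 110 ? "southwest" : "southeast";
    }

    public static void main(String[] args) {
        // Beijing (39.9N, 116.4E) lands in the "northeast" cluster here.
        System.out.println(regionFor(39.9, 116.4)); // prints "northeast"
    }
}
```

Each region's cluster then sees only its own read/write traffic, so the database‑level lock contends over a quarter of the original load.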
Long‑connection service stability: The original Mina‑based socket service suffered from a single‑queue NIC pinning interrupt handling to one CPU core, garbage‑collection pressure, and inefficient idle‑connection scanning. Re‑implementing the framework on Java AIO introduced byte‑buffer pooling, TimeWheel‑based idle detection, and priority‑based message sending, dramatically improving stability.
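The TimeWheel idea replaces scanning every open connection with O(1) work per tick: connections are hashed into slots by their expiry time, and each tick only empties one slot. A minimal sketch, assuming a fixed tick interval and string connection ids (the real wheel and its AIO integration are not shown in the article):

```java
import java.util.ArrayList;
import java.util.List;

// Minimal timing-wheel sketch for idle-connection detection.
// Registering a connection costs O(1); each tick touches only one slot
// instead of iterating all live connections.
public class TimeWheel {
    private final List<List<String>> slots; // connection ids per slot
    private int cursor = 0;

    TimeWheel(int size) {
        slots = new ArrayList<>();
        for (int i = 0; i < size; i++) slots.add(new ArrayList<>());
    }

    // Schedule a connection to be reported idle after `ticks` ticks.
    void register(String connId, int ticks) {
        slots.get((cursor + ticks) % slots.size()).add(connId);
    }

    // Advance one tick; return connections whose timer just expired.
    List<String> tick() {
        cursor = (cursor + 1) % slots.size();
        List<String> expired = new ArrayList<>(slots.get(cursor));
        slots.get(cursor).clear();
        return expired;
    }

    public static void main(String[] args) {
        TimeWheel wheel = new TimeWheel(8);
        wheel.register("conn-1", 2);
        wheel.tick();                     // nothing expires yet
        System.out.println(wheel.tick()); // prints [conn-1]
    }
}
```

A production wheel would also re‑register a connection on every heartbeat so that only genuinely silent connections ever reach an expiring slot.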
System distributed refactoring: The monolithic web + TCP push system was split into three layers—business, service, and data. Strong dependencies use Dubbo RPC, weak dependencies use RocketMQ messaging. The refactor introduced coding standards, service degradation mechanisms, and a unified development workflow.
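The degradation mechanism can be illustrated as a wrapper around a strong (synchronous RPC) dependency: when a downstream service is flagged as degraded, callers get a cheap fallback instead of blocking on it. Class and method names below are invented; the article names Dubbo and RocketMQ but shows no code.

```java
import java.util.function.Supplier;

// Hypothetical service-degradation sketch: a strong dependency (e.g. a
// Dubbo RPC call) is wrapped so that flipping a switch routes callers
// to a local fallback instead of the remote service.
public class Degradable<T> {
    private volatile boolean degraded = false;
    private final Supplier<T> call;     // the real remote invocation
    private final Supplier<T> fallback; // cheap local default

    Degradable(Supplier<T> call, Supplier<T> fallback) {
        this.call = call;
        this.fallback = fallback;
    }

    void setDegraded(boolean d) { degraded = d; }

    T invoke() { return degraded ? fallback.get() : call.get(); }

    public static void main(String[] args) {
        Degradable<String> eta = new Degradable<>(() -> "live-eta", () -> "cached-eta");
        System.out.println(eta.invoke()); // prints "live-eta"
        eta.setDegraded(true);
        System.out.println(eta.invoke()); // prints "cached-eta"
    }
}
```

Weak dependencies need no such switch: publishing to RocketMQ is already asynchronous, so a slow consumer only delays processing rather than blocking the caller.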
Wireless Open Platform (KOP): To solve client‑server integration issues, KOP introduced access‑key authentication, traffic throttling and AB‑testing, real‑time traffic analysis, dynamic API publishing, and per‑client monitoring with alerting.
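Access‑key authentication in such gateways commonly works by having the client sign its sorted request parameters with a shared secret, which the server recomputes and compares. The article does not specify KOP's algorithm, so the HMAC‑SHA256 scheme below is an assumption, shown only to make the idea concrete:

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

// Illustrative request-signing sketch (assumed HMAC-SHA256 scheme, not
// KOP's documented algorithm): canonicalize parameters by sorting keys,
// then MAC the concatenated string with the caller's secret.
public class KopSigner {
    static String sign(Map<String, String> params, String secret) {
        try {
            StringBuilder sb = new StringBuilder();
            for (Map.Entry<String, String> e : new TreeMap<>(params).entrySet())
                sb.append(e.getKey()).append('=').append(e.getValue()).append('&');
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret.getBytes(StandardCharsets.UTF_8), "HmacSHA256"));
            byte[] raw = mac.doFinal(sb.toString().getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : raw) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, String> p = Map.of("app_key", "demo", "method", "order.create");
        System.out.println(sign(p, "secret").length()); // prints 64 (hex chars)
    }
}
```

Because the secret never travels with the request, the gateway can authenticate each caller, attribute traffic per access key, and throttle or degrade individual clients.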
Real‑time computation and monitoring: Built on Storm and HBase, the platform parses log lines into KV pairs, performs aggregation, and writes results to RocketMQ every minute. HBase stores only inserts to avoid row‑lock contention, while RocketMQ buffers spikes, ensuring stable TPS.
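The parse‑and‑aggregate step can be sketched outside Storm as a plain counter: split each log line into key=value pairs, count per pair within the window, and flush once a minute. The log format and field names below are invented; the real pipeline runs inside Storm bolts and publishes each window's results to RocketMQ.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of per-minute log aggregation: parse "k1=v1|k2=v2" lines into
// KV pairs, count occurrences per pair, and reset on each flush.
public class LogAggregator {
    private final Map<String, Long> counts = new HashMap<>();

    // Parse one log line and bump the counter for each KV pair found.
    void consume(String line) {
        for (String field : line.split("\\|")) {
            String[] kv = field.split("=", 2);
            if (kv.length == 2)
                counts.merge(kv[0] + ":" + kv[1], 1L, Long::sum);
        }
    }

    // Called once a minute: emit this window's counts and start fresh.
    Map<String, Long> flush() {
        Map<String, Long> out = new HashMap<>(counts);
        counts.clear();
        return out;
    }

    public static void main(String[] args) {
        LogAggregator agg = new LogAggregator();
        agg.consume("api=order.create|code=200");
        agg.consume("api=order.create|code=500");
        System.out.println(agg.flush().get("api:order.create")); // prints 2
    }
}
```

Flushing through a queue rather than writing straight to storage is what absorbs traffic spikes: HBase sees a steady insert rate even when the log volume bursts.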
Data‑layer transformation: Front‑end sharding and a custom sync platform (Canal → MQ) unified data across multiple MySQL instances. To handle massive historical data, a real‑time data center was built on HBase with a SQL‑to‑HBase engine supporting secondary indexes, hash‑based rowkey dispersion, and ordered queries.
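Hash‑based rowkey dispersion prefixes the natural key with a small hash bucket so writes spread across HBase regions, while rows for one entity stay contiguous and time‑ordered after the prefix. The bucket count and key layout below are illustrative, not Kuaidi's actual schema:

```java
// Sketch of hash-based rowkey dispersion for HBase: a two-digit bucket
// prefix (derived from the entity id) spreads sequential writes across
// regions; within a bucket, keys remain ordered by id and timestamp,
// so per-entity range scans still work.
public class RowKeys {
    static final int BUCKETS = 16; // illustrative bucket count

    static String rowKey(String orderId, long ts) {
        int bucket = Math.abs(orderId.hashCode() % BUCKETS);
        // e.g. "07|order-123|0000001422748800000"
        return String.format("%02d|%s|%019d", bucket, orderId, ts);
    }

    public static void main(String[] args) {
        System.out.println(rowKey("order-123", 1422748800000L));
    }
}
```

To read back one order's history in time order, a client scans the single bucket its id hashes to; a query spanning all entities fans out over all sixteen buckets and merges the results.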
The article concludes with a discussion of the achieved scalability, reliability, and maintainability improvements across the entire Kuaidi Dache architecture.
Architecture Digest
Focused on Java backend development, covering application architecture at top‑tier internet companies (high availability, high performance, high stability), big data, machine learning, and other popular fields.