How Kuaidi Dache Overcame MongoDB Limits and Built Real-Time Monitoring

Facing explosive growth in 2013‑2014, Kuaidi Dache re‑engineered its ride‑hailing platform by partitioning MongoDB, replacing single‑queue NICs with multi‑queue, rewriting its long‑connection service with AIO, adopting Dubbo and RocketMQ for micro‑services, and building a Storm‑HBase real‑time monitoring and data synchronization pipeline.

ITPUB
ITPUB
ITPUB
How Kuaidi Dache Overcame MongoDB Limits and Built Real-Time Monitoring

LBS Bottleneck and Regional Sharding

Drivers reported GPS coordinates every few seconds. The data were stored in a MongoDB replica set (primary‑secondary) that handled >40,000 writes/s and >10,000 reads/s. Using MongoDB 2.6.4, each write acquired a global database lock and LBS queries were split into many sub‑queries, causing:

CPU spikes on secondary nodes

Query latency >800 ms

Reduced query throughput

Significant replication lag

To eliminate the global lock contention, the nation was divided into four regions, each with an independent MongoDB cluster. User data for a region are stored only in its cluster, isolating load and restoring low‑latency reads.

System model diagram
System model diagram

Long‑Connection Service Stability

Hardware issue : a single‑queue NIC directed all network interrupts to one CPU core, saturating it while other cores stayed idle. Replacing the NIC with a multi‑queue model distributed interrupts across cores, eliminating packet loss and connection drops.

Software issue : the original framework built on Apache Mina suffered from coarse memory control, difficult GC tuning, inefficient idle‑connection checks, and periodic CPU spikes under heavy broadcast traffic. The service was rewritten with Java AIO and custom enhancements:

Scenario‑specific development for high‑volume broadcast workloads.

ByteBuffer pooling to reduce allocation pressure and GC overhead.

Reusing a single ByteBuffer across multiple channels during a broadcast to avoid memory copies.

TimeWheel‑based idle‑connection detection, smoothing CPU usage.

Priority‑aware data transmission, allowing critical messages to be sent before lower‑priority ones.

Distributed Refactor of the Core System

The monolithic web system originally combined HTTP services and TCP push logic, resulting in a massive codebase, tangled business and non‑business code, and slow release cycles. The system was split into three logical layers:

Business layer : domain‑specific logic.

Service layer : RPC and service governance.

Data layer : persistence and sharding.

Strong dependencies are wired with Dubbo (Alibaba‑open‑source RPC framework). Weak couplings use RocketMQ (Java‑centric messaging, similar to Kafka) for asynchronous communication. The refactor also introduced standardized development processes, code and SQL conventions, link‑level bottleneck analysis, and a service‑degradation mechanism.

Wireless Open Platform (KOP)

KOP was built to address frequent web releases, inconsistent request/response formats, lack of traffic protection, scattered business logic, and outdated documentation. Its core design principles are:

Access control: each client receives an identifier and secret key; requests are signed client‑side and verified server‑side.

Traffic allocation & degradation: per‑city or per‑client limits, AB‑testing, and guaranteed bandwidth for core APIs (login, order, payment). Excess traffic receives a throttling error code.

Traffic analysis: real‑time monitoring of IP, user, and API dimensions to detect malicious patterns and enforce temporary or permanent bans.

Real‑time publishing: API additions or removals take effect instantly without redeploying KOP, guarded by an approval workflow.

Real‑time monitoring: per‑API call count, success/failure rates, latency, minute‑level charts, historical comparison, and automated SMS alerts when thresholds are breached.

Real‑time Monitoring and Computation

A Storm‑HBase pipeline provides minute‑level visibility into system health and business metrics. The topology consists of two bolts:

Parse bolt : converts each raw log line into a key‑value pair.

Compute bolt : applies business rules to aggregate sums, averages, and group‑by results, then publishes the aggregates to RocketMQ every minute.

Aggregated results are stored in HBase using an insert‑only model with ordered rowkeys, avoiding row‑level lock contention. Even when log volume spikes, writes remain stable because inserts are spread across regions.

RocketMQ acts as a buffer, absorbing traffic bursts and providing fault tolerance. Brokers run on dedicated physical machines equipped with SSDs, ample memory, and a master‑slave configuration.

Monitoring system architecture
Monitoring system architecture

Data Layer Refactor and Synchronization Platform

Growth rendered a single MySQL instance and table insufficient, especially for coupon and order services. A generic sharding framework was built to split data across multiple databases and tables. Two major challenges emerged:

Data synchronization: front‑end shards (multiple MySQL instances) needed to be consolidated into a single back‑end store, which MySQL replication could not handle.

Offline extraction: data scientists used Sqoop to dump data for batch analytics, causing heavy load and duplicate extraction across business scenarios.

The solution is a data‑sync platform that uses Canal to capture row‑based binlog events, converts them into MQ messages, and distributes them via RocketMQ. Key features include:

Global and local ordering guarantees.

Support for out‑of‑order concurrent processing.

Point‑in‑time replay of events.

Link monitoring with alerts.

Automated node deployment through a management console.

Data synchronization platform architecture
Data synchronization platform architecture

Real‑time Data Center on HBase

To eliminate the back‑end single‑database bottleneck, all front‑end MySQL shards are synchronized into HBase. A custom SQL‑to‑HBase engine parses incoming SQL statements, translates them into HBase API calls, and supports secondary indexes implemented on the client side:

Hash vs. Reverse : hash scattering distributes writes evenly but disables range queries; reverse ordering preserves descending sort on a field.

Scatter : randomizes rowkeys to balance region load, at the cost of range‑query capability.

Chain (unique index) : required for multi‑table consolidation; without a unique chain index, table unification fails.

The engine differs from Hive: Hive translates SQL to MapReduce jobs for large‑batch, low‑frequency analytics, whereas this engine targets low‑latency, high‑frequency, random‑read scenarios and can leverage the client‑side secondary indexes.

Real-time data center architecture
Real-time data center architecture

These architectural upgrades enabled the ride‑hailing platform to sustain massive traffic, reduce latency, and maintain high availability across its services.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

System ArchitectureMicroservicesreal-time monitoringMongoDBRide Hailing
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.