How Didi Scales HBase for Real‑Time Orders, Geo‑Tracking, ETA and Monitoring
This article explains how Didi leverages HBase’s distributed architecture, multi‑language APIs, and custom rowkey designs to support online order queries, driver‑passenger trajectory tracking with GeoHash, real‑time ETA calculations, and a monitoring platform, while managing multi‑tenant resources through DHS and RS Group.
Background
Business Types – HBase, built on the Hadoop ecosystem, serves both offline batch jobs (e.g., daily reports, security analysis, model training) and online services that require low‑latency random access such as order and customer‑service queries.
Multi‑Language Support – Didi exposes HBase through the native Java API, Thrift Server (for C++, PHP, and Python clients), Phoenix JDBC, Phoenix QueryServer, MapReduce, Spark, and streaming interfaces, so teams can work in the language they prefer.
Data Types
Statistical and report data – small volume, flexible SQL queries via Phoenix.
Raw factual data – orders, GPS traces, logs; large volume, high consistency, low latency.
Intermediate results – model‑training inputs; large volume, high throughput.
Backup data – HBase used as an off‑site disaster‑recovery store.
Use‑Case Introduction
Scenario 1: Order Events
Requirements: online order‑lifecycle queries, historical order detail look‑ups, and offline order‑status analysis, at roughly 10,000 writes per second and 1,000 reads per second, with new data queryable within 5 seconds.
Order Status Table
Rowkey: reverse(order_id) + (MAX_LONG - TS)
Columns: various order states.
Order History Table
Rowkey: reverse(passenger_id|driver_id) + (MAX_LONG - TS)
Columns: orders and related info per user within a time range.
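The two rowkey layouts above can be sketched in a few lines. This is an illustration only: it assumes IDs are fixed‑width decimal strings, that reverse() means reversing the characters of the ID, and that TS is a millisecond timestamp stored as an 8‑byte big‑endian value. The function names are hypothetical and none of these encoding details are confirmed by the source.

```python
import struct

MAX_LONG = 2**63 - 1  # same value as Java's Long.MAX_VALUE


def reversed_ts(ts_millis: int) -> bytes:
    """Encode MAX_LONG - ts as 8 big-endian bytes.

    Larger timestamps give smaller encoded values, so the newest
    row for an ID sorts first in HBase's byte-ordered keyspace.
    """
    return struct.pack(">q", MAX_LONG - ts_millis)


def order_status_rowkey(order_id: str, ts_millis: int) -> bytes:
    # Reversing the ID moves its fast-changing low digits to the front,
    # which spreads sequentially issued IDs across regions.
    return order_id[::-1].encode("ascii") + reversed_ts(ts_millis)


def order_history_rowkey(user_id: str, ts_millis: int) -> bytes:
    # user_id is a passenger ID or a driver ID (assumed fixed width).
    return user_id[::-1].encode("ascii") + reversed_ts(ts_millis)


if __name__ == "__main__":
    older = order_status_rowkey("1000123", 1_700_000_000_000)
    newer = order_status_rowkey("1000123", 1_700_000_005_000)
    print(newer < older)  # the newer event sorts ahead of the older one
```

With this layout, a scan that starts at the reversed ID prefix returns the most recent state first, so a "latest status" lookup can stop after one row.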
Scenario 2: Driver‑Passenger Trajectory
Supports real‑time or near‑real‑time coordinate queries, large‑scale offline analysis, and geographic range queries. GeoHash converts latitude/longitude into strings representing rectangular areas, enabling coarse‑grained indexing while preserving privacy.
Because GeoHash blocks may not perfectly match circular query areas, a second‑stage filter checks the actual distance between GPS points and the query centre.
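The two‑stage approach described above can be sketched as follows: a standard GeoHash encoder produces the coarse cell, and a haversine distance check performs the second‑stage filter. The function names, the 6,371 km Earth radius, and the sample coordinates are assumptions for illustration, not details from the source.

```python
import math

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # standard GeoHash alphabet
EARTH_RADIUS_M = 6_371_000  # mean Earth radius, an approximation


def geohash_encode(lat: float, lon: float, precision: int = 6) -> str:
    """Encode a coordinate by alternately bisecting longitude and latitude."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, bits, bit_count = [], 0, 0
    use_lon = True  # GeoHash starts with a longitude bit
    while len(chars) < precision:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = 1 if lon >= mid else 0
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = 1 if lat >= mid else 0
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        bits = (bits << 1) | bit
        bit_count += 1
        use_lon = not use_lon
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits, bit_count = 0, 0
    return "".join(chars)


def haversine_m(lat1, lon1, lat2, lon2) -> float:
    """Great-circle distance in metres between two coordinates."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp, dl = p2 - p1, math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def filter_within_radius(points, center_lat, center_lon, radius_m):
    """Second stage: keep only (id, lat, lon) points inside the circle."""
    return [pid for pid, lat, lon in points
            if haversine_m(center_lat, center_lon, lat, lon) <= radius_m]


if __name__ == "__main__":
    print(geohash_encode(42.605, -5.603, 5))
```

Longer hashes share the prefix of shorter ones for the same point, which is what lets a rowkey prefix scan stand in for a coarse area lookup before the exact distance check runs.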
Rowkey designs:
Single‑user query: reverse(user_id) + (MAX_LONG - TS/1000)
Range query: reverse(geohash) + TS/1000 + user_id
Scenario 3: ETA
ETA (estimated time of arrival) was originally computed offline. It now runs in near real time with HBase serving as a key‑value cache, which shortens training time, allows multiple cities to be processed in parallel, and reduces manual intervention.
Model training with Spark every 30 minutes per city.
First stage reads all city data from HBase within 5 minutes.
Second stage completes ETA calculation within 25 minutes.
HBase data periodically persisted to HDFS for new model testing and feature extraction.
Rowkey: salting + city + type0 + type1 + type2 + TS
Columns: order, feature.
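A salted rowkey of this shape might be built as in the sketch below. The source does not say how the salt is derived, how many buckets are used, or how fields are delimited. Here the salt is assumed to be a hash of the rest of the key modulo a hypothetical 16 buckets, fields are joined with "|", and TS is a zero‑padded second‑resolution timestamp.

```python
import hashlib

NUM_BUCKETS = 16  # hypothetical bucket count, not from the source


def salt_for(key_body: str) -> int:
    """Derive a stable bucket number from the unsalted key."""
    digest = hashlib.md5(key_body.encode("utf-8")).digest()
    return digest[0] % NUM_BUCKETS


def eta_rowkey(city: str, type0: str, type1: str, type2: str,
               ts_seconds: int) -> bytes:
    """salting + city + type0 + type1 + type2 + TS, joined with '|'."""
    body = f"{city}|{type0}|{type1}|{type2}|{ts_seconds:010d}"
    return f"{salt_for(body):02d}|{body}".encode("utf-8")


def city_scan_prefixes(city: str) -> list:
    """A reader must scan every bucket to collect all rows for one city."""
    return [f"{b:02d}|{city}|".encode("utf-8") for b in range(NUM_BUCKETS)]


if __name__ == "__main__":
    print(eta_rowkey("beijing", "a", "b", "c", 1_700_000_000))
    print(len(city_scan_prefixes("beijing")))
```

The trade‑off under these assumptions is that writes for a single city spread across all buckets, while a full‑city read fans out into one prefix scan per bucket, which parallelizes naturally in a Spark job.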
Scenario 4: Monitoring Tool DCM
DCM monitors Hadoop cluster resources (NameNode, Yarn containers) and stores metrics in HBase via Phoenix, enabling second‑level query responses and front‑end dashboards.
Didi’s Multi‑Tenant Management on HBase
Didi treats a single HBase cluster with multiple tenants as the most efficient solution, but HBase lacks built‑in multi‑tenant controls. Challenges include resource visibility, project lifecycle management, and contention.
The Didi HBase Service (DHS) platform provides project lifecycle management, permission control, cluster resource allocation, and table‑level monitoring (read/write rates, memstore, block cache, locality). Users register projects, estimate resource needs, and receive a project overview page.
Using RS Group, the cluster is divided into logical sub‑clusters, allowing exclusive or shared resource pools. Table 1 (omitted) compares pros and cons of shared vs. exclusive resources.
Resource allocation strategy:
Data with relaxed latency requirements, low traffic, and low availability requirements → shared pool.
Latency‑sensitive, high‑throughput, high‑availability online services → dedicated RegionServer Group with 20‑30% headroom.
Periodic usage accounting generates billing for tenants.
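The first two rules above can be expressed as a small decision function. This is a sketch under stated assumptions: the Workload fields, the rule that any single demanding requirement is enough to qualify for a dedicated group, and the 25% default headroom (picked from the 20‑30% range) are all illustrative choices, not Didi's actual policy code.

```python
from dataclasses import dataclass


@dataclass
class Workload:
    latency_sensitive: bool
    high_throughput: bool
    high_availability: bool
    estimated_servers: int  # RegionServers needed at expected peak load


def allocate(w: Workload, headroom_pct: int = 25):
    """Return (pool, server_count) for a tenant's workload.

    Assumption: any one demanding requirement qualifies for a dedicated
    RegionServer group; the source only describes the two extreme cases.
    """
    if not 20 <= headroom_pct <= 30:
        raise ValueError("headroom is expected to be in the 20-30% range")
    demanding = (w.latency_sensitive or w.high_throughput
                 or w.high_availability)
    if not demanding:
        return ("shared", 0)  # no dedicated servers are reserved
    # Integer ceiling of estimated * (1 + headroom), avoiding float rounding.
    needed = -(-w.estimated_servers * (100 + headroom_pct) // 100)
    return ("dedicated", needed)


if __name__ == "__main__":
    print(allocate(Workload(True, True, True, 10)))
    print(allocate(Workload(False, False, False, 2)))
```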
RS Group
RS Group assigns a specific list of RegionServers to a group; tables are mounted to groups, and failures within a group do not cause region migration to other groups, achieving logical isolation and reducing management overhead.
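The isolation behaviour described above can be illustrated with a toy model. This is not the HBase RS Group implementation or its API; the class and method names are hypothetical, and the least‑loaded placement rule is a simplification chosen for the example.

```python
class RSGroupCluster:
    """Toy model: regions may only be placed on servers in their table's group."""

    def __init__(self):
        self.group_servers = {}  # group name -> set of live RegionServers
        self.table_group = {}    # table name -> group name
        self.assignment = {}     # (table, region_id) -> server, or None

    def add_group(self, group, servers):
        self.group_servers[group] = set(servers)

    def move_table(self, table, group):
        self.table_group[table] = group

    def _pick_server(self, group):
        servers = sorted(self.group_servers[group])
        if not servers:
            return None  # the group has no live server left

        def load(server):
            return sum(1 for s in self.assignment.values() if s == server)

        # Least-loaded server wins; ties are broken by name.
        return min(servers, key=lambda s: (load(s), s))

    def assign_region(self, table, region_id):
        group = self.table_group[table]
        self.assignment[(table, region_id)] = self._pick_server(group)

    def fail_server(self, server):
        for servers in self.group_servers.values():
            servers.discard(server)
        orphans = [r for r, s in self.assignment.items() if s == server]
        for table, region_id in orphans:
            # Reassignment stays inside the table's own group.
            self.assignment[(table, region_id)] = None
            self.assign_region(table, region_id)


if __name__ == "__main__":
    c = RSGroupCluster()
    c.add_group("orders", ["rs1", "rs2"])
    c.add_group("shared", ["rs3", "rs4"])
    c.move_table("order_status", "orders")
    for i in range(4):
        c.assign_region("order_status", i)
    c.fail_server("rs1")
    print(sorted(set(c.assignment.values())))
```

In this model, losing a server in the "orders" group only shifts load onto the remaining servers of that group, and servers in "shared" never receive its regions, which is the isolation property the text describes.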
Conclusion
Successful HBase adoption at Didi hinges on two key factors: guiding users to design effective table schemas and controlling resource allocation. Clear architecture knowledge, proactive platform support, and appropriate isolation (shared vs. exclusive) lower failure risk, reduce operational costs, and create a virtuous cycle that improves user experience and business growth.
21CTO
