Design and Optimization of Didi's Spatial‑Temporal Supply‑Demand System
Didi’s redesigned Spatial‑Temporal Supply‑Demand System replaces a single Redis cluster bottleneck with a multi‑cluster routing layer, semantic sharding, multi‑level caching, and delayed queues. The redesign delivers better horizontal scalability and fault isolation, roughly 30% lower peak latency, higher cache hit rates, fewer query nodes, and faster, code‑free feature configuration.
Background
The Spatial‑Temporal Supply‑Demand System (SDS) was built to support Didi’s ride‑hailing business by calculating and storing massive supply‑demand features at various spatial (grid, district, city) and temporal (instant, minute, hour) granularities. These features feed real‑time algorithm models and are also persisted for offline training.
System Framework Evolution
2.1 Limitations of the legacy framework
A single Redis cluster limits horizontal scaling; the benefit of adding nodes diminishes as the cluster grows.
Lack of failover – a cluster‑level outage leads to long service recovery times.
Performance bottlenecks: query QPS > 5 million, fan‑out QPS up to 8 million, with 15 ms SLA; p99 latency exceeds SLA during peaks.
R&D efficiency suffers because complex feature semantics require custom code, extending development cycles.
2.2 Advantages of the new framework
Storage layer now supports multiple Redis clusters via a routing layer, improving horizontal scalability and fault isolation.
Multi‑level caching, feature‑compute separation, and delayed‑queue replacement for scheduled tasks reduce load spikes and improve latency.
Component‑oriented feature production enables full‑process configuration, dramatically shortening iteration time.
System Construction Thoughts
3.1 Storage Governance
The new architecture splits the original Redis cluster into several smaller clusters and introduces a routing layer that maps feature keys to target clusters. This design achieves:
Better horizontal scalability of the storage layer.
Higher availability – failures are isolated to individual clusters, and hot‑updates enable seamless failover.
Data‑sharding strategies considered:
Hash‑based sharding: balances data size across clusters but can cause massive fan‑out when a feature is queried across many spatial‑temporal dimensions.
Semantic‑based sharding: groups features by their semantic tag, reducing fan‑out and providing “fast‑slow” isolation, at the cost of higher configuration overhead.
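A routing layer combining the two strategies can be sketched as follows. This is a minimal illustration: the semantic tags, cluster names, and key format are assumptions, not Didi's actual configuration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Router maps a feature key to a target Redis cluster.
// Semantic routing is tried first; hash-based routing is the fallback.
type Router struct {
	semanticMap map[string]string // semantic tag -> cluster name
	clusters    []string          // clusters used for hash-based fallback
}

func (r *Router) Route(semanticTag, featureKey string) string {
	// Semantic sharding: all features sharing a tag land on one cluster,
	// so a cross-dimension query fans out to at most one cluster.
	if cluster, ok := r.semanticMap[semanticTag]; ok {
		return cluster
	}
	// Hash sharding: balances data volume across the remaining clusters.
	h := fnv.New32a()
	h.Write([]byte(featureKey))
	return r.clusters[h.Sum32()%uint32(len(r.clusters))]
}

func main() {
	r := &Router{
		semanticMap: map[string]string{"supply": "redis-supply", "demand": "redis-demand"},
		clusters:    []string{"redis-0", "redis-1"},
	}
	fmt.Println(r.Route("supply", "grid:123:minute")) // redis-supply
	fmt.Println(r.Route("misc", "grid:123:minute"))   // hash-based fallback
}
```

Because the mapping lives in configuration rather than code, it can be hot‑updated to redirect a semantic group to a standby cluster during a failure.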
Implementation details (configuration example):

```json
{
  "data_parser": "json",
  "parser_conf": [
    {"field": "order_id", "jpath": "info.order_id", "type": "int"},
    {"field": "city_id", "jpath": "info.city.city_name", "type": "string"}
  ]
}
```

Another snippet shows rule‑engine configuration:
```json
{
  "rule_engine": "default",
  "rule_engine_conf": "city_id == 'abc'"
}
```

3.2 Performance Optimization
Local cache on query nodes stores static features, cutting Redis request volume.
Pre‑aggregation of high‑QPS features reduces fan‑out to Redis.
Introduce delayed queues to smooth periodic tasks, eliminating request spikes.
Key results after optimization:
Feature query node count reduced by 20%.
Static‑feature cache hit rate increased by 20%.
Redis p99 latency dropped ~30% during peaks.
3.3 Development Efficiency – Configuration Capability Upgrade
The legacy system required extensive custom code for complex feature definitions. The new approach abstracts the feature production pipeline into reusable components (data parser, rule engine, etc.) and orchestrates them via declarative configuration, enabling:
Component‑level horizontal scaling.
Full‑process configuration without code changes.
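How configuration can assemble a pipeline from reusable components might look roughly like the sketch below. The config schema mirrors the snippets shown earlier, but the component interface and the toy rule engine (supporting only `field == 'value'`) are assumptions for illustration:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// PipelineConf mirrors the declarative configuration shown above.
type PipelineConf struct {
	DataParser string `json:"data_parser"`
	RuleConf   string `json:"rule_engine_conf"`
}

// Component is a reusable stage: it transforms a record and reports
// whether the record should continue through the pipeline.
type Component func(record map[string]string) (map[string]string, bool)

// buildPipeline assembles components purely from configuration,
// so adding a feature requires no code change.
func buildPipeline(confJSON string) ([]Component, error) {
	var conf PipelineConf
	if err := json.Unmarshal([]byte(confJSON), &conf); err != nil {
		return nil, err
	}
	var stages []Component
	if conf.DataParser == "json" {
		stages = append(stages, func(r map[string]string) (map[string]string, bool) {
			return r, true // a real parser would extract fields via jpath here
		})
	}
	// Toy rule engine: supports only "field == 'value'" expressions.
	if parts := strings.SplitN(conf.RuleConf, " == ", 2); len(parts) == 2 {
		field, want := parts[0], strings.Trim(parts[1], "'")
		stages = append(stages, func(r map[string]string) (map[string]string, bool) {
			return r, r[field] == want
		})
	}
	return stages, nil
}

func main() {
	stages, _ := buildPipeline(`{"data_parser":"json","rule_engine_conf":"city_id == 'abc'"}`)
	rec := map[string]string{"city_id": "abc"}
	for _, s := range stages {
		var ok bool
		if rec, ok = s(rec); !ok {
			fmt.Println("filtered")
			return
		}
	}
	fmt.Println("accepted")
}
```

Because each stage is an independent component behind a common interface, hot stages can also be scaled out individually, matching the component‑level scaling point above.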
Summary
Architecture decisions must align with business maturity. Early‑stage low‑traffic systems can rely on a single storage cluster, while mature, high‑traffic services benefit from multi‑cluster routing, component‑based feature production, and extensive performance tuning. Maintaining clean code, avoiding excessive allocations, and applying systematic profiling (e.g., Go pprof) are essential for sustaining performance at scale.
Didi Tech
Official Didi technology account