Li Auto’s Trillion‑Row Real‑Time Car‑Network Analytics Using Hologres + Flink
Li Auto's data team tackled the explosion of vehicle-telemetry data (over a trillion rows and millions of signals per second) by rebuilding its data foundation on Alibaba Cloud's Hologres and Flink, achieving sub-second latency, elastic scaling, high availability, and significant cost reductions across real-time and offline workloads.
Introduction: Data Challenges in the Smart Car Era
With the rapid adoption of electric and intelligent vehicles, car‑network signal data has grown explosively. Li Auto now has more than 1 million connected cars, each reporting up to ten thousand signals per second, resulting in a data scale of trillions of rows and a strict end‑to‑end latency requirement of under 2 seconds to support digital twin, intelligent diagnosis, and vehicle warning scenarios.
Massive Car‑Network Signal Challenges
Each vehicle reports roughly 10,000 signals, leading to a stored data volume of over a trillion rows (petabyte-level) and daily growth of hundreds of billions of rows. The system must stay real-time while handling this scale.
Challenge 1: Insufficient Stability
Write latency: During holiday traffic peaks, write throughput exceeds 1.5 million RPS, causing noticeable write delays.
Cold-query resource saturation: Queries on data older than 30 days peak at over 10,000 QPS and consume excessive compute resources.
Weak fault tolerance: Manual recovery can take up to 12 hours.
Incomplete processes: The lack of rigorous admission testing leads to resource mismatches and bugs.
No fallback mechanism: Upgrades require downtime, and rollback is difficult.
Challenge 2: Weak Elasticity and High Cost
Resource scaling: Resources are provisioned for the maximum holiday load, so capacity sits idle outside peaks.
Coupled compute and storage: Compute cannot be scaled independently of storage, which lowers utilization.
Dual-cluster, dual-link redundancy: Guarantees stability but doubles resource cost.
Fragmented tech stack: Separate stacks for real-time and offline pipelines increase development and operations overhead.
Complex cluster splitting: Scaling beyond 2 million vehicles requires splitting clusters, which in turn requires data-routing and consistency solutions.
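To make the data-routing problem concrete, here is a minimal sketch of one common approach, stable hash routing of a vehicle to a cluster. It is an illustration only: the function name `cluster_for` and the choice of SHA-256 are assumptions, not something the article attributes to Li Auto.

```python
import hashlib


def cluster_for(vehicle_id: str, cluster_count: int) -> int:
    """Map a vehicle ID to a cluster index in [0, cluster_count).

    Hypothetical helper: a stable hash keeps every row of one vehicle
    on the same cluster, regardless of which writer handles it.
    """
    if cluster_count <= 0:
        raise ValueError("cluster_count must be positive")
    digest = hashlib.sha256(vehicle_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % cluster_count
```

With plain modulo routing, changing `cluster_count` reassigns many vehicles to a different cluster, which is one way to see why splitting clusters also raises the consistency questions mentioned above.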
Architecture Based on Hologres + Flink
At the end of 2024, Li Auto launched a full-stack upgrade, introducing Alibaba Cloud Hologres and Flink to build a new-generation car-network data platform that is elastic, highly available, and low-cost.
Write Layer: Serverless Flink provides high-performance ingestion; Hologres makes data queryable as soon as it is written.
Storage Layer: Hologres hot-cold tiering moves older data to OSS, improving the hot-to-cold ratio from 2:1 to 5:1 and reducing storage cost.
Compute Layer: Separate compute groups isolate write, processing, and query workloads; Serverless Computing handles cold queries and wide-table ETL at low cost.
Business Layer: Unified real-time and offline pipelines achieve stream-batch convergence, halving storage cost.
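The tiering and workload-isolation ideas in the layers above can be sketched as a small routing rule. This is a hedged illustration, not Hologres configuration: the 30-day boundary is borrowed from the article's description of cold queries, and the target names (`write_group`, `query_group`, `serverless`) are hypothetical labels.

```python
from datetime import date

# Assumption: data older than 30 days counts as cold, as in the
# cold-query description earlier in the article.
HOT_WINDOW_DAYS = 30


def storage_tier(partition_day: date, today: date) -> str:
    """Return 'hot' for recent partitions and 'cold' for older ones."""
    age_days = (today - partition_day).days
    return "hot" if age_days <= HOT_WINDOW_DAYS else "cold"


def compute_target(workload: str, partition_day: date, today: date) -> str:
    """Pick an isolated compute target for a request (hypothetical names)."""
    if workload == "write":
        return "write_group"
    if workload == "etl":
        return "serverless"
    if workload == "query":
        # Cold queries go to serverless resources so they cannot
        # starve the dedicated hot-query group.
        if storage_tier(partition_day, today) == "cold":
            return "serverless"
        return "query_group"
    raise ValueError(f"unknown workload: {workload}")
```

The point of the sketch is the separation itself: writes, hot queries, and cold or batch work never compete for the same pool.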
Performance Benchmark
Write benchmark: In a realistic 1-million-car scenario, 700 CU sustain over 1.5 million RPS with no write delay; a simulated 2-million-car load reaches 3 million RPS and remains stable.
Query benchmark: With 500 CU, both single-query and mixed-query workloads exceed 10,000 QPS; hot-query P99 is ~10 s, cold-query P99 ~27 s.
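A quick back-of-the-envelope check helps put the benchmark figures in proportion. The sketch below uses only numbers quoted in this article and assumes, for illustration, that one write request corresponds to one row.

```python
# Figures quoted in the benchmark; treated as lower bounds ("over", "exceed").
WRITE_RPS = 1_500_000   # sustained write requests per second
WRITE_CU = 700          # compute units in the write benchmark
VEHICLES = 1_000_000    # connected cars in the realistic scenario
QUERY_QPS = 10_000      # queries per second
QUERY_CU = 500          # compute units in the query benchmark
SECONDS_PER_DAY = 86_400

rows_per_cu = WRITE_RPS / WRITE_CU          # about 2,143 writes/s per CU
rows_per_vehicle = WRITE_RPS / VEHICLES     # 1.5 writes/s per car
queries_per_cu = QUERY_QPS / QUERY_CU       # 20 queries/s per CU

# Assumption: one request equals one row, sustained for a full day.
rows_per_day = WRITE_RPS * SECONDS_PER_DAY  # 129,600,000,000 rows

print(f"{rows_per_cu:.0f} writes/s per CU")
print(f"{rows_per_vehicle} writes/s per vehicle")
print(f"{queries_per_cu:.0f} queries/s per CU")
print(f"{rows_per_day:,} rows per day at sustained peak")
```

Under that one-row-per-request assumption, a full day at the peak rate lands at roughly 130 billion rows, the same order of magnitude as the daily growth described earlier.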
Vehicle Digital Twin View Scenario
The “Vehicle Digital Twin View” recreates the state of every signal for each vehicle at a single moment, supporting fault diagnosis, autonomous‑driving monitoring, after‑sales support, and scenario replay. The initial Binlog solution achieved low latency but generated data volumes several times larger than the raw source, leading to high hot‑storage costs.
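The semantics of such a view, "the latest value of every signal at or before a given moment", can be sketched in a few lines. This is an in-memory illustration of the idea only, not the Binlog-based pipeline the article describes; the `SignalEvent` type and `snapshot_at` function are hypothetical names.

```python
from typing import Dict, Iterable, NamedTuple, Tuple


class SignalEvent(NamedTuple):
    vehicle_id: str
    signal: str
    ts: int        # event time, epoch milliseconds
    value: float


def snapshot_at(
    events: Iterable[SignalEvent], vehicle_id: str, at_ts: int
) -> Dict[str, float]:
    """Return the last known value of each signal for one vehicle at at_ts."""
    latest: Dict[str, Tuple[int, float]] = {}
    for e in events:
        # Ignore other vehicles and anything reported after the moment asked for.
        if e.vehicle_id != vehicle_id or e.ts > at_ts:
            continue
        seen = latest.get(e.signal)
        if seen is None or e.ts >= seen[0]:
            latest[e.signal] = (e.ts, e.value)
    return {name: value for name, (_, value) in latest.items()}
```

For example, given speed readings at timestamps 100, 140, and 200 and a battery reading at 120, a snapshot at 150 contains the speed from 140 and the battery value from 120. The sketch scans every event; doing this across trillions of rows is exactly where storage layout and cost become the hard part.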
Stability Assurance System
Before incidents (prevention): Business-admission, change-control, and vendor-admission SOPs; comprehensive Hologres monitoring (query, resource, storage I/O); shadow-instance validation before release.
During incidents (damage control): Automatic scaling (20% up during holidays, 20% down after); rapid rollback within seconds; service degradation for non-core queries.
After incidents (guarantees): 99.9% SLA and 24/7 dedicated support; complete audit logs for self-service bottleneck analysis.
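The holiday scaling rule above can be sketched as a simple capacity plan. This is an assumption-laden illustration: it reads "20% down after" as a return to the baseline, and the function name `planned_cu` and the holiday-window representation are hypothetical.

```python
from datetime import date
from typing import Iterable, Tuple

SCALE_UP_PERCENT = 20  # holiday headroom quoted in the article


def planned_cu(
    baseline_cu: int, day: date, holidays: Iterable[Tuple[date, date]]
) -> int:
    """Return the compute units to provision on a given day.

    Assumption: capacity rises 20% inside a holiday window and
    returns to the baseline outside it.
    """
    in_holiday = any(start <= day <= end for start, end in holidays)
    if not in_holiday:
        return baseline_cu
    # Integer ceiling division avoids floating-point rounding surprises.
    return -(-baseline_cu * (100 + SCALE_UP_PERCENT) // 100)
```

With a 700 CU baseline, a day inside a holiday window is planned at 840 CU and the day after the window drops back to 700 CU.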
Overall Benefits
Write performance +200%: 1.5 million RPS with no write delay at unchanged cost.
QPS +32%: A dedicated hot-data warehouse improves resource utilization by 20%.
Compute cost -40%: Serverless Computing for cold queries reduces expense while meeting the SLA.
Future Outlook
Unified ingestion on Serverless Flink to further simplify the architecture and boost write elasticity.
Enhanced Hologres capabilities (e.g., Time Travel) for finer-grained digital twin and cold-storage management.
Expansion to new signal sources, such as charging stations and bench-test data, to support autonomous driving and AI training scenarios.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
