
Li Auto’s Trillion‑Row Real‑Time Car‑Network Analytics Using Hologres + Flink

Li Auto’s data team tackled the explosion of vehicle‑telemetry data—over a trillion rows and millions of signals per second—by redesigning their data foundation with Alibaba Cloud’s Hologres and Flink, achieving sub‑second latency, elastic scaling, high availability, and significant cost reductions across real‑time and offline workloads.

Alibaba Cloud Big Data AI Platform

Introduction: Data Challenges in the Smart Car Era

With the rapid adoption of electric and intelligent vehicles, car‑network signal data has grown explosively. Li Auto now has more than 1 million connected cars, each reporting up to ten thousand signals per second, resulting in a data scale of trillions of rows and a strict end‑to‑end latency requirement of under 2 seconds to support digital twin, intelligent diagnosis, and vehicle warning scenarios.

Massive Car‑Network Signal Challenges

Each vehicle generates roughly 10 000 signals, leading to a stored data volume of over a trillion rows (petabyte‑level) and daily growth of hundreds of billions of rows. The system must guarantee high real‑time performance while handling this massive scale.
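As a rough consistency check on these figures, the sketch below converts a write rate into daily row growth. It assumes one row per write request and treats the 1.5 million RPS holiday peak quoted under Challenge 1 as if it were sustained for a full day, so the result is an upper bound, not a measured figure. The function names are illustrative.

```python
# Back-of-envelope volume check. ASSUMPTIONS: one row per write request, and
# the quoted peak rate sustained for a whole day (an upper bound).
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def rows_per_day(rows_per_second: float) -> float:
    """Rows ingested in one day at a constant write rate."""
    return rows_per_second * SECONDS_PER_DAY


def days_to_reach(target_rows: float, daily_rows: float) -> float:
    """Days of ingestion needed to accumulate `target_rows`."""
    return target_rows / daily_rows


peak_rps = 1_500_000  # holiday peak write RPS quoted in the article
daily = rows_per_day(peak_rps)
print(f"{daily:.3e} rows/day")                                      # 1.296e+11 rows/day
print(f"{days_to_reach(1e12, daily):.1f} days to 1 trillion rows")  # 7.7 days to 1 trillion rows
```

At roughly 130 billion rows per day, the peak rate lands in the same order of magnitude as the daily growth described above, and at that pace a trillion rows accumulate in about a week.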

Challenge 1: Insufficient Stability

Write latency : During peak traffic (holidays), write RPS exceeds 1.5 million, causing noticeable write delays.

Cold‑query resource saturation : Queries on data older than 30 days peak at over 10 000 QPS, consuming excessive compute resources.

Weak fault tolerance : Manual recovery can take up to 12 hours.

Incomplete processes : Lack of rigorous admission testing leads to resource mismatches and bugs.

No fallback mechanism : Upgrades require downtime and rollback is difficult.

Challenge 2: Weak Elasticity and High Cost

Resource scaling : To handle holiday peaks, resources are provisioned for maximum load, leaving much of that capacity idle the rest of the time.

Coupled compute‑storage : Compute cannot be scaled independently of storage, which lowers resource utilization.

Dual‑cluster, dual‑link redundancy : Guarantees stability but doubles resource cost.

Fragmented tech stack : Separate stacks for real‑time and offline pipelines increase development and operation overhead.

Complex cluster splitting : Scaling beyond 2 million vehicles would require splitting clusters, which in turn needs data routing and cross‑cluster consistency solutions.
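The cost impact of peak provisioning and dual‑cluster redundancy can be made concrete with utilization arithmetic. The sketch below is a minimal illustration, not Li Auto's data: the weekly load profile is hypothetical and normalized so that the holiday peak equals 1.0, and the function name is illustrative.

```python
def peak_sized_utilization(load: list[float], clusters: int = 1) -> float:
    """Average share of capacity actually used when each of `clusters`
    replicas is sized for the peak of `load` (dual-cluster => clusters=2)."""
    capacity = max(load) * clusters
    return sum(load) / (len(load) * capacity)


# Hypothetical normalized daily load: a flat week ending in a two-day holiday peak.
week = [0.5, 0.5, 0.5, 0.5, 0.5, 1.0, 1.0]
print(round(peak_sized_utilization(week), 3))              # 0.643
print(round(peak_sized_utilization(week, clusters=2), 3))  # 0.321
```

In this toy week, sizing for the peak leaves about a third of capacity idle, and mirroring the whole stack in a second cluster halves utilization again, which is the doubled cost noted above.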

Architecture Based on Hologres + Flink

At the end of 2024, Li Auto launched a full‑stack upgrade, introducing Alibaba Cloud Hologres and Flink to build a new‑generation car‑network data platform that is elastic, highly available, and low‑cost.

Write Layer : Serverless Flink provides high‑performance ingestion, and data written to Hologres is queryable as soon as it is written.

Storage Layer : Hologres hot‑cold tiering moves older data to OSS, improving the hot‑to‑cold ratio from 2:1 to 5:1 and reducing storage cost.

Compute Layer : Separate compute groups isolate write, processing, and query workloads; Serverless Computing runs cold queries and wide‑table ETL at low cost.

Business Layer : Merging the real‑time and offline pipelines into a single stream‑batch unified pipeline halves storage cost.
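The layers above separate where data lives (hot storage versus OSS) from where work runs (dedicated compute groups versus Serverless Computing). The sketch below expresses that split as a hypothetical routing policy. The 30‑day hot window is borrowed from the cold‑query boundary in Challenge 1, and `Target`, `tier_of`, and `route` are illustrative names; the sketch does not model Hologres's own configuration interfaces for tiering or compute groups.

```python
from datetime import date, timedelta
from enum import Enum

# Hypothetical policy: the 30-day hot window mirrors the "older than 30 days"
# cold-query boundary from Challenge 1; group names are illustrative only.
HOT_WINDOW = timedelta(days=30)


class Target(Enum):
    WRITE_GROUP = "write compute group"
    QUERY_GROUP = "hot-query compute group"
    SERVERLESS = "Serverless Computing"


def tier_of(partition_day: date, today: date) -> str:
    """Partitions inside the hot window stay hot; older ones are tiered to OSS."""
    return "hot" if today - partition_day <= HOT_WINDOW else "cold"


def route(kind: str, oldest_day: date, today: date) -> Target:
    """Choose where a request runs; `oldest_day` is the earliest partition it reads."""
    if kind == "write":
        return Target.WRITE_GROUP
    if kind == "etl" or tier_of(oldest_day, today) == "cold":
        return Target.SERVERLESS  # cold reads and wide-table ETL
    return Target.QUERY_GROUP


today = date(2025, 1, 31)
print(route("query", date(2025, 1, 20), today).value)  # hot-query compute group
print(route("query", date(2024, 12, 1), today).value)  # Serverless Computing
```

Routing by the oldest partition a query touches is one simple choice; a real policy might also weigh estimated scan size or query priority.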

Performance Benchmark

Write benchmark : In a production‑scale 1‑million‑car scenario, 700 CU sustain over 1.5 million RPS with no write lag; a simulated 2‑million‑car load reaches 3 million RPS and remains stable.

Query benchmark : With 500 CU, both single and mixed queries exceed 10 000 QPS; hot‑query P99 is ~10 s, cold‑query P99 ~27 s.
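Normalizing the benchmark figures per compute unit (CU) makes them easier to compare. The sketch below derives throughput per CU from the numbers above and, under the explicit assumption that throughput scales linearly with CU, estimates the CU a 3 million RPS load would need. The article does not state the CU used in the simulated 2‑million‑car test, so the 1,400 CU figure is an extrapolation, not a reported result.

```python
def per_cu(rate: float, cu: int) -> float:
    """Throughput delivered per compute unit (CU)."""
    return rate / cu


def cu_needed(target_rate: int, base_rate: int, base_cu: int) -> int:
    """CU for `target_rate`, ASSUMING throughput scales linearly with CU."""
    return -(-target_rate * base_cu // base_rate)  # integer ceiling division


print(round(per_cu(1_500_000, 700), 1))      # 2142.9 write RPS per CU
print(per_cu(10_000, 500))                   # 20.0 QPS per CU
print(cu_needed(3_000_000, 1_500_000, 700))  # 1400 CU if scaling were linear
```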

Vehicle Digital Twin View Scenario

The “Vehicle Digital Twin View” recreates the state of every signal for each vehicle at a single moment, supporting fault diagnosis, autonomous‑driving monitoring, after‑sales support, and scenario replay. The initial Binlog solution achieved low latency but generated data volumes several times larger than the raw source, leading to high hot‑storage costs.
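At its core, the digital twin view is a point‑in‑time ("as‑of") lookup: for a given vehicle and timestamp, return the most recent value of each signal reported at or before that moment. The article does not describe the storage layout that replaced the Binlog approach, so the in‑memory class below only sketches the lookup semantics; `SignalHistory` and the signal names are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from typing import Any


class SignalHistory:
    """Hypothetical in-memory history of (vehicle, signal) readings,
    kept sorted by timestamp so as-of lookups can use binary search."""

    def __init__(self) -> None:
        self._ts: dict[tuple[str, str], list[int]] = defaultdict(list)
        self._vals: dict[tuple[str, str], list[Any]] = defaultdict(list)

    def add(self, vehicle: str, signal: str, ts: int, value: Any) -> None:
        key = (vehicle, signal)
        i = bisect_right(self._ts[key], ts)  # keep timestamps ordered
        self._ts[key].insert(i, ts)
        self._vals[key].insert(i, value)

    def snapshot(self, vehicle: str, at: int) -> dict[str, Any]:
        """Last value of every signal of `vehicle` reported at or before `at`."""
        state: dict[str, Any] = {}
        for (v, signal), stamps in self._ts.items():
            if v != vehicle:
                continue
            i = bisect_right(stamps, at)
            if i:  # at least one reading at or before `at`
                state[signal] = self._vals[(v, signal)][i - 1]
        return state


h = SignalHistory()
h.add("car-1", "speed_kmh", 100, 60)
h.add("car-1", "speed_kmh", 105, 62)
h.add("car-1", "soc", 90, 0.81)
h.add("car-2", "speed_kmh", 101, 30)
print(h.snapshot("car-1", 103))  # {'speed_kmh': 60, 'soc': 0.81}
```

In a database, the same semantics correspond to selecting the latest row per (vehicle, signal) with a timestamp at or before the target time, which an index on (vehicle, signal, timestamp) can serve efficiently.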

Stability Assurance System

Prevention (before incidents) : Business admission SOP, change‑control SOP, and vendor‑admission SOP; comprehensive Hologres monitoring (query, resource, storage I/O); shadow‑instance validation before each release.

Damage control (during incidents) : Automatic scaling (20% up during holidays, 20% down afterward); rollback within seconds; service degradation for non‑core queries.

Assurance (after incidents) : 99.9% SLA and 24/7 dedicated support; complete audit logs for self‑service bottleneck analysis.
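Two of the damage‑control measures, holiday scaling and degradation of non‑core queries, reduce to simple rules. The sketch below is a hypothetical illustration: the 20% step comes from the article, while the holiday calendar, the 0.85 utilization threshold, and the function names are assumptions.

```python
from datetime import date


def target_cu(base_cu: int, day: date, holidays: set[date], step: float = 0.2) -> int:
    """Scale up by `step` on holiday dates and return to baseline otherwise."""
    factor = 1 + step if day in holidays else 1.0
    return round(base_cu * factor)


def admit(is_core: bool, utilization: float, threshold: float = 0.85) -> bool:
    """Service degradation: shed non-core queries once utilization reaches
    `threshold` (0.85 is an assumed value, not from the article)."""
    return is_core or utilization < threshold


holidays = {date(2025, 10, 1), date(2025, 10, 2)}  # hypothetical holiday calendar
print(target_cu(700, date(2025, 10, 1), holidays))  # 840
print(target_cu(700, date(2025, 10, 9), holidays))  # 700
print(admit(False, 0.92), admit(True, 0.92))        # False True
```

The sketch returns to the baseline explicitly after holidays instead of cutting the raised capacity by 20%, because 1.2 × 0.8 = 0.96 would leave the cluster 4% below its original size.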

Overall Benefits

Write performance +200% : 1.5 million RPS with no write lag at unchanged cost.

QPS +32% : A dedicated warehouse for hot‑data queries also raises resource utilization by 20%.

Compute cost -40% : Serverless Computing for cold queries reduces expense while meeting SLA.

Future Outlook

Unify ingestion on Serverless Flink to further simplify the architecture and improve write elasticity.

Adopt enhanced Hologres capabilities (e.g., Time Travel) for finer‑grained digital‑twin and cold‑storage management.

Expand to new signal sources, such as charging‑station and bench‑test data, to support autonomous‑driving and AI‑training scenarios.
