Changan Automotive Big Data Platform: Challenges and Practices in Connected Vehicle Scenarios
This article outlines the rapid growth of data in the smart automotive sector, details the challenges facing Changan's big data platform (high cost, poor data accessibility, and operational complexity), and describes the practical migration from a Lambda to a unified Kappa architecture that delivered significant storage, compute, and maintenance savings.
Background

In recent years, the rapid development of the intelligent vehicle industry has led to an explosive growth in data. Changan's big data platform supports most of the company's production-related traffic and daily business applications. This article shares the challenges Changan faced in building a big data platform for Internet of Vehicles (IoV) scenarios and the concrete practices implemented in response.
Challenges Faced by Changan

Driven by massive data volumes and fast-growing business, the platform confronts three major issues: high cost, difficulty in making data usable, and cumbersome operations. Daily new data exceeds 20 TB, approaching 9 PB per year, and new-energy vehicles require full-life-cycle storage. Query performance is poor, real-time processing is limited, and the Lambda architecture involves many components, making maintenance and scaling difficult.
Pre-Transformation Architecture and Challenges

The original architecture was a classic Lambda design with separate real-time and batch pipelines. Real-time data was processed by Flink and written to analytical stores such as Doris, ClickHouse, or StarRocks, while batch data flowed from Kafka to HDFS and then into Parquet tables for t+1 analysis. This setup suffered from high storage cost, slow queries, and complex development spanning multiple languages (Java, SQL, Python).
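The maintenance burden of this split can be sketched in miniature: the same metric must be written twice, once as a t+1 batch recomputation and once as incremental streaming state, and the two copies can drift apart. The following Python sketch uses illustrative names (`SpeedEvent`, `avg_speed`); it is not Changan's actual code.

```python
# Hypothetical sketch: in a Lambda design the same metric is
# implemented twice -- once for the batch path, once for streaming.
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class SpeedEvent:
    vin: str          # vehicle identification number
    speed_kmh: float  # reported speed


def batch_avg_speed(events: list[SpeedEvent]) -> dict[str, float]:
    """Batch path: recompute averages from the full day's data (t+1)."""
    samples: dict[str, list[float]] = {}
    for e in events:
        samples.setdefault(e.vin, []).append(e.speed_kmh)
    return {vin: sum(v) / len(v) for vin, v in samples.items()}


class StreamingAvgSpeed:
    """Streaming path: the same logic, rewritten as incremental state."""

    def __init__(self) -> None:
        self.state: dict[str, tuple[float, int]] = {}  # vin -> (sum, count)

    def update(self, e: SpeedEvent) -> float:
        s, n = self.state.get(e.vin, (0.0, 0))
        s, n = s + e.speed_kmh, n + 1
        self.state[e.vin] = (s, n)
        return s / n  # running average for this vehicle
```

Keeping these two implementations in sync (in different codebases, often in different languages) is exactly the duplication the Kappa migration removes.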
Post-Transformation Architecture

To address these challenges, the platform was upgraded to a unified Kappa architecture, eliminating the separate batch layer and achieving 100 % real-time processing. All components were consolidated into a single SaaS-style service with a single engine and SQL as the sole development language, supporting real-time, batch, ad-hoc queries, and other workloads.
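The core Kappa idea can be sketched as one incremental pipeline, where "batch" is reduced to replaying the retained event log through the very same code, so real-time and historical results cannot diverge. The class and event names below are hypothetical, not the platform's actual implementation.

```python
# Hypothetical sketch of the Kappa principle: a single incremental
# pipeline; reprocessing history means replaying the log through it.
class AvgSpeedPipeline:
    def __init__(self) -> None:
        self.state: dict[str, tuple[float, int]] = {}  # vin -> (sum, count)

    def process(self, vin: str, speed: float) -> None:
        s, n = self.state.get(vin, (0.0, 0))
        self.state[vin] = (s + speed, n + 1)

    def result(self) -> dict[str, float]:
        return {vin: s / n for vin, (s, n) in self.state.items()}


log = [("A", 60.0), ("A", 80.0)]  # the retained event log

live = AvgSpeedPipeline()          # serving live traffic
for vin, speed in log:
    live.process(vin, speed)

replay = AvgSpeedPipeline()        # "batch" = fresh instance + replay
for vin, speed in log:
    replay.process(vin, speed)

assert live.result() == replay.result() == {"A": 70.0}
```

Because there is only one implementation, the multi-language, dual-pipeline development effort of the Lambda design disappears.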
Benefits and Effects
1. Reducing Cost

Storage cost dropped by 65 % (a single-copy table shrank from 2.8 TB to 831 GB) thanks to Parquet-based encoding, a custom map-format storage layout, and two-level deduplication (row-level and signal-level). Compute cost was cut by more than 50 % by converting t+1 batch jobs into frequent incremental processing (e.g., every 5 minutes), reducing Spark usage from 14 CUs to 3.5.
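As a rough illustration of two-level deduplication on vehicle telemetry, the sketch below first drops exact duplicate rows, then, within surviving rows, keeps only signals whose value changed since the vehicle's previous report. The field names (`vin`, `ts`, `signals`) are assumptions for illustration, not the platform's schema.

```python
# Hypothetical two-level deduplication sketch.
# Row level:    drop repeated reports with the same (vin, ts).
# Signal level: within a kept row, store only signals whose value
#               changed since the vehicle's last stored report.
def deduplicate(rows: list[dict]) -> list[dict]:
    seen_keys: set[tuple] = set()       # row-level: (vin, ts) already stored
    last_signals: dict[str, dict] = {}  # signal-level: vin -> latest values
    out = []
    for row in rows:
        key = (row["vin"], row["ts"])
        if key in seen_keys:            # exact duplicate row: drop it
            continue
        seen_keys.add(key)
        prev = last_signals.setdefault(row["vin"], {})
        changed = {name: value
                   for name, value in row["signals"].items()
                   if prev.get(name) != value}
        prev.update(changed)
        if changed:                     # emit only the changed signals
            out.append({"vin": row["vin"], "ts": row["ts"],
                        "signals": changed})
    return out
```

Since vehicle signals such as state-of-charge change far less often than they are reported, storing only changes is one plausible way to achieve the kind of compression ratio the article describes.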
2. Improving Data Accessibility

Query performance improved threefold on average; a cross-day query over 2 trillion rows now completes in about five minutes. The platform achieved 100 % real-time data availability, enabling rapid development of data products without the previous multi-component integration effort.
3. Simplifying Operations

The component count dropped from more than ten to a single managed SaaS service, reducing operational overhead. Scalability is now linear: resource consumption grows roughly in proportion to data volume, avoiding costly yearly upgrades.
Conclusion and Future Plans

The ultimate goal is to make IoV data as easy to use as tap water, enabling innovative vehicle applications. Future work includes expanding scenario coverage and integrating AI to provide more self-service capabilities for business users.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.