Great Wall Motor's Vehicle Networking Platform Leveraging CKafka for Scalable Data Processing
Great Wall Motor’s vehicle networking platform collects data from millions of cars over MQTT and uses CKafka, Tencent Cloud’s managed Kafka service, for scalable, reliable, real-time stream processing, buffering, and storage. The result is decoupled services, fault detection, offline analysis, and cost-effective O&M.
Great Wall Motor is a global intelligent technology company whose business covers automobile design, R&D, production, sales, and service, with brands including WEY, Haval, Tank, Ora, and Great Wall Pickup. In 2022 it sold 1,067,523 vehicles, the seventh consecutive year above one million units. Its intelligent-vehicle penetration rate reached 86.17%, and vehicle networking, one of its two major intelligent application directions, is developing rapidly.
The vehicle networking platform covers in-vehicle bus data reporting, remote control, head-unit configuration downlink, file push, message push, operational care, and more. It decouples the vehicle side from the business platforms, making business integration efficient.
The main scenarios are vehicle data reporting and remote control. In vehicle data reporting, the TBox uploads motor, position, engine, whole-vehicle, battery, alarm, and other data to the platform, which processes, computes, and runs inference on it in real time to provide vehicle status queries and alarm services. In remote control, a mobile app or an integrated smart device uses the platform’s capabilities to control and diagnose the vehicle remotely.
The platform currently connects millions of vehicles, with up to one million vehicles online concurrently at peak. Vehicle signal data is large in volume and uploaded at high frequency, and the resulting explosive data growth poses severe challenges for massive real-time processing and analysis.
To meet these challenges, the system must deliver two things: timely processing for queries, analytical decisions, and monitoring alarms; and stability at large data volumes, achieved through a distributed architecture with horizontal scaling, loose coupling, high availability, and data security.
IoT devices usually have limited computing resources and cannot easily run popular traditional message middleware, so they typically rely on MQTT for message transmission. In this setting, however, MQTT has several drawbacks: it provides only queuing, not stream processing; it has no buffering, so it cannot absorb usage surges; most MQTT brokers scale poorly; in asynchronous scenarios, devices are often offline for long periods; it integrates poorly with other enterprise systems; it usually depends on a single edge-based infrastructure; and it cannot reprocess events.
Because MQTT data can be lost before it is processed, MQTT alone cannot meet the demands of massive real-time processing and analysis, and a more suitable solution is needed.
Kafka is a distributed message queue whose multi-partition, zero-copy, batch-processing, and sequential read/write designs give it high throughput. As an event streaming platform, it combines messaging, storage, and data processing into a highly scalable, reliable, secure, real-time infrastructure. For vehicle networking, Kafka provides stream processing (not just queuing), high throughput, large scale, high availability, long-term storage and buffering, reprocessable events, and good integration with other enterprise systems.
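The "sequential append" and "reprocessable events" points can be made concrete with a minimal in-memory sketch of a partitioned, append-only log. This is a toy model, not Kafka's API or storage format; `PartitionedLog`, `append_batch`, and `read` are hypothetical names. Each partition only grows at its tail, every record receives an increasing offset, and a reader can go back to an earlier offset and replay.

```python
from dataclasses import dataclass, field


@dataclass
class PartitionedLog:
    """Toy append-only log split into partitions (hypothetical, not Kafka's API)."""

    num_partitions: int
    partitions: list = field(init=False)

    def __post_init__(self) -> None:
        # One list per partition; records are only ever appended at the tail.
        self.partitions = [[] for _ in range(self.num_partitions)]

    def append_batch(self, partition: int, records: list) -> list:
        """Append records as one batch and return the offsets assigned to them."""
        log = self.partitions[partition]
        start = len(log)
        log.extend(records)  # sequential write: existing records are never modified
        return list(range(start, start + len(records)))

    def read(self, partition: int, offset: int, max_records: int = 100) -> list:
        """Read from any offset; old records stay readable, so events can be replayed."""
        return self.partitions[partition][offset:offset + max_records]


log = PartitionedLog(num_partitions=3)
offsets = log.append_batch(0, ["speed=62", "speed=64", "speed=65"])
print(offsets)            # [0, 1, 2]
print(log.read(0, 1))     # ['speed=64', 'speed=65']
print(log.read(0, 0, 1))  # ['speed=62']  (replay from the start)
```

Real Kafka keeps records on disk until a retention limit removes them. Until that happens, any consumer can re-read them.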
Combining Kafka with MQTT is a natural choice for a scalable, reliable, and secure vehicle networking infrastructure, so Great Wall’s vehicle networking platform adopted Kafka as its core data processing component.
In practice, the MQTT broker cluster is connected to a Kafka cluster. Devices first send data over MQTT, and the data is then persisted to Kafka for downstream processing engines to analyze. Even when processing falls behind collection, no data is lost because it is already persisted in Kafka, so the status of connected vehicles can be monitored and analyzed continuously.
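A bridge between the MQTT broker cluster and Kafka maps each MQTT message to a Kafka record. The sketch below assumes a hypothetical MQTT topic layout, `vehicle/<VIN>/<signal_group>`, and a hypothetical Kafka topic naming rule. The source does not describe the platform's actual scheme. Using the VIN as the record key sends all of a vehicle's messages to the same partition, which preserves their order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KafkaRecord:
    """Minimal stand-in for a record handed to a Kafka producer."""

    topic: str
    key: str
    value: bytes


def mqtt_to_kafka(mqtt_topic: str, payload: bytes) -> KafkaRecord:
    """Map one MQTT publish to a Kafka record.

    Assumes the hypothetical MQTT topic layout "vehicle/<VIN>/<signal_group>".
    """
    parts = mqtt_topic.split("/")
    if len(parts) != 3 or parts[0] != "vehicle":
        raise ValueError(f"unexpected MQTT topic: {mqtt_topic!r}")
    _, vin, signal_group = parts
    # Hypothetical naming: one Kafka topic per signal group (battery, position, ...).
    # The VIN is the key, so one vehicle's messages stay in one partition, in order.
    return KafkaRecord(topic=f"vehicle-{signal_group}", key=vin, value=payload)


record = mqtt_to_kafka("vehicle/TESTVIN0000000001/battery", b'{"soc": 81}')
print(record.topic, record.key)  # vehicle-battery TESTVIN0000000001
```

In a real bridge, a Kafka producer client would send the returned record. That step is left out here.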
Running a self-built Kafka cluster, however, drives up R&D and O&M costs. Tuning, troubleshooting, and dynamically scaling the cluster require engineers with solid computer-science fundamentals, such as computer networking and I/O, and a deep understanding of Kafka’s internals and configuration parameters. It also takes more manpower and hardware, plus constant health monitoring so that problems are found and fixed before they affect the business. Self-built message queues also tend to scale poorly and be hard to maintain: once message volume reaches a certain level, the cluster runs into problems that are difficult to resolve.
Examples include cluster exceptions that were hard to diagnose because monitoring metrics were incomplete and log output was poorly designed, which forced the business to pause while the Kafka cluster was restarted, with heavy impact on service. Expansion was another problem: it was complex, and partition migration during business peaks could deadlock. Finally, the self-built cluster’s ZooKeeper was difficult to maintain, because its high load caused frequent disconnections.
After discussions with Tencent Cloud’s technical team, the platform team chose CKafka (Cloud Kafka), Tencent Cloud’s managed Kafka service. CKafka comes with a complete monitoring and alarm system and an operation ticket system. It also offers strong advantages in performance, scalability, business security, and O&M, giving users low cost, high performance, and rich features without cumbersome operational work.
The vehicle networking platform uses CKafka’s high-performance, high-throughput, scalable distributed message queue engine for four purposes: decoupling services; peak shaving and valley filling, which means absorbing traffic bursts and working through them during quieter periods; asynchronous data processing; and high business reliability.
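Peak shaving and valley filling can be illustrated with a small discrete-time simulation. Producers write a burst faster than consumers can process it, the queue absorbs the excess as backlog, and consumers work through the backlog afterward without dropping anything. The arrival counts and the fixed consumer capacity are made-up illustrative values, not platform measurements.

```python
def simulate_backlog(arrivals: list[int], capacity: int) -> list[int]:
    """Return the queue backlog at the end of each tick.

    arrivals: messages produced per tick (illustrative values).
    capacity: messages the consumers can process per tick.
    """
    backlog = 0
    history = []
    for produced in arrivals:
        backlog += produced                # the queue absorbs the whole burst
        backlog -= min(backlog, capacity)  # consumers work at a steady rate
        history.append(backlog)
    return history


# A burst of 300 messages per tick against consumers that handle 150 per tick.
arrivals = [100, 300, 300, 100, 50, 0, 0]
print(simulate_backlog(arrivals, capacity=150))  # [0, 150, 300, 250, 150, 0, 0]
```

All 850 messages are eventually processed. The burst only delays some of them. In Kafka, the backlog sits in the log on disk, so its limit is retention and disk space, not consumer memory.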
In the data reporting scenario, real‑time vehicle data (GPS position, speed, fuel consumption, etc.) is collected, transmitted and distributed via CKafka, enabling a single data stream to serve multiple scenarios.
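One data stream can serve several scenarios because Kafka tracks a separate committed offset for each consumer group. Each downstream system therefore reads the same records at its own pace. The sketch below models only that idea; `ConsumerGroups` and the group names are hypothetical.

```python
from collections import defaultdict


class ConsumerGroups:
    """Toy model: one shared log with an independent read position per consumer group."""

    def __init__(self, log: list) -> None:
        self.log = log
        self.committed = defaultdict(int)  # group name -> next offset to read

    def poll(self, group: str, max_records: int) -> list:
        """Return the next batch for a group and commit its new position."""
        start = self.committed[group]
        batch = self.log[start:start + max_records]
        self.committed[group] = start + len(batch)
        return batch


stream = ["gps#1", "gps#2", "gps#3", "gps#4"]
groups = ConsumerGroups(stream)
print(groups.poll("realtime-alarms", 4))  # ['gps#1', 'gps#2', 'gps#3', 'gps#4']
print(groups.poll("offline-archive", 2))  # ['gps#1', 'gps#2']
print(groups.poll("offline-archive", 2))  # ['gps#3', 'gps#4']
```

Real Kafka consumer groups also divide a topic's partitions among the members of each group. That detail is omitted here.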
For real-time computing, Flink reads the streams through its Kafka connector, which supports exactly-once processing semantics. It transforms them with Flink operators and writes the results to ClickHouse, a high-performance columnar database, where they support continuously updated analysis. CKafka’s multiple partitions increase throughput and reduce data skew and hotspots. With real-time analysis, vehicle faults and abnormal behavior can be detected and handled quickly.
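Whether multiple partitions actually reduce skew and hotspots depends on the record key. The sketch below hashes keys to partitions and compares two choices: keying by VIN, and keying by a low-cardinality field such as vehicle model. It uses SHA-256 as a stand-in stable hash. Apache Kafka's default partitioner uses murmur2, so real partition assignments differ. The fleet data is synthetic.

```python
import hashlib
from collections import Counter


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (SHA-256 here, not Kafka's murmur2)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def partition_load(keys: list[str], num_partitions: int) -> Counter:
    """Count how many records each partition receives."""
    return Counter(partition_for(k, num_partitions) for k in keys)


# Synthetic fleet: 10,000 vehicles, 90% of them the same model.
vins = [f"TESTVIN{i:010d}" for i in range(10_000)]
models = ["model-A" if i % 10 else "model-B" for i in range(10_000)]

by_vin = partition_load(vins, num_partitions=12)
by_model = partition_load(models, num_partitions=12)
print("busiest partition, keyed by VIN:  ", max(by_vin.values()))
print("busiest partition, keyed by model:", max(by_model.values()))
```

With only two distinct model keys, at most two of the twelve partitions receive data, and one of them takes at least 9,000 records. A high-cardinality key such as the VIN spreads the load roughly evenly, and each vehicle's records stay in order within their partition.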
For offline analysis, a Flume-based log collection system efficiently collects, aggregates, and moves massive amounts of log data from CKafka to HDFS or HBase. When data is produced faster than it is processed, CKafka acts as a buffer. Its append-only partition structure gives excellent throughput, and its replication structure provides high fault tolerance. Vehicle data analyzed offline is used to optimize vehicle performance, improve driving safety, and reduce energy consumption.
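Replication's fault tolerance rests on a simple rule in Apache Kafka. A record counts as committed, and becomes visible to consumers, only after every in-sync replica has copied it. The boundary is the partition's high watermark: the minimum log end offset across the in-sync replica set (ISR). The sketch computes it for a hypothetical three-replica partition. It leaves out how real Kafka decides when a lagging follower drops out of the ISR.

```python
def high_watermark(log_end_offsets: dict[str, int], in_sync: set[str]) -> int:
    """Return the first offset NOT yet replicated to every in-sync replica.

    log_end_offsets: replica id -> next offset that replica would write.
    in_sync: ids of the replicas currently in the in-sync replica set (ISR).
    """
    return min(log_end_offsets[r] for r in in_sync)


leos = {"leader": 120, "follower-1": 118, "follower-2": 95}

# All three replicas in sync: consumers may read offsets 0..94.
print(high_watermark(leos, {"leader", "follower-1", "follower-2"}))  # 95

# follower-2 has dropped out of the ISR: the watermark advances to 118.
print(high_watermark(leos, {"leader", "follower-1"}))  # 118
```

Every record below the high watermark exists on all in-sync replicas. If the leader fails, one of those replicas can take over without losing committed data.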
In the instruction downlink scenario, CKafka carries remote instructions and their responses. It gives upstream and downstream systems asynchronous decoupling and peak shaving, and its message persistence and traceability ensure eventual consistency of instruction status.
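Persistence and traceability yield eventually consistent instruction status in the following way. If every status change for a remote command is written to the log, any service that replays those events in order rebuilds the same final status. The sketch assumes a hypothetical status set (SENT, DELIVERED, EXECUTED, FAILED) and handles redelivered messages by never moving a command backward. The source does not describe the platform's actual states.

```python
# Hypothetical instruction lifecycle; a higher rank means a later stage.
RANK = {"SENT": 0, "DELIVERED": 1, "EXECUTED": 2, "FAILED": 2}
TERMINAL = {"EXECUTED", "FAILED"}


def replay(events: list[tuple[str, str]]) -> dict[str, str]:
    """Rebuild the latest status per command ID from (command_id, status) events."""
    status: dict[str, str] = {}
    for command_id, new in events:
        current = status.get(command_id)
        if current in TERMINAL:
            continue  # a finished command never changes again
        if current is None or RANK[new] >= RANK[current]:
            status[command_id] = new  # only move forward; out-of-date events are ignored
    return status


events_log = [
    ("cmd-1", "SENT"), ("cmd-2", "SENT"),
    ("cmd-1", "DELIVERED"), ("cmd-1", "EXECUTED"),
    ("cmd-1", "DELIVERED"),  # redelivered duplicate
    ("cmd-2", "DELIVERED"), ("cmd-2", "FAILED"),
]
print(replay(events_log))  # {'cmd-1': 'EXECUTED', 'cmd-2': 'FAILED'}
```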
Compared with self-built Kafka, CKafka also provides access to CKafka R&D experts who answer questions and solve problems quickly, saving the team effort. When cluster traffic and disk usage exceed alarm thresholds, the backend expands capacity transparently. This removes data migration during scaling, a long-standing pain point of open-source Kafka. Configuration upgrades are equally transparent, so business peaks are easy to handle.
Beyond scalability, CKafka supports custom multi-AZ deployment within a region as well as cross-region disaster recovery, strengthening the business’s disaster-recovery capability.
Looking ahead, to meet the core demands of reducing storage costs and quickly responding to sudden traffic peaks, CKafka will evolve toward a pay‑as‑you‑go storage model and introduce elastic bandwidth capabilities.
Pay-as-you-go storage bills for the storage space actually used, so no storage has to be reserved in advance. This gives greater flexibility, simpler O&M, and lower cost.
Elastic bandwidth lets traffic rise above the purchased bandwidth specification by a certain margin. During a burst, the cluster scales elastically within the allowed range instead of throttling, and traffic above the original bandwidth is billed on a pay-as-you-go basis.
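The elastic-bandwidth rule above can be written as simple arithmetic. This sketch assumes three inputs: per-interval traffic samples, a purchased base bandwidth, and an elastic ceiling. It also assumes that traffic above the ceiling is still throttled. All numbers are hypothetical, and the source does not specify CKafka's actual billing granularity or formula.

```python
def elastic_bandwidth(samples: list[float], base: float, ceiling: float):
    """Split each interval's traffic into elastic (billed extra) and throttled parts.

    Hypothetical model, all values in one consistent unit (e.g. MB/s):
    traffic up to `base` is covered by the purchased specification,
    traffic between `base` and `ceiling` is billed pay-as-you-go,
    and traffic above `ceiling` is assumed to be throttled.
    """
    elastic, throttled = [], []
    for traffic in samples:
        elastic.append(max(0.0, min(traffic, ceiling) - base))
        throttled.append(max(0.0, traffic - ceiling))
    return elastic, throttled


# Hypothetical: 100 MB/s purchased, elastic up to 150 MB/s.
elastic, throttled = elastic_bandwidth([80.0, 120.0, 150.0, 170.0], base=100.0, ceiling=150.0)
print(elastic)    # [0.0, 20.0, 50.0, 50.0]
print(throttled)  # [0.0, 0.0, 0.0, 20.0]
```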
Through sound architecture design and flexible product capabilities, CKafka lets users run high-throughput, highly available, easy-to-use, O&M-free Kafka services in the cloud at lower cost, providing a one-stop solution for building data pipelines. The authors look forward to further cooperation with customers in the mobility industry and to sharing more cloud best practices.
Tencent Cloud Developer