How Real-Time Flink Powers Automotive Big Data: Architecture & Case Studies

This article, based on Alibaba Cloud expert Li Lubing’s presentation, examines the rapid growth of China’s new energy vehicle market, outlines typical automotive big‑data architectures, compares Lambda and real‑time lakehouse solutions built with Flink and Apache Paimon, and showcases real‑world customer deployments.

Alibaba Cloud Big Data AI Platform

Oct 25, 2024

Abstract

This article is compiled from a talk by Alibaba Cloud product expert Li Lubing on Alibaba Cloud’s real‑time computing Flink product. It focuses on automotive industry scenarios and covers four parts: market trends, typical big‑data architectures, product market position and capabilities, and typical customer cases.

Insight: Market Trend (Automotive)

The new‑energy vehicle market in China is growing rapidly, with an expected 13 million vehicles online by 2025 and a compound annual growth rate of 35.1% from 2022 to 2026. This surge demands high‑performance, cost‑effective real‑time big‑data systems to support digital and intelligent transformation.

Real‑time data is a key focus, with online data collection being the largest scenario.

Sales/Operations: Store traffic monitoring, metric tracking, user profiling, customer satisfaction, after‑sale maintenance, supply‑chain management, and other retail‑like data applications.

Vehicle‑IoT: Utilizes vehicle sensor and location data for predictive maintenance, remote diagnostics, location‑based services, vehicle statistics, OTA updates, etc.

Autonomous Driving: Includes assisted driving, high‑precision maps, safety alerts, and related applications.

Automotive big‑data characteristics: massive volume, low data‑value density, and obvious peak‑valley patterns, requiring real‑time, low‑cost processing.

Typical Customer Big Data Architecture (Automotive)

The reference architecture consists of four layers:

Data Ingestion Layer: Primarily vehicle‑embedded terminals and IoT devices generate massive data, supplemented by data from production, R&D, and supply‑chain systems.

Data Layer: Raw data is processed into domain‑specific topics such as user data, vehicle data, and power‑train data.

Application Layer: Supports mobile, PC, and large‑screen applications across sales, finance, R&D, quality, and supply‑chain scenarios, with a standard layer for data governance.

Standard Layer: Defines data strategy, architecture, security, quality, standards, lifecycle, metrics, and governance.

Typical Technical Solutions

Two typical architectures are presented:

1. Lambda Architecture

Offline and real‑time processing run as separate pipelines: the offline path uses MaxCompute, and the real‑time path uses Flink + Hologres. The two pipelines operate independently and do not share data.

2. Real‑time Lakehouse Architecture

The core is the Flink streaming engine combined with Apache Paimon as unified storage. OSS provides low‑cost object storage, and query engines such as StarRocks or Hologres deliver fast analytics. This architecture unifies stream and batch processing, reduces cost, and improves data freshness.
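The unification can be pictured as one table that serves both access patterns. The sketch below is a toy in‑memory model with hypothetical names (UnifiedTable, read_snapshot, read_incremental); it is not the Paimon API, only an illustration of batch and streaming readers sharing the same committed data.

```python
class UnifiedTable:
    """Toy table: an ordered list of commits, each a batch of rows."""

    def __init__(self):
        self._commits = []  # each commit is a list of rows

    def commit(self, rows):
        """Append one commit and return its snapshot id (1-based)."""
        self._commits.append(list(rows))
        return len(self._commits)

    def read_snapshot(self):
        """Batch read: all rows visible as of the latest commit."""
        return [row for commit in self._commits for row in commit]

    def read_incremental(self, after_snapshot):
        """Streaming read: rows committed after the given snapshot id."""
        new_rows = [
            row for commit in self._commits[after_snapshot:] for row in commit
        ]
        return new_rows, len(self._commits)


table = UnifiedTable()
table.commit([("v1", 80), ("v2", 65)])
# The streaming reader catches up and remembers its position.
rows, position = table.read_incremental(after_snapshot=0)
table.commit([("v3", 72)])

new_rows, position = table.read_incremental(after_snapshot=position)
print(new_rows)                     # only rows from the second commit
print(len(table.read_snapshot()))   # the batch reader sees every committed row
```

Because both readers work from the same commits, there is no second copy of the data to keep consistent, which is the point of contrast with two independent pipelines.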

Product Market and Capability Interpretation

Alibaba Cloud is recognized as a leader in the IDC MarketScape China Real‑time Lakehouse 2024 evaluation.

Lakehouse architecture has become mainstream, offering openness, compatibility with streaming, batch, and OLAP workloads, and delivering real‑time freshness. Future focus includes metadata management, data security, and quality governance.

Real‑time Processing

Key trends: public cloud adoption, real‑time processing, and AI integration. The presentation emphasizes Flink for real‑time computing.

Data architecture evolution stages:

1.0: Introduction of data warehouses and data lakes based on HDFS.

2.0: Adoption of lakehouse solutions (e.g., Hudi, Iceberg), primarily for batch workloads.

3.0: Native real‑time and AI support using Apache Paimon and Flink.

Overall Solution

Flink is the core engine, and data is stored in Apache Paimon tables on OSS. Downstream analytics use StarRocks or Hologres. The solution offers low cost, end‑to‑end real‑time processing, and unified stream‑batch storage and compute, and it supports multiple engines.

Implementation: Ingestion to Lake and Warehouse

(1) Simplified operations: CTAS (CREATE TABLE AS) for table synchronization and merging, CDAS (CREATE DATABASE AS) for whole‑database synchronization, and SQL scripts for ad‑hoc queries.

(2) Schema evolution support for upstream table changes.

(3) Support for common SQL operations such as SELECT, WHERE, GROUP BY, JOIN, Top‑N, and INSERT.
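Item (2) can be illustrated with a small sketch. It assumes an additive‑only policy, where new upstream columns are appended to the target and older rows are read with a null in their place. This is one common approach; the source does not specify which schema changes the product handles, and the function names here are hypothetical.

```python
def evolve_schema(target_columns, upstream_columns):
    """Return the target column list extended with any new upstream columns."""
    evolved = list(target_columns)
    for column in upstream_columns:
        if column not in evolved:
            evolved.append(column)  # additive change only; nothing is dropped
    return evolved


def project_row(row, columns):
    """Read a row under a schema, filling columns it lacks with None."""
    return {column: row.get(column) for column in columns}


target = ["vin", "speed_kmh"]
upstream = ["vin", "speed_kmh", "battery_pct"]  # upstream added a column

target = evolve_schema(target, upstream)
old_row = {"vin": "VIN001", "speed_kmh": 62}  # written before the change
print(target)
print(project_row(old_row, target))
```

Rows written before the change remain readable under the evolved schema, so the sync job does not need to be rebuilt when the upstream table gains a column.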

Low‑Cost Stream‑Batch Storage

OSS‑based Paimon tables provide low‑cost, high‑performance storage using LSM‑Tree. The Changelog mechanism enables update and delete operations, supporting both stream and batch compute. Columnar storage and compression further improve efficiency.
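To make the LSM idea concrete, here is a minimal sketch of merge‑on‑read over a sequence of write batches ("runs"), where an update is a newer entry for the same key and a delete is a tombstone marker. It is a simplified in‑memory model with hypothetical names, not Paimon's file format or API; real implementations keep runs as sorted files and compact them in the background.

```python
TOMBSTONE = object()  # marker meaning "this key was deleted"


def merge_runs(runs):
    """Merge runs ordered oldest to newest; the newest entry per key wins."""
    merged = {}
    for run in runs:  # later runs overwrite earlier ones
        merged.update(run)
    # Drop deleted keys and return rows sorted by primary key.
    return sorted((k, v) for k, v in merged.items() if v is not TOMBSTONE)


# Each run is one flushed batch of writes keyed by primary key (vehicle id).
run_1 = {"v1": {"soc": 80}, "v2": {"soc": 65}}   # initial inserts
run_2 = {"v1": {"soc": 78}}                      # update to v1
run_3 = {"v2": TOMBSTONE, "v3": {"soc": 90}}     # delete v2, insert v3

print(merge_runs([run_1, run_2, run_3]))
```

Writes are always appends of new runs, which suits object storage such as OSS, while readers resolve updates and deletes at merge time.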

Alibaba Cloud Real‑time Flink Product Capabilities

(1) Data ingestion: Flink CDC handles both full and incremental data and supports over thirty built‑in connectors, with a YAML‑based development model planned.

(2) Task development & scheduling: Supports stream and batch jobs, multiple languages, complex event processing (CEP), a unified catalog, environment isolation, test data generation, temporary queries, and external integration.

(3) Operations: Batch scheduling, workflow management, data lineage, intelligent diagnosis, auto‑tuning, resource queue, state and variable management.
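The full‑plus‑incremental ingestion in item (1) can be sketched as a snapshot load followed by replay of change events. The event shape (op, key, row) and the function names below are assumptions made for illustration; they are not the Flink CDC API or its record format.

```python
def load_snapshot(source_rows):
    """Full phase: copy the current state of the source table."""
    return {row["id"]: dict(row) for row in source_rows}


def apply_change(state, event):
    """Incremental phase: apply one insert, update, or delete event."""
    if event["op"] in ("insert", "update"):
        state[event["key"]] = dict(event["row"])
    elif event["op"] == "delete":
        state.pop(event["key"], None)
    else:
        raise ValueError(f"unknown op: {event['op']}")


source = [{"id": 1, "model": "A"}, {"id": 2, "model": "B"}]
changes = [
    {"op": "update", "key": 1, "row": {"id": 1, "model": "A+"}},
    {"op": "delete", "key": 2, "row": None},
    {"op": "insert", "key": 3, "row": {"id": 3, "model": "C"}},
]

replica = load_snapshot(source)
for event in changes:
    apply_change(replica, event)

print(replica)
```

After the replay, the replica reflects the source's latest state without re‑reading the full table.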

Typical Customer Cases (Automotive)

The real‑time lakehouse in these deployments is built with Flink and Apache Paimon on OSS, with compute on Flink or EMR‑Spark and analytics on StarRocks. The architecture delivers end‑to‑end real‑time data flow, unified stream‑batch processing, and cost‑effective performance.

Appendix: Vehicle Data Types

The reference list includes whole‑vehicle data, drive‑motor data, fuel‑cell data, and other sensor categories.

