Big Data 14 min read

Apache Flink at Didi: Platformization, Production Practices, and StreamSQL

This article describes how Didi adopted Apache Flink for its real‑time data streams, detailing the platformized architecture, production use cases such as ETL, monitoring and CEP, the evolution of StreamSQL, and the engineering improvements made to support large‑scale, low‑latency processing.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Apache Flink at Didi: Platformization, Production Practices, and StreamSQL

Many users may not know that the Flink Chinese site is now live at https://zh.ververica.com , and the site is recruiting translators.

Apache Flink at Didi

Didi’s data can be divided into four major categories: trajectory data, transaction data, event‑tracking (埋点) data, and log data, all of which have strong real‑time requirements.

Because of the stringent latency demands of scenarios such as trajectory tracking and gateway log monitoring, Didi evaluated several stream‑processing engines and chose Flink for its pure streaming model, which best matched their needs.

Previously, many business units built their own small Storm or Spark Streaming clusters, leading to high maintenance costs and low resource utilization. Didi therefore unified the streaming layer under a single Flink‑based platform to reduce operational overhead and improve stability.

Benefits of Platformization

Platformization lowers the entry barrier for stream processing, consolidates resources to increase machine utilization, and provides a guaranteed stability service for all business units.

Overall Architecture

The platform consists of four layers: upstream data sources (Kafka and Didi’s internal DDMQ), the core streaming platform (supporting Flink and Spark Streaming, with components such as application management, StreamSQL, WebIDE, and diagnostics), resource management via YARN with storage on HDFS, and downstream sinks (Kafka, DDMQ, HBase, MySQL, etc.).

Engine Improvements

Key enhancements include service‑oriented task submission and control, limiting each YARN session to a single job to simplify resource management, dynamic scaling of jobs without releasing already allocated resources, and fixing numerous Zookeeper‑related bugs that affected checkpoint IDs.

Stream Computing Task Development

Didi offers multiple development modes: WebIDE, local IDE, and the newer StreamSQL IDE, which aims to lower the development threshold for users.

Flink Task Monitoring

The platform provides a real‑time monitoring dashboard displaying latency and throughput metrics, along with customizable alerts and a diagnostic system that aggregates logs into Elasticsearch and stores metric history in Druid.

Production Practices

Didi’s streaming workloads fall into four categories: real‑time ETL, real‑time reporting, real‑time business monitoring, and CEP (complex event processing) online services.

Real‑time Gateway Log Monitoring

Didi collects 90% of its network logs into a Kafka topic, then applies filtering, grouping, and windowed aggregations in Flink to detect anomalies and trigger alerts. Processed data is persisted to Elasticsearch for visualization and to other downstream systems.

StreamSQL

Didi is developing an internal StreamSQL that will eventually handle the majority of its streaming tasks. Its core features include multi‑format DDL support, a single DML statement INSERT INTO TABLE for sinking streams, and advanced operators such as group/window aggregation, joins (including no‑window joins and stream‑dimension joins), and user‑defined functions (UDF, UDTF, UDAF).

StreamSQL also supports multi‑sink (splitting) streams, allowing a single input to be written to both HBase and Kafka, a capability not present in the upstream Flink SQL.

Future plans involve expanding StreamSQL coverage to over 90% of Didi’s streaming jobs, integrating CEP capabilities, and improving dynamic scaling and operator‑level auto‑scaling.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Big DataFlinkstream processingDidiStreamSQLPlatformization
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.