
From Lambda Architecture to an All‑in‑One Apache Doris Real‑Time/Offline Data Platform for 5G Connected Factories

This article explains how China Unicom transformed the data pipeline of its 5G fully‑connected factory from a complex Lambda architecture into a streamlined solution built on Apache Doris that integrates real‑time and offline processing. It covers the system requirements, the architectural redesign, the performance gains, and future plans.

DataFunTalk

Data is the core element of a 5G fully‑connected factory, and it must be collected, stored, and analyzed efficiently. To simplify the data processing chain, China Unicom evolved its architecture from a traditional Lambda model to an all‑in‑one Apache Doris solution that unifies real‑time and offline workloads.

The new system must support massive data ingestion, including batch imports of historical data, high‑frequency low‑latency writes from sensors, CDC streams, and file‑based imports, while ensuring primary‑key‑based upserts for business data such as orders and personnel.
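
As a rough illustration of the primary‑key upsert requirement, the sketch below models last‑write‑wins semantics in plain Python and shows, as a string, what a Doris Unique Key table definition might look like. The table name `orders`, its columns, the bucket count, and the properties are all hypothetical; the source does not say which table model or settings China Unicom actually uses.

```python
# Hypothetical DDL: table name, columns, bucket count, and properties are
# assumptions for illustration, not taken from the source.
ORDERS_DDL = """
CREATE TABLE IF NOT EXISTS orders (
    order_id   BIGINT NOT NULL,
    status     VARCHAR(32),
    updated_at DATETIME
)
UNIQUE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8
PROPERTIES (
    "replication_num" = "1",
    "enable_unique_key_merge_on_write" = "true"
);
"""


def apply_upserts(rows, key="order_id"):
    """Model upsert semantics: for each key, the last row written wins."""
    table = {}
    for row in rows:
        table[row[key]] = row  # insert if new, overwrite if the key exists
    return table


writes = [
    {"order_id": 1, "status": "created"},
    {"order_id": 2, "status": "created"},
    {"order_id": 1, "status": "shipped"},  # same key: replaces the first row
]
state = apply_upserts(writes)
print(state[1]["status"], len(state))
```

The in‑memory dictionary only stands in for the behavior a primary‑key table gives business data such as orders: repeated writes for one key leave a single, latest row.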

Query requirements include millisecond‑level point lookups, fast aggregations on partitioned data, sub‑second multi‑dimensional analysis, and complex multi‑table joins, all while maintaining stable performance for reporting dashboards.

Initially, the factory used a Lambda architecture with Hive for offline processing and ClickHouse for real‑time queries, leading to long data pipelines, accuracy issues, high maintenance costs, and limited scalability. After evaluating Apache Doris, ClickHouse, Hive, and data‑lake options, Doris was selected for its simplicity, MySQL compatibility, high‑performance OLAP capabilities, and low operational overhead.

The Doris‑based architecture eliminates Hive, ClickHouse, and HBase, so both batch and streaming data are written directly to Doris via Flink CDC, Kafka, and the JDBC Catalog. Real‑time data is synchronized with the Flink Doris Connector, while less time‑critical data is ingested through scheduled JDBC pulls.
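
A scheduled JDBC pull of the kind described above could be sketched as follows, assuming an upstream MySQL database exposed to Doris through a JDBC catalog. Every identifier here (`mysql_src`, `mes.work_orders`, `factory.work_orders`, the `updated_at` watermark column, host, user, and driver file) is a placeholder invented for the example; the source does not describe the actual jobs.

```python
from datetime import datetime

# Hypothetical catalog definition; names, URL, and credentials are placeholders.
CREATE_CATALOG_SQL = """
CREATE CATALOG mysql_src PROPERTIES (
    "type" = "jdbc",
    "user" = "reader",
    "password" = "***",
    "jdbc_url" = "jdbc:mysql://mysql-host:3306/mes",
    "driver_url" = "mysql-connector-j-8.0.33.jar",
    "driver_class" = "com.mysql.cj.jdbc.Driver"
);
"""


def build_incremental_pull(since: datetime) -> str:
    """Return an INSERT ... SELECT that copies rows changed after `since`."""
    watermark = since.strftime("%Y-%m-%d %H:%M:%S")
    return (
        "INSERT INTO internal.factory.work_orders "
        "SELECT * FROM mysql_src.mes.work_orders "
        f"WHERE updated_at > '{watermark}'"
    )


sql = build_incremental_pull(datetime(2024, 1, 1, 0, 0, 0))
print(sql)
```

A scheduler would run the generated statement at a fixed interval and advance the watermark after each successful load. Because Doris speaks the MySQL protocol, any MySQL client could submit it; the choice of client and scheduler is left open here.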

Key benefits include a unified real‑time/offline data flow, orders‑of‑magnitude improvements in query latency and throughput, a simplified federated query gateway, and dramatically reduced operational costs thanks to Doris's lightweight design of frontend (FE) and backend (BE) nodes and to Doris Manager for automated management.

Future work focuses on scaling Doris Manager, further query optimizations, adopting multi‑table materialized views, exploring compute‑storage separation in Doris 3.0, and building a standardized metric system for the 5G factory scenario.

Tags: real-time analytics, Data Warehouse, 5G, Apache Doris, industrial IoT