Overview of Real-Time Big Data Processing: Spark Structured Streaming, CarbonData, Flink, and Cloud Stream

This article provides a comprehensive overview of modern real‑time big‑data solutions, detailing Spark Structured Streaming capabilities, CarbonData’s storage architecture, Meituan’s Flink deployments, and Huawei Cloud Stream’s unified streaming service, highlighting their features, challenges, and future directions.

Big Data Technology & Architecture

The demand for millisecond‑to‑second real‑time analytics is rising across enterprises, prompting the adoption of high‑performance big‑data platforms such as Spark Structured Streaming, CarbonData, and Flink.

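Spark Structured Streaming offers a unified DataFrame/Dataset API for batch and streaming, fault‑tolerant checkpointing, and two execution modes: micro‑batch and (still experimental) continuous processing. It supports multiple sources (Kafka, files, Kinesis) and, in micro‑batch mode with replayable sources and idempotent sinks, end‑to‑end exactly‑once semantics, making it suitable for complex event‑time workloads.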
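To make the event‑time idea concrete, here is a minimal plain‑Python sketch of tumbling windows with a watermark. It is not Spark code and not how Spark is implemented: the function name `windowed_counts` and its parameters are hypothetical, and the watermark is advanced per record rather than per micro‑batch. In Spark itself the same intent is expressed with `withWatermark` and `window` on a streaming DataFrame.

```python
from collections import defaultdict


def windowed_counts(events, window_s=10, allowed_lateness_s=5):
    """Count events per tumbling event-time window.

    events: iterable of (event_time_seconds, key) in ARRIVAL order.
    A record is dropped when its window already closed, i.e. the window
    end is at or below the watermark (max event time seen - lateness).
    """
    counts = defaultdict(int)          # (window_start, key) -> count
    dropped = []                       # records that arrived too late
    max_event_time = float("-inf")

    for event_time, key in events:
        max_event_time = max(max_event_time, event_time)
        watermark = max_event_time - allowed_lateness_s

        window_start = (event_time // window_s) * window_s
        window_end = window_start + window_s

        if window_end <= watermark:
            dropped.append((event_time, key))
            continue
        counts[(window_start, key)] += 1

    return dict(counts), dropped


if __name__ == "__main__":
    arrivals = [(1, "a"), (4, "a"), (12, "a"), (3, "a"), (21, "a"), (8, "a")]
    counts, dropped = windowed_counts(arrivals)
    print(counts)   # {(0, 'a'): 3, (10, 'a'): 1, (20, 'a'): 1}
    print(dropped)  # [(8, 'a')]
```

The record with event time 3 arrives out of order but is still counted, because the watermark (12 - 5 = 7) has not yet passed the end of window [0, 10). The record with event time 8 arrives after the watermark reached 16 and is discarded.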

CarbonData is a high‑performance columnar storage format that combines fast filtering, multi‑level indexing, dictionary encoding, and pre‑aggregation to enable trillion‑row analytics with sub‑second response times. It integrates tightly with Spark SQL, manages data at the segment level, and provides DataMap indexes to accelerate queries.
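Two of the techniques named above, dictionary encoding and index‑based filtering, can be illustrated with a small plain‑Python sketch. This is only an illustration of the general idea under simplifying assumptions (a single column, a per‑block min/max index); the function names are hypothetical and this is not CarbonData's file format or API.

```python
def dictionary_encode(values):
    """Replace each value with a small integer code.

    The dictionary is sorted, so code order follows value order.
    """
    dictionary = sorted(set(values))
    code_of = {value: code for code, value in enumerate(dictionary)}
    return dictionary, [code_of[v] for v in values]


def build_blocks(codes, block_size):
    """Split an encoded column into blocks and record min/max per block."""
    blocks = []
    for start in range(0, len(codes), block_size):
        chunk = codes[start:start + block_size]
        blocks.append(
            {"start": start, "min": min(chunk), "max": max(chunk), "codes": chunk}
        )
    return blocks


def filter_equals(dictionary, blocks, value):
    """Return (matching row ids, number of blocks actually scanned)."""
    if value not in dictionary:
        return [], 0                   # value absent: no block needs scanning
    code = dictionary.index(value)
    rows, scanned = [], 0
    for block in blocks:
        if code < block["min"] or code > block["max"]:
            continue                   # min/max index proves no match here
        scanned += 1
        rows.extend(
            block["start"] + i for i, c in enumerate(block["codes"]) if c == code
        )
    return rows, scanned


if __name__ == "__main__":
    cities = ["beijing", "beijing", "chengdu", "chengdu",
              "shanghai", "shenzhen", "shenzhen", "wuhan"]
    dictionary, codes = dictionary_encode(cities)
    blocks = build_blocks(codes, block_size=2)
    print(filter_equals(dictionary, blocks, "shenzhen"))  # ([5, 6], 2)
```

Only two of the four blocks are scanned for the filter. The pruning works well here because the sample column happens to be sorted; on unsorted data the min/max ranges overlap and fewer blocks can be skipped.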

Flink at Meituan powers a massive real‑time platform handling trillions of events daily. The architecture leverages Kafka for ingestion, Flink for low‑latency processing, and a layered resource‑isolation strategy on YARN. Key practices include HA JobManager deployment, checkpoint‑based recovery, multi‑zone redundancy, and custom retry logic for Kafka I/O.
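The article does not describe what Meituan's custom retry logic for Kafka I/O looks like. As a generic illustration only, the sketch below retries a flaky operation with exponential backoff. The helper `call_with_retry`, its parameters, and the choice of `ConnectionError` as the retryable failure are all assumptions made for this example.

```python
import time


def call_with_retry(operation, max_attempts=4, base_delay_s=0.1, sleep=time.sleep):
    """Call operation(); on ConnectionError wait and try again.

    The delay doubles after each failed attempt (base, 2*base, 4*base, ...).
    The last failure is re-raised so the caller (or the framework's own
    restart strategy) can take over.
    """
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * (2 ** attempt))


if __name__ == "__main__":
    calls = {"n": 0}

    def flaky_read():
        # Hypothetical stand-in for a broker read that fails twice.
        calls["n"] += 1
        if calls["n"] < 3:
            raise ConnectionError("broker unavailable")
        return "record-batch"

    print(call_with_retry(flaky_read))  # record-batch
```

Passing `sleep` as a parameter keeps the helper easy to test without real waiting. Retrying inside the I/O call handles short broker hiccups locally, so that a full job restart from the last checkpoint is reserved for failures that persist.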

Meituan’s use cases span the Petra real‑time metric aggregation system and the MLX machine‑learning platform, both of which exploit Flink’s event‑time processing, state management, and exactly‑once guarantees.
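The link between checkpoints, state, and exactly‑once results can be shown with a single‑process sketch. It assumes a replayable input log (as Kafka provides) and snapshots the read offset together with the state. Flink's real mechanism is a distributed, barrier‑based snapshot across many operators; the `CountingJob` class here is hypothetical and only mirrors the principle.

```python
import copy


class CountingJob:
    """Counts keys from a replayable log; offset and state are saved together."""

    def __init__(self):
        self.offset = 0      # next log position to read
        self.counts = {}     # operator state: key -> count

    def process(self, log, upto):
        """Consume log[offset:upto], updating state record by record."""
        while self.offset < upto:
            key = log[self.offset]
            self.counts[key] = self.counts.get(key, 0) + 1
            self.offset += 1

    def snapshot(self):
        """Checkpoint: a consistent copy of (offset, state)."""
        return {"offset": self.offset, "counts": copy.deepcopy(self.counts)}

    @classmethod
    def restore(cls, snap):
        """Recovery: rebuild the job exactly as it was at the checkpoint."""
        job = cls()
        job.offset = snap["offset"]
        job.counts = copy.deepcopy(snap["counts"])
        return job


if __name__ == "__main__":
    log = ["a", "b", "a", "c", "a"]

    job = CountingJob()
    job.process(log, upto=2)
    checkpoint = job.snapshot()      # covers records 0 and 1
    job.process(log, upto=4)         # progress made after the checkpoint...
    del job                          # ...is lost when the job crashes

    recovered = CountingJob.restore(checkpoint)
    recovered.process(log, upto=len(log))   # replays records 2..4
    print(recovered.counts)          # {'a': 3, 'b': 1, 'c': 1}
```

Because the offset and the counts are restored from the same snapshot, records 2 and 3 are replayed into state that has not yet seen them, so each record affects the final counts exactly once even though it was read twice.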

Comparison and Cloud Stream – Huawei’s Cloud Stream Service unifies Flink and Spark (Streaming & Structured Streaming) under a serverless, multi‑tenant model. It addresses limitations of Spark Streaming, offers visual StreamSQL editing, supports a rich ecosystem (Kafka, Hadoop, Elasticsearch, etc.), and delivers millisecond‑level latency with million‑message‑per‑second throughput.

Overall, the article highlights the evolution of streaming technologies, their architectural trade‑offs, and emerging trends such as unified APIs, stateful processing, and cloud‑native deployment models.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Flink, stream processing, Real-time analytics, Spark, CarbonData
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
