Tag

Batch-Stream Integration

DataFunSummit
Aug 26, 2023 · Big Data

Bilibili's Practice of Building a Streaming Data Lake with Hudi and Flink

This article details Bilibili's implementation of a streaming data lake using Hudi and Flink, covering background challenges, four case studies, batch‑stream integration optimizations, infrastructure and kernel enhancements, and future work directions.

Batch-Stream Integration · Big Data · Flink
14 min read
DataFunTalk
Jun 6, 2021 · Big Data

Understanding Apache Pulsar: Cloud‑Native Messaging, Storage‑Compute Separation, and Batch‑Stream Fusion with Flink

This article explains Apache Pulsar’s cloud‑native, storage‑compute separated architecture, its data model and scalability features, and how it integrates with Flink to provide a unified platform for both real‑time streaming and batch processing in big‑data applications.

Apache Pulsar · Batch-Stream Integration · Big Data
17 min read
Big Data Technology Architecture
Apr 5, 2021 · Big Data

Evolution of Real‑Time Data Warehouses: From 1.0 to 3.0 and the Road to Batch‑Stream Unified Architecture

The article reviews the current state of offline Hive‑based data warehouses, explains the emergence of real‑time data warehouses (1.0) built on Kafka and Flink, discusses their limitations, and outlines the progression toward batch‑stream unified architectures (2.0 and 3.0) leveraging data‑lake technologies such as Iceberg.

Batch-Stream Integration · Big Data · Flink
13 min read
TAL Education Technology
Jan 28, 2021 · Big Data

Batch-Stream Fusion in Education: TAL’s Real-Time Data Platform Practices

This article, presented by senior data platform engineer Mao Xiangyi of TAL Education, details the design and implementation of the company’s real‑time T‑Streaming platform, covering its three‑layer data architecture, batch‑stream integration techniques, real‑time processing at the ODS layer, the Flink SQL development workflow, hybrid‑cloud deployment, and a case study of K‑12 renewal reporting.

Batch-Stream Integration · Data Engineering · Flink
18 min read
DataFunTalk
Dec 7, 2020 · Big Data

Jingdong's Flink Real‑Time Computing Platform: Containerization, Optimizations, and Future Roadmap

This article details Jingdong's evolution from Storm to Flink, the architecture of its Kubernetes‑based real‑time computing platform, extensive containerization practices, performance and stability optimizations, and its future plan to unify batch and stream processing while expanding SQL support and intelligent operations.

Batch-Stream Integration · Big Data · Containerization
16 min read