Tagged articles
9 articles
Baidu Geek Talk
Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9% to 99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data
0 likes · 14 min read
Byte Quality Assurance Team
Jan 6, 2021 · Big Data

Fundamentals of Stream Processing: Bounded vs. Unbounded Data, Time Domains, and Windowing Strategies

This article provides a comprehensive introduction to stream processing fundamentals by distinguishing between bounded and unbounded datasets, clarifying the critical differences between event time and processing time, and exploring various windowing strategies to demonstrate how modern distributed systems efficiently handle continuous data flows.

Apache Flink · Data Windowing · Event Time
0 likes · 13 min read
Architect
Jun 11, 2020 · Big Data

Understanding Apache Flink Architecture, Data Transfer, Event‑Time Processing, State Management, and Checkpointing

This article explains Apache Flink's distributed system architecture—including JobManager, ResourceManager, TaskManager, and Dispatcher—covers session and job deployment modes, data transfer mechanisms, event‑time handling with watermarks, various state types and backends, scaling strategies, and the checkpoint/savepoint recovery process.

Apache Flink · Big Data · Event Time
0 likes · 15 min read
Big Data Technology & Architecture
Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Apache Flink · Event Time · Flink
0 likes · 10 min read
Big Data Technology Architecture
Aug 7, 2019 · Big Data

Why Choose Apache Flink for Real‑Time Stream Processing: Features and Lessons Learned

This article explains why the author chose Apache Flink for real‑time stream processing, highlighting its unique combination of high throughput, low latency, event‑time support, stateful computation, flexible windows, and fault tolerance, while also reflecting on the challenges of adopting a less‑documented technology.

Event Time · Flink · Real-Time
0 likes · 7 min read
JD Tech Talk
Aug 2, 2018 · Big Data

Real-Time Order Statistics with Apache Flink in a Data Aggregation Platform

This article explains how the data aggregation platform adopts Apache Flink for high‑throughput, low‑latency stream processing, covering the complete workflow from data source integration, transformation operations, windowing and time concepts, to a concrete order‑count example with custom aggregation logic.

Apache Flink · Event Time · Flink
0 likes · 10 min read