Tag: stream processing

DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar's message‑queue capabilities with Iceberg's data‑lake features in a single data store. It supports both full and incremental queries, sub‑second data visibility, and exactly‑once semantics, and integrates with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Big Data
0 likes · 13 min read

Full-Stack Internet Architecture
May 27, 2025 · Big Data

Understanding Event Streaming in Kafka: Core Concepts, Architecture, and Use Cases

This article explains event streaming in Kafka: what events and streams are, its core components (producers, topics, partitions, consumers, and persistence), and typical use cases such as real‑time data pipelines, event‑driven architectures, stream processing, and log aggregation, highlighting Kafka's role as foundational big‑data infrastructure.

Big Data · Event Streaming · Kafka
0 likes · 7 min read
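
Keyed partitioning is what lets a topic scale out while still keeping related events in order. The sketch below is a toy in-memory model, not Kafka's client: it assumes a hypothetical three-partition topic and uses CRC32 as a stand-in for the murmur2 hash that Kafka's Java producer applies to record keys.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

NUM_PARTITIONS = 3  # hypothetical topic with three partitions


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition so that equal keys always share a partition."""
    # Stand-in hash for illustration only: Kafka's Java producer uses murmur2 for keyed records.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(events: List[Tuple[str, str]]) -> Dict[int, List[Tuple[str, str]]]:
    """Append (key, value) events to per-partition logs in arrival order."""
    partitions: Dict[int, List[Tuple[str, str]]] = defaultdict(list)
    for key, value in events:
        partitions[choose_partition(key)].append((key, value))
    return partitions


events = [("user-1", "login"), ("user-2", "click"), ("user-1", "click"), ("user-1", "logout")]
logs = produce(events)
p = choose_partition("user-1")
print(p, [value for key, value in logs[p] if key == "user-1"])
```

Ordering holds only within a partition, so the choice of key (here a user ID) decides which events stay in order relative to each other.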

ByteDance Data Platform
Apr 25, 2025 · Databases

How ByteDance’s AQETuner Cuts Query Latency by 23% and Boosts Reliability

ByteDance Data Platform’s recent database research, spanning query‑level Bayesian tuning, adaptive parallelism for stream processing, and learned cardinality estimation, is highlighted by two papers accepted at VLDB 2025 and ICDE 2025, which report significant performance gains and real‑world deployments.

AI · Query Optimization · cardinality estimation
0 likes · 5 min read

Big Data Technology Architecture
Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals and details Flink CDC's architecture and advantages. It provides setup steps and code examples for the SQL and DataStream APIs, and discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium
0 likes · 7 min read
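
At its core, CDC replays an ordered stream of row‑level change events against a target. As a rough sketch (not Flink CDC's internals), the Python below applies events shaped like Debezium's envelope, with its `op` codes `c`, `u`, `d`, and `r` and its `before`/`after` row images, to an in-memory table keyed by a hypothetical `id` primary key. The events are simplified; real Debezium events also carry `source` metadata and schema information.

```python
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]


def apply_change(table: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one Debezium-style change event to an in-memory replica keyed by `key`."""
    op = event["op"]
    before: Optional[Row] = event.get("before")
    after: Optional[Row] = event.get("after")
    if op in ("c", "r", "u"):  # create, snapshot read, update: upsert the new row image
        if op == "u" and before is not None and before[key] != after[key]:
            table.pop(before[key], None)  # primary key changed: remove the old row
        table[after[key]] = dict(after)
    elif op == "d":  # delete: drop the row identified by the old image
        table.pop(before[key], None)
    else:
        raise ValueError(f"unknown op: {op!r}")


def replay(events: List[Dict[str, Any]]) -> Dict[Any, Row]:
    """Rebuild the replica by applying change events in log order."""
    table: Dict[Any, Row] = {}
    for event in events:
        apply_change(table, event)
    return table


changes = [
    {"op": "r", "before": None, "after": {"id": 1, "name": "alice", "city": "Hangzhou"}},
    {"op": "c", "before": None, "after": {"id": 2, "name": "bob", "city": "Beijing"}},
    {"op": "u", "before": {"id": 1, "name": "alice", "city": "Hangzhou"},
     "after": {"id": 1, "name": "alice", "city": "Shanghai"}},
    {"op": "d", "before": {"id": 2, "name": "bob", "city": "Beijing"}, "after": None},
]
replica = replay(changes)
print(replica)
```

The snapshot (`r`) and streaming (`c`/`u`/`d`) phases feed the same apply logic, which is why a full snapshot followed by the change log converges to the source table's current state.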

DaTaobao Tech
Dec 18, 2024 · Big Data

Incremental Computation in Big Data: Flink Materialized Table and Paimon

The article explains how Flink 1.20’s Materialized Table, combined with Paimon’s changelog storage, enables incremental computation that unifies batch and streaming workloads and delivers minute‑level latency at lower cost. It illustrates this with a materialized‑table example and notes that support is currently limited to streaming, with batch extensions planned.

Big Data · Paimon · flink
0 likes · 13 min read
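
Incremental computation works because a changelog carries retractions as well as inserts, so an aggregate can be updated from the delta instead of being recomputed. The plain-Python sketch below is not Flink's or Paimon's implementation; it borrows the row-kind notation used by Flink changelogs (`+I`, `-U`, `+U`, `-D`) and maintains a hypothetical per-shop `SUM(amount)` as the materialized result.

```python
from typing import Dict, Iterable, Tuple

Change = Tuple[str, str, float]  # (row kind, group key, amount)


def apply_changelog(state: Dict[str, Tuple[float, int]], changes: Iterable[Change]) -> None:
    """Incrementally maintain SUM(amount) GROUP BY key from changelog rows.

    State holds (sum, row_count) per key so that a group whose rows are all
    retracted disappears, as it would from a full recomputation.
    """
    for kind, key, amount in changes:
        total, count = state.get(key, (0.0, 0))
        if kind in ("+I", "+U"):    # insert / update-after: add the new row
            total, count = total + amount, count + 1
        elif kind in ("-U", "-D"):  # update-before / delete: retract the old row
            total, count = total - amount, count - 1
        else:
            raise ValueError(f"unknown row kind: {kind}")
        if count == 0:
            state.pop(key, None)
        else:
            state[key] = (total, count)


def result(state: Dict[str, Tuple[float, int]]) -> Dict[str, float]:
    """Project the internal state to the user-visible aggregate."""
    return {key: total for key, (total, _) in state.items()}


state: Dict[str, Tuple[float, int]] = {}
# Refresh 1: two orders arrive.
apply_changelog(state, [("+I", "shop-a", 10.0), ("+I", "shop-b", 5.0)])
# Refresh 2: only the delta is processed; shop-a's order is corrected from 10 to 12
# and shop-b's order is deleted.
apply_changelog(state, [("-U", "shop-a", 10.0), ("+U", "shop-a", 12.0), ("-D", "shop-b", 5.0)])
print(result(state))
```

The second refresh touches three change rows rather than rescanning every order, which is where the cost saving of incremental refresh comes from.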

DaTaobao Tech
Oct 25, 2024 · Big Data

Using Temporal Table JOIN in Flink SQL for Real-Time Stream Enrichment

The article explains how to use Flink SQL’s temporal table join to enrich a real‑time traffic‑log stream with versioned tag data, covering the required DDL, the time‑versioned join syntax, and the essential watermark and idle‑timeout settings that prevent stalls and boundary‑delay issues.

SQL · Temporal Join · Versioned Table
0 likes · 7 min read
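
Semantically, an event‑time temporal join attaches to each stream record the version of the dimension row that was valid at the record's event time, which Flink SQL expresses with `FOR SYSTEM_TIME AS OF`. A batch‑style Python sketch of that lookup, assuming a hypothetical `tags` table with integer version timestamps:

```python
import bisect
from typing import Dict, List, Optional, Tuple

# Hypothetical versioned tag table: user_id -> [(version_time, tag), ...] sorted by version_time.
VersionedTable = Dict[str, List[Tuple[int, str]]]


def tag_as_of(table: VersionedTable, user_id: str, event_time: int) -> Optional[str]:
    """Return the tag valid at event_time, i.e. the latest version with version_time <= event_time."""
    versions = table.get(user_id, [])
    times = [t for t, _ in versions]
    idx = bisect.bisect_right(times, event_time) - 1
    return versions[idx][1] if idx >= 0 else None


def enrich(logs: List[Tuple[str, int]], table: VersionedTable) -> List[Tuple[str, int, Optional[str]]]:
    """Left-join each (user_id, event_time) log record with the tag version valid at that time."""
    return [(uid, ts, tag_as_of(table, uid, ts)) for uid, ts in logs]


tags: VersionedTable = {"u1": [(100, "new"), (200, "active"), (300, "vip")]}
traffic = [("u1", 50), ("u1", 150), ("u1", 200), ("u1", 350), ("u2", 120)]
enriched = enrich(traffic, tags)
for row in enriched:
    print(row)
```

In a real streaming job, Flink can only emit a joined row once the watermarks on both inputs have passed that row's event time, which is why the watermark and idle‑timeout settings the article discusses matter; this sketch sidesteps that by having the whole versioned table available up front.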

JD Retail Technology
Sep 25, 2024 · Big Data

From a Personal Journey to Data Platform Architecture: Insights on Big Data, Cloud Computing, and System Design

The article recounts the author’s 30‑year programming career and shares technical reflections on building business‑agnostic, configurable data platforms, covering batch, streaming, and interactive computing, big‑data sharding, Spark, Flink, cloud migration, and the philosophy of software architecture.

Big Data · Data Engineering · batch processing
0 likes · 23 min read

ZhongAn Tech Team
Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article gives an overview of a log clustering system, covering its background and its architecture built on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm that uses SimHash and Hamming distance for real‑time log governance and anomaly detection.

Log Clustering · Real-time Analytics · Simhash
0 likes · 14 min read
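
SimHash maps similar texts to fingerprints that differ in only a few bits, so near‑duplicate log lines can be grouped by Hamming distance. The sketch below shows the general technique, not the article's algorithm: the tokenizer, the number masking, the MD5‑based 64‑bit token hash, the greedy first‑match assignment, and the `max_distance=3` threshold are all assumptions chosen for illustration.

```python
import hashlib
import re
from typing import List, Tuple

BITS = 64


def tokenize(line: str) -> List[str]:
    """Lower-case the line, mask numbers (the variable parts of logs), and split on whitespace."""
    return re.sub(r"\d+", "<num>", line.lower()).split()


def token_hash(token: str) -> int:
    """Stable 64-bit token hash (first 8 bytes of MD5); Python's hash() is salted per process."""
    return int.from_bytes(hashlib.md5(token.encode("utf-8")).digest()[:8], "big")


def simhash(tokens: List[str]) -> int:
    """Classic SimHash: each token votes +1/-1 on every bit; positive totals become 1 bits."""
    votes = [0] * BITS
    for tok in tokens:
        h = token_hash(tok)
        for i in range(BITS):
            votes[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(BITS) if votes[i] > 0)


def hamming(a: int, b: int) -> int:
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")


def cluster(lines: List[str], max_distance: int = 3) -> List[Tuple[int, List[str]]]:
    """Greedy online clustering: join the first cluster whose fingerprint is within max_distance."""
    clusters: List[Tuple[int, List[str]]] = []
    for line in lines:
        fp = simhash(tokenize(line))
        for rep, members in clusters:
            if hamming(fp, rep) <= max_distance:
                members.append(line)
                break
        else:  # no cluster was close enough: start a new one
            clusters.append((fp, [line]))
    return clusters


logs = [
    "user 1001 login failed",
    "user 2002 login failed",
    "disk usage 91 percent on node 7",
]
groups = cluster(logs)
for fp, members in groups:
    print(f"{fp:016x}", members)
```

Masking numbers makes lines that differ only in IDs produce identical token sets, so they collapse into one template; a continuous system would also need to persist and update cluster representatives as the stream evolves.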

Mike Chen's Internet Architecture
Aug 16, 2024 · Big Data

Understanding the Lambda Architecture for Big Data Processing

This article explains the Lambda architecture, a three‑layer model that combines batch and real‑time processing of large‑scale data. It outlines the architecture's components, advantages, disadvantages, and common tools, and compares it with the Kappa alternative, offering practical insights for data engineers.

Big Data · Data Engineering · batch processing
0 likes · 5 min read
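
The three layers are easiest to see in a toy model. The sketch below is an illustration under loud assumptions (in‑memory lists instead of a distributed file system and processing engines, and a hypothetical `CUTOFF` marking how far the last batch run got): the batch layer recomputes a view from the master dataset, the speed layer covers only events newer than the cutoff, and the serving layer merges both at query time.

```python
from collections import Counter
from typing import Iterable, Tuple

Event = Tuple[int, str]  # (timestamp, page)


def batch_view(master: Iterable[Event], cutoff: int) -> Counter:
    """Batch layer: recompute page-view counts over all events before the cutoff."""
    return Counter(page for ts, page in master if ts < cutoff)


def speed_view(recent: Iterable[Event], cutoff: int) -> Counter:
    """Speed layer: incrementally count events the last batch run has not covered yet."""
    view: Counter = Counter()
    for ts, page in recent:
        if ts >= cutoff:
            view[page] += 1
    return view


def query(page: str, batch: Counter, realtime: Counter) -> int:
    """Serving layer: answer a query by merging the batch and real-time views."""
    return batch[page] + realtime[page]


events = [(1, "home"), (2, "cart"), (3, "home"), (11, "home"), (12, "cart")]
CUTOFF = 10  # hypothetical: the last batch job processed everything with ts < 10
b = batch_view(events, CUTOFF)
s = speed_view(events, CUTOFF)
print(query("home", b, s), query("cart", b, s))
```

The counting logic appears twice, once per layer; that duplicated code path is the maintenance cost usually cited against Lambda, and Kappa removes it by dropping the batch layer and reprocessing by replaying the stream.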

DataFunSummit
Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Big Data · Data Lake · Data Warehouse
0 likes · 15 min read

DataFunTalk
Jul 18, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent exploration of real-time data warehouse architecture, covering its six-module design, data quality assurance mechanisms, stream‑batch unified processing with Flink and ODPS, and a forward‑looking data lake solution built on Paimon, offering practical insights for large‑scale streaming analytics.

Big Data · Data Lake · data quality
0 likes · 15 min read

Baidu Tech Salon
Jun 18, 2024 · Big Data

Scalable, High‑Accuracy Event Logging Monitoring for Baidu's Log Platform

Baidu’s log platform processes billions of page‑view events per day. To monitor them accurately with minute‑level latency, it runs a downstream streaming task that maps a limited set of custom dimensions, uses watermarks to judge data completeness, trims raw records, aggregates them into 5‑minute windows, and writes compact metrics to Elasticsearch, achieving high accuracy, configurability, and low cost.

Big Data · Real-time Analytics · UBC
0 likes · 11 min read
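
The pipeline stages listed above (capping dimensions, watermark‑based completeness, 5‑minute aggregation) can be illustrated with a small single‑process sketch. It is not Baidu's implementation: the one‑minute out‑of‑orderness bound, the hypothetical `ALLOWED_DIMS` whitelist, and silently dropping late events are assumptions, and a real job would run in a stream processor and write to Elasticsearch instead of returning a list.

```python
from collections import defaultdict
from typing import DefaultDict, Iterable, List, Tuple

WINDOW_MS = 5 * 60 * 1000          # 5-minute tumbling windows
MAX_OUT_OF_ORDER_MS = 60 * 1000    # assumed lateness bound used to derive the watermark
ALLOWED_DIMS = {"home", "search"}  # hypothetical whitelist of custom dimension values


def window_start(ts_ms: int) -> int:
    return ts_ms - ts_ms % WINDOW_MS


def aggregate(events: Iterable[Tuple[int, str]]) -> List[Tuple[int, str, int]]:
    """Count (event_time_ms, dimension) events per 5-minute window.

    A window is emitted only after the watermark (max event time seen minus the
    lateness bound) reaches its end, i.e. once its data is considered complete.
    """
    open_windows: DefaultDict[int, DefaultDict[str, int]] = defaultdict(lambda: defaultdict(int))
    emitted: List[Tuple[int, str, int]] = []
    watermark = float("-inf")
    for ts, dim in events:
        start = window_start(ts)
        if start + WINDOW_MS <= watermark:
            continue  # window already emitted: this sketch simply drops the late event
        dim = dim if dim in ALLOWED_DIMS else "other"  # cap dimension cardinality
        open_windows[start][dim] += 1
        watermark = max(watermark, ts - MAX_OUT_OF_ORDER_MS)
        for done in sorted(w for w in open_windows if w + WINDOW_MS <= watermark):
            for d, count in sorted(open_windows.pop(done).items()):
                emitted.append((done, d, count))
    return emitted


MINUTE = 60 * 1000
sample = [(0, "home"), (1 * MINUTE, "search"), (2 * MINUTE, "ads"), (4 * MINUTE, "home"),
          (6 * MINUTE, "home"), (3 * MINUTE, "home"), (12 * MINUTE, "search")]
result = aggregate(sample)
for row in result:
    print(row)
```

In this sample the event at minute 3 arrives after the first window has been emitted and is dropped, and the window starting at minute 10 stays open because no later event has advanced the watermark past its end.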

DataFunSummit
Apr 8, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering its modular architecture, data quality assurance mechanisms, stream‑batch integration techniques, graph‑based conversion attribution, and future data‑lake implementation using Paimon.

Big Data · Data Lake · data quality
0 likes · 15 min read

DataFunTalk
Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint
0 likes · 11 min read

Efficient Ops
Sep 24, 2023 · Information Security

How China Postal Savings Bank Built an Enterprise‑Level AI‑Powered Anti‑Fraud Platform

The 2023 China International Service Trade Fair’s Digital Transformation Forum showcased the Postal Savings Bank’s enterprise‑grade intelligent anti‑fraud platform, detailing its stream‑batch integration, graph‑based AI models, and multi‑layer risk‑control architecture that safeguards millions of daily transactions across retail, agricultural, and credit services.

China Postal Savings Bank · Digital Transformation · anti-fraud
0 likes · 8 min read

Didi Tech
Jun 14, 2023 · Big Data

Real-Time Data Development Practices and Component Selection at Didi

This article describes Didi’s unified real‑time data stack and recommends component choices for four key scenarios (metric monitoring, BI analysis, online services, and feature/tag systems), detailing source‑to‑sink pipelines, resource‑usage guidelines, and a one‑stop development platform for building stable, high‑performance streaming solutions.

Big Data · ClickHouse · Druid
0 likes · 17 min read

Architects Research Society
Apr 18, 2023 · Backend Development

Event Sourcing, CQRS, and Stream Processing with Apache Kafka

Event sourcing records state changes as an immutable log of events. Combined with CQRS and Kafka Streams, it enables scalable, fault‑tolerant architectures in which the write and read paths are decoupled, with support for local or external state stores, interactive queries, and zero‑downtime upgrades.

CQRS · Event Sourcing · Kafka Streams
0 likes · 21 min read
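
The decoupling works because the write side only appends events, while each read model is a projection that folds the log into query‑friendly state and can be rebuilt by replaying it. A minimal in‑memory sketch of that idea, with a Python list standing in for a Kafka topic, a dict standing in for a state store, and hypothetical `deposited`/`withdrawn` events (this is not the Kafka Streams API):

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    account: str
    kind: str    # hypothetical event types: "deposited" or "withdrawn"
    amount: int


class EventStore:
    """Write side: an append-only log; state is never updated in place."""

    def __init__(self) -> None:
        self._log: List[Event] = []

    def append(self, event: Event) -> int:
        self._log.append(event)
        return len(self._log) - 1  # offset of the new event

    def read_from(self, offset: int) -> List[Event]:
        return self._log[offset:]


class BalanceView:
    """Read side (CQRS): a projection that folds events into a queryable balance table."""

    def __init__(self) -> None:
        self.balances: Dict[str, int] = {}
        self.offset = 0  # next log offset to consume

    def catch_up(self, store: EventStore) -> None:
        for event in store.read_from(self.offset):
            sign = 1 if event.kind == "deposited" else -1  # anything else is treated as a withdrawal
            self.balances[event.account] = self.balances.get(event.account, 0) + sign * event.amount
            self.offset += 1


store = EventStore()
store.append(Event("acct-1", "deposited", 100))
store.append(Event("acct-1", "withdrawn", 30))
view = BalanceView()
view.catch_up(store)
store.append(Event("acct-2", "deposited", 50))
view.catch_up(store)     # only the new event is applied
rebuilt = BalanceView()
rebuilt.catch_up(store)  # replaying the full log reproduces the same state
print(view.balances, rebuilt.balances)
```

Because each view tracks its own offset, a new version of a view can be built by replaying from offset 0 alongside the old one before reads are switched over, which is roughly the idea behind zero‑downtime upgrades of the read side.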

Baidu Geek Talk
Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data
0 likes · 14 min read

Architects Research Society
Mar 15, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Considerations

This article explains why exactly‑once semantics are needed for stream‑processing applications, describes Kafka's transactional model and semantics, details the Java transaction API and its usage, and discusses the internal components, performance trade‑offs, and practical guidelines for building reliable Kafka‑based pipelines.

Distributed Systems · Java · Kafka
0 likes · 17 min read