Tagged articles

stream processing

254 articles · Page 1 of 3

Apr 17, 2026 · Industry Insights

The 30‑Year Journey: From Parallel Computing to Modern GPU‑Powered AI

This article traces three decades of government‑funded research in parallel computing, graphics systems, and stream processing, showing how those advances migrated to companies like Nvidia, evolved into CUDA and other GPU technologies, and ultimately enabled today’s AI revolution.

AI · CUDA · GPU computing

0 likes · 18 min read

ByteDance Data Platform

Feb 2, 2026 · Big Data

How StreamShield Powers Production‑Grade Resilience for Apache Flink at Massive Scale

ByteDance’s StreamShield delivers a three‑layer resiliency framework—engine self‑healing, hybrid replication at the cluster level, and chaos‑tested releases—that enables over 70,000 concurrent Flink jobs on 11 million CPU cores to meet strict SLAs with second‑level startup and robust fault tolerance.

Apache Flink · ByteDance · Real-Time Computing

0 likes · 6 min read

iQIYI Technical Product Team

Jan 8, 2026 · Big Data

How iQIYI Cut Stream Data Costs by 70%: From Private‑Cloud Kafka to AutoMQ

This article details iQIYI's evolution from a tightly coupled private‑cloud Kafka setup to a cloud‑native AutoMQ architecture, describing the challenges of scaling, the development of the Stream platform and Stream‑SDK, the migration to hybrid and public‑cloud Kafka, and the resulting cost and elasticity improvements.

AutoMQ · Cloud Migration · Data Architecture

0 likes · 12 min read

Baidu Geek Talk

Sep 24, 2025 · Big Data

How Feed Real‑Time Data Warehouse Was Re‑Engineered for Speed and Cost Savings

This article explains how Baidu’s Feed real‑time data warehouse was rebuilt using a pure streaming architecture, detailing the limitations of the previous stream‑batch design, the technical solutions—including core/non‑core data separation, metric calculation in streaming, and Parquet storage with Apache Arrow—and the resulting cost reductions, latency improvements, and future roadmap.

Apache Arrow · Batch Processing · Parquet

0 likes · 17 min read

Baidu Geek Talk

Sep 1, 2025 · Big Data

How Baidu Netdisk Built a High‑Performance Real‑Time Engine with Flink

This article explains how Baidu Netdisk transitioned from Spark Streaming to a Flink‑based Tiangong real‑time computing engine, detailing the evolution, reasons for choosing Flink, architecture, configuration examples, business use cases, technical challenges, and future platform plans.

Baidu Netdisk · Big Data · Flink

0 likes · 16 min read

Big Data Technology & Architecture

Jul 23, 2025 · Big Data

What’s New in Apache Flink 2.0? Key Features and Cloud‑Native Upgrades for 2025

This article summarizes the major Apache Flink 2.0 updates released in the first half of 2025, covering architecture separation, cloud‑native deployment, AI‑driven agents, SQL enhancements, data integration, operational tools, and performance optimizations for real‑time intelligent computing.

AI integration · Big Data · Cloud Native

0 likes · 10 min read

DataFunSummit

Jun 18, 2025 · Big Data

How Real‑Time Lakehouse and Apache Paimon Transform Modern Data Architecture

This article explains the concept of a real‑time lakehouse, compares it with traditional batch warehouses, introduces Apache Paimon and its innovations such as native upserts, LSM storage, tags and branches, and showcases multiple enterprise use cases that demonstrate its low‑cost, low‑latency stream‑batch integration.

Apache Paimon · Data Lake · real-time lakehouse

0 likes · 17 min read

DataFunSummit

Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Lakehouse

0 likes · 13 min read

Big Data Tech Team

Jun 2, 2025 · Big Data

Master Apache Flink: A Complete Learning Roadmap from Basics to Advanced Projects

This guide outlines a comprehensive Apache Flink learning path, covering prerequisite knowledge, core concepts, APIs, state management, performance tuning, hands‑on projects, advanced topics like SQL optimization and Kubernetes deployment, plus curated resources and community tips to help beginners and intermediate users become proficient.

Apache Flink · Flink Tutorial · learning roadmap

0 likes · 8 min read

Full-Stack Internet Architecture

May 27, 2025 · Big Data

Understanding Event Streaming in Kafka: Core Concepts, Architecture, and Use Cases

This article explains Kafka's event streaming concept: what events and streams are; the core components (producers, topics, partitions, consumers, and persistence); and typical use cases, including real‑time data pipelines, event‑driven architecture, stream processing, and log aggregation. It highlights Kafka's role as foundational big‑data infrastructure.

Real-time Processing · event streaming · kafka

0 likes · 7 min read

Tencent Cloud Developer

May 8, 2025 · Big Data

How Setats Unifies Stream, Batch, and Incremental Processing for Real‑Time Data Lakes

At the 2025 DA Data+AI Conference in Shanghai, Tencent Cloud unveiled Setats—a unified stream‑batch‑incremental engine that cuts system costs, delivers second‑level data visibility and real‑time changelog generation, and demonstrates measurable performance gains in automotive IoT analytics while integrating tightly with the WeData platform.

Batch Processing · Big Data Architecture · Data Lake

0 likes · 5 min read

ByteDance Data Platform

Apr 25, 2025 · Databases

How ByteDance’s AQETuner Cuts Query Latency by 23% and Boosts Reliability

ByteDance Data Platform’s recent breakthroughs in database research—spanning query‑level Bayesian tuning, adaptive stream‑processing parallelism, and learned cardinality estimation—were highlighted by two papers accepted at VLDB 2025 and ICDE 2025, showcasing significant performance gains and real‑world deployments.

AI · Databases · Query Optimization

0 likes · 5 min read

Big Data Technology Architecture

Mar 1, 2025 · Big Data

Core Principles and Practical Guide to Flink CDC

This article explains CDC fundamentals, details Flink CDC's architecture and advantages, provides setup steps, code examples for SQL and DataStream APIs, discusses performance tuning, consistency, common issues, and typical real‑time data integration scenarios.

CDC · Change Data Capture · Debezium

0 likes · 7 min read

Alibaba Cloud Big Data AI Platform

Feb 20, 2025 · Big Data

How Flink Powers Real-Time Variable Pools for FinTech Risk Assessment

This article details how a fintech company leveraged Apache Flink to build a real-time variable pool, covering architecture choices, development efficiency improvements, multi‑stream association optimizations, and operational monitoring, while also discussing future migration to cloud‑native OLAP solutions.

Big Data · FinTech · Flink

0 likes · 10 min read

DaTaobao Tech

Dec 18, 2024 · Big Data

Incremental Computation in Big Data: Flink Materialized Table and Paimon

The article explains how Flink 1.20’s Materialized Table combined with Paimon’s changelog storage enables incremental computation that unifies batch and streaming workloads, delivering minute‑level latency at lower cost, illustrated by a materialized‑table example while noting current streaming‑only support and future batch extensions.

Big Data · Flink · Incremental Computation

0 likes · 13 min read

Big Data Technology & Architecture

Dec 18, 2024 · Big Data

Key Trends of Flink 2.0: Compute‑Storage Separation, Unified Batch‑Stream, and Streaming Warehouse

The article reviews the major directions of Flink 2.0—including compute‑storage separation, a new Materialized Table for unified batch‑stream processing, and deeper integration with Paimon for streaming warehouses—while offering a cautious perspective on their practical impact and migration challenges.

Batch-Stream Integration · Big Data · Compute-Storage Separation

0 likes · 5 min read

Big Data Technology & Architecture

Dec 9, 2024 · Big Data

Understanding Flink’s Exactly-Once Semantics and Its Relation to Deduplication

This article explains what Flink’s Exactly‑Once semantics actually guarantee, why it does not mean each event is processed only once, how checkpointing and two‑phase commit sinks enable end‑to‑end exactly‑once, and the three safeguards needed for true exactly‑once computation.

Big Data · Deduplication · Exactly-once

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Nov 29, 2024 · Big Data

How Fluss Redefines Real‑Time Stream Storage for Flink

Fluss, an open‑source real‑time stream storage project from Alibaba, integrates columnar formats and low‑latency updates with Apache Flink to address the limitations of traditional Kafka‑Flink pipelines, offering high throughput, low cost, and seamless lakehouse support for modern data analytics.

Apache Flink · Fluss · real-time storage

0 likes · 6 min read

JD Cloud Developers

Nov 5, 2024 · Big Data

Zero‑Code Flink: Build StreamGraph, JobGraph & ExecutionGraph via Canvas DAG

This article explains how Flink applications are transformed through StreamGraph, JobGraph, and ExecutionGraph stages, and presents a low‑code canvas approach that lets users assemble DAGs, persist them in a MySQL adjacency list, and generate zero‑code Flink programs using BFS traversal.

DAG · Flink · Low-Code Development

0 likes · 6 min read

Big Data Technology & Architecture

Oct 31, 2024 · Big Data

Understanding Paimon's Changelog Producer: Four Modes and Their Trade‑offs

The article explains Paimon's changelog‑producer capability, detailing its purpose, storage format, and the four generation modes—None, Input, Lookup, and Full Compaction—while comparing their costs, implementation details, and suitability for different data sources such as CDC.

@Lookup · Big Data · Compaction

0 likes · 16 min read

DaTaobao Tech

Oct 25, 2024 · Big Data

Using Temporary Table JOIN in Flink SQL for Real-Time Stream Enrichment

The article explains how to use Flink SQL’s temporal table join to enrich a real‑time traffic‑log stream with versioned tag data, detailing the required DDL, the time‑versioned join syntax, and the essential watermark and idle‑timeout settings that prevent stalls and boundary‑delay issues.

Flink · SQL · Temporary Join

0 likes · 7 min read

JD Retail Technology

Sep 25, 2024 · Big Data

From a Personal Journey to Data Platform Architecture: Insights on Big Data, Cloud Computing, and System Design

The article narrates the author’s 30‑year programming career and shares technical reflections on building business‑agnostic, configurable data platforms, covering batch, streaming, interactive computing, big‑data sharding, Spark, Flink, cloud migration, and the philosophy of software architecture.

Batch Processing · Cloud Computing · Data Engineering

0 likes · 23 min read

ZhongAn Tech Team

Sep 3, 2024 · Big Data

Real-Time Log Clustering Architecture and Continuous Clustering Algorithm

This article presents a comprehensive overview of a log clustering system, detailing its background, architecture based on Filebeat, Kafka, Flink, Elasticsearch, and Grafana, and introduces a continuous clustering algorithm using SimHash and Hamming distance for real‑time log governance and anomaly detection.

Flink · Log Clustering · SimHash

0 likes · 14 min read

Mike Chen's Internet Architecture

Aug 16, 2024 · Big Data

Understanding the Lambda Architecture for Big Data Processing

This article explains the Lambda architecture—a three‑layer model combining batch and real‑time processing for large‑scale data, outlines its components, advantages, disadvantages, common tools, and compares it with the Kappa alternative while providing practical insights for data engineers.

Batch Processing · Big Data · Data Engineering

0 likes · 5 min read

DataFunSummit

Aug 7, 2024 · Big Data

Ant Group Real-Time Data Warehouse: Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, detailing its architecture, data quality assurance, stream‑batch integration, and future data lake implementation, while highlighting the use of Flink, ODPS, and Paimon for scalable, low‑latency analytics.

Data Quality · Flink · Real-time Data

0 likes · 15 min read

JD Tech Talk

Aug 6, 2024 · Big Data

Real-Time Stream Computation in Monitoring Systems: Data Streams, Windows, and Watermarks with Apache Flink

This article explains the role of monitoring systems, introduces real-time data stream computation, describes data stream characteristics, details Flink’s event time and processing time concepts, various window types, watermark mechanisms, and strategies for handling out-of-order and late data.

Flink · Real-time · Window

0 likes · 18 min read

JD Cloud Developers

Aug 6, 2024 · Big Data

Master Real-Time Stream Processing with Flink: Windows & Watermarks

This article provides a comprehensive overview of real-time stream processing, covering data streams, window types, event and processing time, Flink's operator model, watermark mechanisms, and strategies for handling out-of-order and late data to ensure accurate, timely analytics.

Flink · Watermarks · Windowing

0 likes · 15 min read

Big Data Technology & Architecture

Aug 5, 2024 · Big Data

Key Features of Apache Flink 1.20: Materialized Tables, DISTRIBUTED BY, and State/Checkpoint Optimizations

The article reviews Apache Flink 1.20, highlighting the new Materialized Table concept, the DISTRIBUTED BY support for load‑balanced storage and join performance, and state/checkpoint file merging improvements, while providing code examples and practical insights for users.

Apache Flink · Big Data · Checkpoint Optimization

0 likes · 7 min read

DataFunTalk

Jul 18, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent exploration of real-time data warehouse architecture, covering its six-module design, data quality assurance mechanisms, stream‑batch unified processing with Flink and ODPS, and a forward‑looking data lake solution built on Paimon, offering practical insights for large‑scale streaming analytics.

Flink · stream processing

0 likes · 15 min read

Baidu Tech Salon

Jun 18, 2024 · Big Data

Scalable, High‑Accuracy Event Logging Monitoring for Baidu's Log Platform

Baidu’s log platform processes billions of daily page‑view events and, to monitor them accurately with minute‑level latency, implements a downstream streaming‑task architecture that maps limited custom dimensions, uses watermarks for completeness, trims raw data, aggregates into 5‑minute windows, and outputs concise metrics to Elasticsearch, achieving high accuracy, configurability, and low cost.

Log Monitoring · UBC · dimension mapping

0 likes · 11 min read

Alibaba Cloud Native

May 29, 2024 · Cloud Native

How SPL Transforms Log Processing in iLogtail 2.0: From Pipelines to Unified Stream Language

This article traces the evolution of stream‑processing languages, compares iLogtail's original pipeline model with the new SPL syntax, and provides a step‑by‑step practical example showing how SPL simplifies log parsing, improves performance, and unifies configuration across Alibaba Cloud services.

SPL · iLogtail · stream processing

0 likes · 14 min read

DataFunSummit

Apr 8, 2024 · Big Data

Ant Group's Real-Time Data Warehouse Architecture, Solutions, and Data Lake Outlook

This article presents Ant Group's recent explorations and practices in real-time data warehousing, covering its modular architecture, data quality assurance mechanisms, stream‑batch integration techniques, graph‑based conversion attribution, and future data‑lake implementation using Paimon.

Flink · stream processing

0 likes · 15 min read

Alibaba Cloud Native

Mar 24, 2024 · Cloud Native

How RocketMQ 5.0 Enables Lightweight Cloud‑Native Stream Processing with RStreams and RSQLDB

This article explains the evolution of message middleware, introduces core concepts of stream processing, and details RocketMQ 5.0's native lightweight stream engine RStreams and its stream database RSQLDB, showing how they simplify real‑time data integration, computation, and scaling in cloud‑native environments.

RSQLDB · RStreams · RocketMQ

0 likes · 14 min read

Big Data Technology & Architecture

Mar 20, 2024 · Big Data

Flink 1.19 New Features: SQL Optimizations, Runtime Enhancements, and Checkpointing Improvements

The article reviews Flink 1.19’s new features, highlighting SQL capability enhancements such as custom source parallelism, TTL hints, and MiniBatch support for regular joins, as well as runtime dynamic parallelism for batch jobs and flexible checkpointing intervals for different data sources.

Big Data · Flink · SQL

0 likes · 6 min read

DataFunTalk

Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint

0 likes · 11 min read

dbaplus Community

Dec 14, 2023 · Big Data

How Flink Powers Unified Stream‑Batch Processing at Scale: Production Lessons

This article explains why Flink was chosen as a unified stream‑batch engine, details the migration from Lambda architecture, outlines the Flink Batch production workflow, and shares key optimizations such as Hive dialect support, CTAS, adaptive scheduling, speculative execution, and future roadmap for large‑scale data processing.

Adaptive Scheduler · Batch Processing · Big Data

0 likes · 31 min read

dbaplus Community

Dec 10, 2023 · Big Data

How Bilibili Built a Remote State Backend for Flink Using Taishan KV Store

This article explains Bilibili's design and implementation of a remote state backend for Flink, detailing the motivations, pain points of the existing RocksDBStateBackend, the architecture of TaishanStateBackend, and the performance optimizations applied to achieve storage‑compute separation and faster rescaling.

Big Data · Flink · Remote Storage

0 likes · 21 min read

Data Thinking Notes

Oct 11, 2023 · Big Data

How ByteDance Optimized Its E‑Commerce Data Lake to Cut Costs and Boost Real‑Time Accuracy

ByteDance revamped its traditional Lambda architecture for e‑commerce traffic data by introducing a new lake ingestion solution that reduces development and operational costs and keeps data timely and stable. The article covers the business background, ODS lake design, archiving tags, delayed data handling, real‑time stability, and future plans.

Big Data · Data Lake · Flink

0 likes · 7 min read

Efficient Ops

Sep 24, 2023 · Information Security

How China Postal Savings Bank Built an Enterprise‑Level AI‑Powered Anti‑Fraud Platform

The 2023 China International Service Trade Fair’s Digital Transformation Forum showcased the Postal Savings Bank’s enterprise‑grade intelligent anti‑fraud platform, detailing its stream‑batch integration, graph‑based AI models, and multi‑layer risk‑control architecture that safeguards millions of daily transactions across retail, agricultural, and credit services.

China Postal Savings Bank · anti-fraud · stream processing

0 likes · 8 min read

WeiLi Technology Team

Aug 2, 2023 · Big Data

How to Build a Real-Time Data Warehouse: Architectures, Challenges, and Industry Practices

This article examines the growing demand for real‑time data warehouses, compares mature streaming frameworks, evaluates Lambda, Kappa and hybrid architectures, reviews industry implementations from Didi and OPPO, and proposes a standard‑layer + stream + data‑lake solution with Apache Paimon, Hudi, and Iceberg.

Apache Flink · Kappa architecture · Lambda architecture

0 likes · 27 min read

Alibaba Cloud Developer

Jul 31, 2023 · Big Data

From BI to Kappa: How Data Architecture Evolved in the Big Data Era

This article traces the evolution of data architecture from early BI systems through traditional big‑data stacks, streaming, Lambda and Kappa designs, and explains how a unified stream‑batch model simplifies development while keeping logic consistent across data‑analysis and pipeline applications.

BI systems · Big Data · Data Architecture

0 likes · 16 min read

Big Data Technology & Architecture

Jul 24, 2023 · Big Data

Real-time Data Warehouse Governance: Optimization Practices and Technical Enhancements

This article presents a comprehensive overview of the current challenges, platform architecture, governance planning, and technical optimizations—including Flink SQL, Kafka batch processing, and partitioned stream tables—used to improve resource efficiency, stability, and scalability of a large‑scale real‑time data warehouse.

Flink optimization · Kafka batch · Real-Time Data Warehouse

0 likes · 25 min read

Didi Tech

Jun 14, 2023 · Big Data

Real-Time Data Development Practices and Component Selection at Didi

Didi’s unified real‑time data stack outlines best‑practice component choices for four key scenarios—metric monitoring, BI analysis, online services, and feature/tag systems—detailing pipelines from source to sink, resource‑usage guidelines, and a one‑stop development platform to build stable, high‑performance streaming solutions.

ClickHouse · Druid · Flink

0 likes · 17 min read

Programmer DD

Jun 7, 2023 · Cloud Native

Why Apache Pulsar Is the Next‑Gen Cloud‑Native Streaming Platform

This article explains how Apache Pulsar combines messaging, storage, and lightweight function computing into a cloud‑native streaming platform, detailing its architecture, storage‑compute separation, tiered storage, pluggable protocols, reliability guarantees, and rich ecosystem compared with traditional queues and Kafka.

Apache Pulsar · Cloud Native · Data Reliability

0 likes · 10 min read

WeChat Backend Team

Jun 1, 2023 · Big Data

How WeChat Boosted Flink Stability with TaskManager Recovery and Load Balancing

This article details WeChat’s Gemini‑2.0 real‑time streaming platform built on Flink, explaining two key stability enhancements: a TaskManager‑level partial failure recovery that avoids data loss during node crashes, and a load‑balancing scheduler that evenly distributes tasks across TaskManagers to improve resource utilization and reduce latency.

Big Data · Flink · TaskManager Recovery

0 likes · 16 min read

Architects Research Society

Apr 18, 2023 · Backend Development

Event Sourcing, CQRS, and Stream Processing with Apache Kafka

Event sourcing models state changes as immutable logs, and when combined with CQRS and Kafka Streams, it enables scalable, fault‑tolerant architectures where write and read paths are decoupled, supporting local or external state stores, interactive queries, and zero‑downtime upgrades.

CQRS · backend-architecture · kafka streams

0 likes · 21 min read

Baidu Geek Talk

Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data

0 likes · 14 min read

ITPUB

Mar 24, 2023 · Big Data

What’s New in Apache Flink 1.17? Key Features, Performance Gains, and Streaming Warehouse Advances

Apache Flink 1.17 introduces a suite of batch and streaming enhancements—including a new Streaming Warehouse API, significant TPC‑DS performance boosts, adaptive batch scheduling, improved checkpointing, expanded SQL capabilities, Hive connector upgrades, and broader filesystem support—while also delivering upgrades to FRocksDB, Calcite, and the token framework to strengthen its position as a leading unified data‑processing engine.

Apache Flink · Batch Processing · Checkpoint

0 likes · 23 min read

Architects Research Society

Mar 15, 2023 · Big Data

Understanding Transactions in Apache Kafka: Semantics, API, and Practical Considerations

This article explains why exactly‑once semantics are needed for stream‑processing applications, describes Kafka's transactional model and semantics, details the Java transaction API and its usage, and discusses the internal components, performance trade‑offs, and practical guidelines for building reliable Kafka‑based pipelines.

Exactly-once · Java · distributed systems

0 likes · 17 min read

Architects Research Society

Feb 21, 2023 · Big Data

Comparing Apache Spark and Apache Flink: Origins, Architecture, and Processing Models

This article examines the evolution, architectural differences, data and processing models, stateful handling, and programming APIs of Apache Spark and Apache Flink, highlighting their strengths, limitations, and the challenges of big‑data development and operations in the modern data‑driven era.

Batch Processing · Big Data · Data Engine

0 likes · 18 min read

DataFunSummit

Feb 16, 2023 · Big Data

JD Real-Time Data Product Practice: Overview, Low‑Code Platform, Stream‑Batch Integration, and Operations

This article summarizes JD's real‑time data product practice, covering product overview, low‑code real‑time platform construction, stream‑batch integrated architecture, and the three‑layer operational defense model, while highlighting challenges, evolution, user distribution, and future directions.

Big Data · Low‑code platform · Real-time Data

0 likes · 13 min read

StarRing Big Data Open Lab

Feb 10, 2023 · Big Data

Why Impala, Flink, and Slipstream Are Shaping Real‑Time Interactive Analytics

This article explores the evolution of real‑time computing and compares three interactive analytics engines—Impala, Apache Flink, and Slipstream—detailing their architectures, key features, deployment considerations, and why they matter for modern big‑data stream processing.

Apache Flink · Impala · Slipstream

0 likes · 13 min read

Su San Talks Tech

Jan 24, 2023 · Big Data

From 4 Hours to 1 Hour: Optimizing Coupon Calculations with Storm Stream Processing

Facing a four‑hour coupon‑calculation bottleneck at eLong, I explored Storm’s stream‑processing model, identified the data‑extraction stage as the weak link, refactored the hotel‑pull service, leveraged RocketMQ for threading, and ultimately reduced full‑batch processing time to just over an hour.

Coupon System · Java · Performance Optimization

0 likes · 12 min read

Ctrip Technology

Jan 12, 2023 · Big Data

Real-Time Data Warehouse Architecture and Practice at Ctrip Hotel

The article explains why enterprises need real-time data warehouses, compares Lambda and Kappa architectures, describes Ctrip Hotel's Lambda‑plus‑OLAP variant built with Flink and StarRocks, and details practical solutions for ordering, wide‑table generation, and data validation that enable billion‑row, low‑latency analytics.

Ctrip · Flink · Lambda architecture

0 likes · 10 min read

Bilibili Tech

Jan 6, 2023 · Backend Development

Hotspot Detection and Local Cache Framework for High‑Traffic Applications

The presented hotspot detection and local‑cache framework leverages the HeavyKeeper streaming top‑k algorithm with decay‑based burst detection, integrates zero‑code SDK support and a whitelist‑enabled LRU cache, enabling a few megabytes of memory to achieve up to 85% hit rates and dramatically reduce Redis load in high‑traffic applications.

distributed-caching · heavykeeper · hotspot detection

0 likes · 21 min read

vivo Internet Technology

Dec 28, 2022 · Big Data

Vivo Real-Time Computing Platform: Architecture, Practices, and Applications

The Vivo Real‑Time Computing Platform, built on Apache Flink, is a one‑stop data construction and governance solution that processes up to 5 PB daily. It offers high‑availability submission and control services, robust stability, rich SQL usability, efficient Kubernetes deployment, and strong security; it supports real‑time warehouses and short‑video recommendation, and targets elastic scaling and lake‑house unification in the future.

Apache Flink · Data Platform · Real-Time Computing

0 likes · 18 min read

Data Thinking Notes

Dec 23, 2022 · Big Data

How Real-Time Data Warehouses Power Modern Business: Architecture, Cases, and Best Practices

This article explains why real‑time data warehouses are becoming essential, outlines their goals, compares them with traditional offline warehouses, and presents detailed design patterns, naming conventions, and case studies from Didi, Kuaishou, Tencent, Youzan and other enterprises, highlighting challenges and solutions for streaming, storage, and query layers.

Big Data Architecture · Data Lake · ETL

0 likes · 49 min read

Alibaba Cloud Big Data AI Platform

Nov 29, 2022 · Big Data

How Flink’s Stream‑Batch Fusion Is Transforming Real‑Time Big Data

The article explores Apache Flink’s eight‑year journey to becoming a top‑level Apache project, Alibaba’s extensive contributions, the rise of stream‑batch unified computing, its impact on real‑time data integration, cloud‑native deployment, and the emerging Flink‑based data‑warehouse and serverless solutions.

Apache Flink · Big Data · Cloud Native

0 likes · 15 min read

Past Memory Big Data

Nov 26, 2022 · Big Data

Is Apache Flink Truly Powerful Enough After Hundreds of Engineers and Multiple Double‑11 Deployments?

The interview with Alibaba researcher Wang Feng reviews Flink's eight‑year journey to a top Apache project, its massive scale at Double 11, the push toward unified stream‑batch computing, emerging storage challenges, and the roadmap for cloud‑native, real‑time data warehousing.

Apache Flink · Batch Processing · CDC

0 likes · 16 min read

Programmer DD

Nov 26, 2022 · Big Data

How Flink Became the Real‑Time Big Data Standard – Insights from Alibaba’s Wang Feng

This interview with Alibaba researcher Wang Feng (aka Mo Wen) explores Apache Flink’s eight‑year journey to top‑level Apache status, its unified stream‑batch architecture, the rise of Flink Table Store and CDC, and how cloud‑native deployments are reshaping real‑time big data processing.

Apache Flink · Big Data · Cloud Native

0 likes · 16 min read

ByteDance Terminal Technology

Nov 18, 2022 · Big Data

Practices and Techniques for Large‑Scale Distributed Trace Data Analysis at ByteDance

This article presents ByteDance’s experience building a massive trace‑data analysis platform, covering observability fundamentals, the evolution of its distributed tracing system, various aggregation computation models, technical architecture choices, and concrete use‑cases such as precise topology, traffic estimation, dependency analysis, performance anti‑patterns, bottleneck detection, and error propagation.

Big Data · Distributed Tracing · Microservices

0 likes · 21 min read

AsiaInfo Technology: New Tech Exploration

Nov 11, 2022 · Industry Insights

How Real-Time Data Middle Platforms are Transforming the Telecom Industry

This article analyzes why telecom operators need a real‑time data middle platform, outlines its layered architecture and model design, examines the shift from Lambda to Kappa and lakehouse approaches, and highlights how these innovations enable faster, scenario‑driven insights and competitive advantage.

Big Data Architecture · Data Middle Platform · Flink

0 likes · 15 min read

Shopee Tech Team

Oct 13, 2022 · Big Data

Improving Flink Unaligned Checkpoint: Problems, Principles, Optimizations, and Production Practices at Shopee

Shopee tackled frequent Flink checkpoint failures caused by back‑pressure by adopting and extending the community’s Unaligned Checkpoint mechanism—adding overdraft buffers, improving legacy sources, introducing an aligned‑checkpoint timeout, enabling output‑buffer switching, merging small HDFS files, and fixing network‑buffer deadlocks—now running hundreds of jobs with stable UC deployment and plans to enable it universally.

Big Data · Checkpoint Optimization · Flink

0 likes · 18 min read

Alibaba Cloud Developer

Oct 13, 2022 · Big Data

Ensuring Correctness in Stream Computing: Data Integrity Challenges and Engine Solutions

This article explores how stream computing systems achieve correct results by addressing data integrity, distinguishing consistency from correctness, formalizing integrity inference, and comparing implementations across major engines such as Flink, Kafka Streams, MillWheel, and Spark Structured Streaming.

Correctness · Flink · data integrity

0 likes · 28 min read

DeWu Technology

Oct 10, 2022 · Big Data

Offline and Real-Time User Profile Fusion Architecture

The architecture combines a nightly batch job that generates offline user profiles stored in HBase with a Flink‑based stream layer that lazily loads those profiles on app start and creates real‑time updates, then fuses both streams into a unified, timestamp‑ordered profile in Redis, forming a Lambda‑style pipeline.

Batch Processing · Flink · HBase

0 likes · 10 min read

DataFunTalk

Oct 2, 2022 · Big Data

Real-time Data Warehouse Architecture and Hologres Technology Overview

This article explains the evolving requirements of real‑time data warehouses, analyzes Alibaba's Hologres technology principles, presents recommended architectures for various latency scenarios, and discusses practical case studies, performance, security, and cost‑optimization strategies for modern big‑data platforms.

Big Data · Cloud Computing · Hologres

0 likes · 24 min read

ITPUB

Sep 22, 2022 · Big Data

What Is a Real‑Time Data Warehouse? Product, Solution, and Use Cases Explained

The article explains the concept of real‑time data warehouses, traces their evolution from early relational databases to modern streaming‑batch engines, discusses whether they are products or solutions, outlines typical application scenarios, selection criteria, and future trends in the big‑data ecosystem.

Cloud · Flink · Spark

0 likes · 10 min read

37 Interactive Technology Team

Aug 23, 2022 · Big Data

Optimizing Game Event Reporting with Stream Processing to Overcome ClickHouse Performance Bottlenecks

Faced with ClickHouse query times ballooning to over an hour for massive game‑event data, the team replaced the DB‑pull model with a stream‑processing pipeline that evaluates trigger rules in real time, cuts batch queries by 60%, and brings reporting latency down to minutes.

ClickHouse · Game Analytics · Performance Optimization

0 likes · 6 min read

DaTaobao Tech

Aug 11, 2022 · Big Data

Unify SQL Engine: Integrating Stream, Batch, and Online Computing for Data Warehousing

The article describes how fragmented real‑time, batch, and online data‑warehouse pipelines suffer from low productivity and inconsistent data quality, and introduces a unified SQL engine built on Apache Calcite that parses, optimizes, and compiles a single SQL statement into executable plans for ODPS, Flink, or Java, leveraging Janino code generation, multi‑backend state storage, and snapshot‑join semantics to boost performance and simplify development.

Batch Processing · Calcite · Data Warehouse

0 likes · 16 min read

Baidu Geek Talk

Aug 9, 2022 · Big Data

How to Build a Real-Time Data Warehouse with Unified Stream‑Batch Architecture

This article examines the evolution of big‑data architectures, identifies the latency and maintenance issues of classic Lambda designs, and presents a hybrid Lambda‑Kappa solution that unifies streaming and batch processing to achieve minute‑level data freshness and second‑level query latency while reducing development cost.

Big Data · Kappa architecture · Lambda architecture

0 likes · 13 min read

ITPUB

Aug 8, 2022 · Big Data

Why Real‑Time Data Warehouses Are the New Competitive Edge for Enterprises

As markets become increasingly dynamic, companies must build real‑time infrastructure to gain timely insights, and this article explains the three real‑time analytics scenarios, the limitations of traditional stream engines, and how Skylab’s integrated cloud‑native platform and Omega architecture address those challenges.

Real-time Data · cloud-native · stream processing

0 likes · 9 min read

DataFunTalk

Jul 26, 2022 · Big Data

Feature Platform Architecture and Stream‑Batch Integrated Solutions

This talk presents Shuhe Technology’s feature platform, detailing its four‑layer architecture, feature storage services, stream‑batch integrated processing, event‑center design, consistency models, and four model‑strategy invocation schemes, illustrating data flows from MySQL through Sqoop, Kafka, Flink, HBase and ClickHouse.

Big Data · ClickHouse · Flink

0 likes · 17 min read

Python Programming Learning Circle

Jul 13, 2022 · Big Data

Introduction to Faust: A Python Stream Processing Library for Kafka

This article introduces Faust, an open‑source Python library that brings Kafka Streams‑style stream processing to Python, covering its features, installation, a step‑by‑step example, typed data models, and how to run real‑time data pipelines with async support.

Faust · Python · Real-time Data

0 likes · 5 min read

DataFunSummit

Jul 9, 2022 · Big Data

Alibaba's One‑Stop Real‑Time Data Warehouse: Hologres Architecture and CCO Implementation Experience

The article reviews the shift of big‑data computing from batch to real‑time, outlines the evolution of one‑stop real‑time data warehouses, introduces Alibaba's Hologres solution and its technical advantages, and shares the CCO department’s three‑generation architecture upgrades and practical use cases.

Alibaba · Data Engineering · Hologres

0 likes · 16 min read

DataFunTalk

Jun 6, 2022 · Big Data

Understanding Flink's Exactly-Once Guarantees: Checkpoint, Two‑Phase Commit, and Kafka Integration

This article explains how Apache Flink achieves end‑to‑end exactly‑once semantics by using source replay support, checkpoint‑based snapshots, asynchronous incremental checkpoints, and two‑phase commit sinks, and describes the interaction with external systems such as Kafka to ensure transactional writes.

Big Data · Checkpoint · Exactly-once

0 likes · 7 min read

Shopee Tech Team

Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing

0 likes · 20 min read

Top Architect

Apr 23, 2022 · Big Data

Ensuring No Duplicate and No Loss in Baidu Log Middle Platform: Architecture, Challenges, and Solutions

This article explains the design, implementation, and future plans of Baidu's log middle platform, detailing its lifecycle management, service architecture, data reliability goals of eliminating duplication and loss, and the technical measures taken across SDKs, servers, and streaming pipelines to achieve near‑100% data integrity.

Big Data · Data Reliability · backend-architecture

0 likes · 15 min read

DataFunSummit

Apr 22, 2022 · Big Data

Huya Real-Time Computing SLA Practice: Platform Evolution, Core SLA Definition, Capability Building, and Future Outlook

The talk details Huya’s real‑time computing platform evolution from chaotic early stages to a unified, containerized system, defines core SLA metrics focused on latency compliance, describes capability enhancements such as demand monitoring, task analysis, and dynamic scaling, and outlines future goals for usability, stability, openness, and unified stream‑batch processing.

Flink · Real-Time Computing · SLA

0 likes · 12 min read

Architect

Apr 18, 2022 · Big Data

Ensuring Data Accuracy and Reliability in Baidu's Log Middle Platform

This article describes Baidu's log middle platform architecture, its data lifecycle management, integration status, terminology, service overview, core challenges of ensuring data accuracy, and the implemented optimizations for persistent storage, service decomposition, and SDK reporting to achieve near‑100% no‑repeat no‑loss reliability.

Big Data · Data Reliability · backend-architecture

0 likes · 15 min read

dbaplus Community

Apr 13, 2022 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse with Flink

This article explains Meituan's real‑time data warehouse architecture, covering typical business scenarios, the evolution of its streaming platform, key design challenges, solutions such as unified data models, SQL‑based development, UDF hosting, operator optimizations, and future plans for incremental processing and unified batch‑stream semantics.

Flink · Meituan · Real-time Data

0 likes · 18 min read

High Availability Architecture

Apr 11, 2022 · Big Data

Ensuring Data Accuracy and Reliability in Baidu Log Platform: Architecture, Challenges, and Solutions

This article introduces the current state of Baidu's log platform, explains its lifecycle from data collection to downstream applications, analyzes the challenges of achieving near‑zero duplication and loss, and presents architectural optimizations and best‑practice recommendations to improve data stability and accuracy across the system.

Big Data · Data Reliability · data pipeline

0 likes · 19 min read

Alibaba Cloud Native

Feb 22, 2022 · Big Data

Why RocketMQ-Streams Delivers High‑Performance, Low‑Resource Stream Computing

RocketMQ-Streams targets high‑volume, heavily filtered workloads with light windowed computation. Its lightweight, high‑performance design runs on as little as 1 CPU core and 1 GB RAM, offering 2‑5× speed gains over traditional big‑data engines and supporting Flink‑compatible SQL, UDFs, and cloud‑native deployment.

Exactly-once · Flink SQL · RocketMQ-Streams

0 likes · 10 min read

DataFunSummit

Jan 30, 2022 · Big Data

Real‑time Data Warehouse at Meituan: Architecture, Challenges, and Solutions

This article presents Meituan's real‑time data warehouse platform, describing typical streaming use cases, the evolution of its architecture from Storm and Spark Streaming to Flink, the challenges of development, operations and data quality, and the engineering solutions—including unified SQL, web IDE, UDF hosting, pipeline testing, and operator performance optimizations—implemented to support large‑scale, low‑latency analytics.

Flink · Real-time Data · platform architecture

0 likes · 17 min read

Architecture Digest

Jan 21, 2022 · Big Data

Building a Real-Time Data Warehouse with Flink: Architecture, Core Concepts, and Practical Implementation

This article explains how to build a unified stream‑batch real‑time data warehouse using FlinkSQL, covering prerequisite knowledge, five core concepts, two implementation approaches, a comparison of traditional versus real‑time architectures, and a comprehensive hands‑on example, illustrated with diagrams.

Batch Processing · Data Architecture · Flink

0 likes · 6 min read

StarRocks

Jan 12, 2022 · Big Data

How Flink + StarRocks Deliver Lightning‑Fast Real‑Time Data Warehousing

This article explains the evolution, challenges, and technical solutions for building an end‑to‑end real‑time data warehouse by combining Apache Flink's stream processing with StarRocks' ultra‑fast OLAP engine, covering architecture, data models, integration methods, best‑practice cases, and future roadmap.

Big Data · Flink · OLAP

0 likes · 21 min read

DataFunTalk

Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table

0 likes · 16 min read

Tencent Cloud Developer

Jan 7, 2022 · Big Data

Design and Implementation of a Hundred‑Billion‑Scale Real‑Time Monitoring System

The paper details a hundred‑billion‑scale real‑time monitoring system, outlining a layered architecture from collection to alerting, comparing Oceanus + Elastic Stack and Zabbix + Prometheus + Grafana solutions, and showing how targeted optimizations in stream processing and Elasticsearch achieve scalability, low latency, and significant cost savings.

Elastic Stack · Oceanus · Performance Optimization

0 likes · 19 min read

Youzan Coder

Dec 8, 2021 · Big Data

How to Build a Real‑Time Data Quality Monitoring System with Flink

This article outlines a comprehensive approach to monitoring and ensuring the accuracy and timeliness of real‑time data streams, detailing background challenges, solution design, implementation steps using Flink and automated testing, alert handling procedures, and future improvement plans.

Alerting · Data Quality · Flink

0 likes · 10 min read

HomeTech

Dec 7, 2021 · Big Data

Flink Task Auto-scaling Design and Implementation

This article presents the design and implementation of Flink task auto‑scaling, covering background, manual and automatic scaling mechanisms, architecture with RescaleCoordinator, persistence via Zookeeper and HDFS, scaling policies for parallelism, CPU and memory, and future plans for fine‑grained and time‑based resource adjustments.

Auto Scaling · Flink · HDFS

0 likes · 4 min read

Big Data Technology & Architecture

Nov 2, 2021 · Big Data

Understanding Kafka: From Message Engine to Distributed Stream Processing Platform

This article explains Kafka's evolution—highlighting the introduction of Kafka Streams, the shift to a full distributed stream processing platform, practical learning paths, source‑code reading tips, common pitfalls, and the major new features introduced in Kafka 3.0.

Big Data · distributed systems · kafka

0 likes · 7 min read

Alibaba Cloud Developer

Oct 13, 2021 · Big Data

Why “Exactly‑Once” Doesn’t Guarantee Consistency in Stream Processing

This article examines the true meaning of consistency in stream computing, clarifies common misconceptions about exactly‑once semantics, formalizes consistency challenges, and reviews how major stream engines such as Google MillWheel, Apache Flink, Kafka Streams, and Spark Streaming implement end‑to‑end consistency.

Big Data · Exactly-once · fault tolerance

0 likes · 29 min read

NetEase Game Operations Platform

Sep 18, 2021 · Big Data

StreamflySQL: NetEase Games’ Journey from Template JAR to SQL Gateway for Flink SQL Platformization

This article details NetEase Games’ evolution of its Flink SQL platform, from the early StreamflySQL v1 template‑JAR approach to the v2 SQL‑Gateway architecture, discussing design decisions, challenges such as metadata persistence, multi‑tenant security, horizontal scaling, and job state management.

Flink · Platform Engineering · SQL

0 likes · 17 min read

Tencent Cloud Developer

Sep 3, 2021 · Big Data

Design and Implementation of a Real-Time Video Live Streaming Analytics System Using Tencent Cloud Big Data Services

The article details a cloud‑native architecture on Tencent Cloud that uses CKafka, Oceanus (Flink), MySQL, HBase and a BI service to ingest live‑streaming logs, aggregate gift‑reward metrics in real time, store results, and display them on a continuously refreshed dashboard.

Business Intelligence · Cloud Services · stream processing

0 likes · 15 min read

ByteDance ADFE Team

Aug 31, 2021 · Big Data

Evolution of the Big Data Technology Stack Over the Past Five Years

This article reviews the evolution of big data technologies in the last five years, covering streaming and batch processing frameworks, column‑store NoSQL databases, programming language trends, the cloud‑native multi‑model database Lindorm, and practical Flink/Blink usage with code examples.

Big Data · Data Engineering · Flink

0 likes · 24 min read

Meituan Technology Team

Aug 26, 2021 · Big Data

How Meituan Built a Scalable Real‑Time Data Warehouse: Architecture & Lessons

Meituan Waimai’s data intelligence team outlines a universal real‑time data‑warehouse methodology that combines a production platform with an interactive analytics engine, detailing scenarios, technology choices, architectural designs, platformization, SLA management, and a practical Lambda‑style case study.

Doris · Flink · Kappa architecture

0 likes · 18 min read

Python Crawling & Data Mining

Aug 21, 2021 · Big Data

Understanding Flink’s Architecture: From APIs to Cluster Deployment

This article explains Flink’s three‑layer architecture (APIs & Libraries, Core, Deploy), details its programming interfaces, runtime engine, deployment options, and core concepts such as stateful computation and time semantics, providing a comprehensive guide for building robust stream and batch applications.

Flink · Stateful Computing · Time Semantics

0 likes · 13 min read

DataFunSummit

Aug 15, 2021 · Big Data

Building a General Real-Time Data Warehouse: Methods and Practices at Meituan Waimai

This article introduces a universal method for building a real-time data warehouse at Meituan Waimai, covering streaming technologies, architecture choices such as Lambda and Kappa, component design, feature production, SLA management, and practical OLAP solutions using Flink, Storm, and Doris.

Doris · Flink · Kappa architecture

0 likes · 15 min read

Spring Full-Stack Practical Cases

Aug 12, 2021 · Backend Development

Master Kafka Streams in Spring Boot: Real‑Time Data Processing with Code Samples

This guide walks through setting up Kafka Streams with Spring Boot 2.3, covering environment configuration, core concepts, topology design, and multiple practical examples—including message sending, listening, transformations, aggregations, filtering, branching, and multi‑field grouping—complete with full code snippets and execution results.

Java · Spring Boot · kafka

0 likes · 13 min read

Volcano Engine Developer Services

Aug 11, 2021 · Big Data

How Volcengine Solves Big Data Quality Challenges with a Unified Stream‑Batch Platform

Volcengine’s Data Quality Platform bridges the gap between data validation and resource‑intensive computation in large‑scale environments, offering unified stream‑batch monitoring, data exploration, comparison, and alerting across Hive, ClickHouse, Kafka, and more, while addressing scalability, latency, and resource optimization challenges.

Big Data · Data Quality · Monitoring

0 likes · 19 min read

ByteFE

Jul 29, 2021 · Frontend Development

Implementing a Large File Chunked Upload Library: A Full-Stack TypeScript Guide

This article provides a comprehensive guide to building a large file chunked upload library from scratch using TypeScript, detailing both server-side stream processing for memory efficiency and client-side MD5 calculation with retry mechanisms to ensure reliable and performant file transfers.

Chunked Upload · MD5 verification · Node.js

0 likes · 22 min read
