Tagged articles
135 articles
DataFunSummit
Feb 7, 2026 · Big Data

How Flink Enables Real‑Time AI Inference and Agent Construction

This article explains Apache Flink’s stream processing fundamentals, introduces the open‑source Flink Agents framework for building event‑driven AI agents, details Alibaba Cloud’s Flink AI Function for real‑time LLM inference, and showcases demos, architecture, integration patterns, and practical use cases such as VOC analysis, live‑stream analytics, and intelligent operations.

Apache Flink · Big Data · Cloud Computing
0 likes · 24 min read
ByteDance Data Platform
Feb 2, 2026 · Big Data

How StreamShield Powers Production‑Grade Resilience for Apache Flink at Massive Scale

ByteDance’s StreamShield delivers a three‑layer resiliency framework—engine self‑healing, hybrid replication at the cluster level, and chaos‑tested releases—that enables over 70,000 concurrent Flink jobs on 11 million CPU cores to meet strict SLAs with second‑level startup and robust fault tolerance.

Apache Flink · ByteDance · Real‑Time Computing
0 likes · 6 min read
Alibaba Cloud Big Data AI Platform
Jan 8, 2026 · Big Data

How Gaode Maps Built a Real‑Time Lakehouse for Billion‑Scale Trajectory Data

This article details Gaode Maps' end‑to‑end lakehouse solution for massive, high‑frequency trajectory data, covering the challenges of real‑time visibility, query performance, and storage cost, and explaining how a hot‑warm‑cold tiering architecture built on Apache Flink, Paimon, StarRocks, Redis and Lindorm delivers millisecond‑level queries while cutting storage expenses.

Apache Flink · Apache Paimon · Data Tiering
0 likes · 19 min read
StarRocks
Jan 7, 2026 · Big Data

How Gaode Maps Built a Real‑Time Lakehouse for Billion‑Scale Trajectory Data

This article details Gaode Maps' end‑to‑end lakehouse solution for handling high‑frequency, high‑volume trajectory data, covering the challenges of real‑time visibility, multi‑scenario queries, storage cost, and data silos, and describing the layered storage architecture, performance validation, and future expansion plans.

Apache Flink · Data Tiering · Lakehouse
0 likes · 21 min read
Alibaba Cloud Big Data AI Platform
Aug 7, 2025 · Operations

How Alibaba Scales Flink to Millions of Cores: Real‑Time Ops Secrets

This article details Alibaba's decade‑long evolution of its real‑time computing platform, the massive operational challenges of managing Flink clusters at million‑core scale, and the comprehensive strategies—including SLA metrics, self‑healing services, cloud‑native redesign, and job‑level advisory tools—used to ensure stability, cost efficiency, and performance during peak events like Double‑11.

Apache Flink · Cloud Native · Job Advisory
0 likes · 19 min read
Alibaba Cloud Big Data AI Platform
Jul 4, 2025 · Big Data

From Real-Time Data Analytics to Real-Time AI: Flink Forward Asia 2025 Highlights

The Flink Forward Asia 2025 conference in Singapore showcased Apache Flink's evolution with new AI‑driven projects such as Flink Agents, the integration of AI Functions in Flink 2.1, the disaggregated state management architecture of Flink 2.0, and complementary lakehouse technologies like Paimon and Fluss, underscoring the platform's role as the real‑time backbone for modern AI applications.

Apache Flink · Data Lakehouse · Disaggregated State Management
0 likes · 9 min read
DataFunTalk
Jul 4, 2025 · Big Data

How Flink Agents and Flink 2.0 Are Powering Real‑Time AI at Scale

The Flink Forward Asia 2025 conference in Singapore showcased Apache Flink’s latest advances—including Flink Agents for system‑triggered AI, the cloud‑native Flink 2.0 with disaggregated state management, the multi‑modal lakehouse Paimon, and the Fluss table storage system—highlighting the ecosystem’s shift toward real‑time AI integration.

Apache Flink · Data Lake · Flink 2.0
0 likes · 9 min read
Big Data Tech Team
Jun 2, 2025 · Big Data

Master Apache Flink: A Complete Learning Roadmap from Basics to Advanced Projects

This guide outlines a comprehensive Apache Flink learning path, covering prerequisite knowledge, core concepts, APIs, state management, performance tuning, hands‑on projects, advanced topics like SQL optimization and Kubernetes deployment, plus curated resources and community tips to help beginners and intermediate users become proficient.

Apache Flink · Flink Tutorial · learning roadmap
0 likes · 8 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2024 · Big Data

How Fluss Redefines Real‑Time Stream Storage for Flink

Fluss, an open‑source real‑time stream storage project from Alibaba, integrates columnar formats and low‑latency updates with Apache Flink to address the limitations of traditional Kafka‑Flink pipelines, offering high throughput, low cost, and seamless lakehouse support for modern data analytics.

Apache Flink · Fluss · real-time storage
0 likes · 6 min read
Huolala Tech
Nov 7, 2024 · Big Data

How HuoLaLa Scaled Real‑Time Data Capture with Flink CDC: Architecture, Challenges, and Results

This article details HuoLaLa's logistics platform challenges with petabyte‑scale data, the selection of Apache Flink CDC for stable, compatible, and low‑latency data ingestion, the construction of a multi‑layer CDC capability, migration strategies, measurable performance gains, and future open‑source contributions.

Apache Flink · Flink CDC · data ingestion
0 likes · 15 min read
Big Data Technology & Architecture
Aug 5, 2024 · Big Data

Key Features of Apache Flink 1.20: Materialized Tables, DISTRIBUTED BY, and State/Checkpoint Optimizations

The article reviews Apache Flink 1.20, highlighting the new Materialized Table concept, the DISTRIBUTED BY support for load‑balanced storage and join performance, and state/checkpoint file merging improvements, while providing code examples and practical insights for users.

Apache Flink · Big Data · Checkpoint Optimization
0 likes · 7 min read
DeWu Technology
Jul 31, 2024 · Big Data

Custom Flink Scheduler Enhancements: Resource Balancing, Task Migration, and TmRestart Strategy

The article details Dewu’s custom Flink scheduler, DwScheduler, which adds JSON‑based resource specifications, per‑TaskManager slot sharing for balanced CPU use, hot TaskManager migration callbacks, and a new TmRestart strategy for rapid pod‑process recovery, offering practical techniques to enhance real‑time stream processing stability and performance.

Apache Flink · Performance Optimization · Resource Management
0 likes · 9 min read
Tencent Cloud Developer
Jul 2, 2024 · Big Data

Apache Flink Deployment with Pulsar Connector: Setup, Demos, and Best Practices

This guide shows how to deploy Apache Flink 1.17 in Docker, configure off‑heap memory, connect it to Pulsar via the 4.1.0‑1.17 connector, run example jobs that copy topics and perform windowed word‑count, and provides Maven dependencies, custom serialization tips, batching settings, and version‑specific best‑practice notes.

Apache Flink · DataStream · Docker deployment
0 likes · 20 min read
DataFunTalk
Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint
0 likes · 11 min read
DataFunTalk
Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache Flink · Apache Paimon · Big Data
0 likes · 17 min read
Alibaba Cloud Big Data AI Platform
Nov 10, 2023 · Big Data

How Open‑Source Big Data 3.0 Is Redefining Real‑Time, Serverless, and AI‑Driven Analytics

The talk outlines Alibaba Cloud's open‑source big data platform evolution to version 3.0, highlighting the streaming lakehouse architecture, full serverless transformation, and AI‑enhanced operations that together enable real‑time analytics, higher performance, and smarter data management.

Apache Flink · Paimon · streaming lakehouse
0 likes · 15 min read
Volcano Engine Developer Services
Aug 21, 2023 · Big Data

From Contributor to Committer: Lessons from ByteDance’s Apache Flink Journey

ByteDance’s streaming computing team members Fang Yong and Hu Weihua share their path from early Flink adopters to Apache Flink Committers, detailing their contributions to Runtime Coordinator and Streaming Warehouse, the challenges of open‑source involvement, and practical advice for developers seeking to engage with the Flink community.

Apache Flink · Committer · Open-source
0 likes · 10 min read
WeiLi Technology Team
Aug 2, 2023 · Big Data

How to Build a Real-Time Data Warehouse: Architectures, Challenges, and Industry Practices

This article examines the growing demand for real‑time data warehouses, compares mature streaming frameworks, evaluates Lambda, Kappa and hybrid architectures, reviews industry implementations from Didi and OPPO, and proposes a standard‑layer + stream + data‑lake solution with Apache Paimon, Hudi, and Iceberg.

Apache Flink · Kappa architecture · Lambda architecture
0 likes · 27 min read
Baidu Geek Talk
Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data
0 likes · 14 min read
ITPUB
Mar 24, 2023 · Big Data

What’s New in Apache Flink 1.17? Key Features, Performance Gains, and Streaming Warehouse Advances

Apache Flink 1.17 introduces a suite of batch and streaming enhancements—including a new Streaming Warehouse API, significant TPC‑DS performance boosts, adaptive batch scheduling, improved checkpointing, expanded SQL capabilities, Hive connector upgrades, and broader filesystem support—while also delivering upgrades to FRocksDB, Calcite, and the token framework to strengthen its position as a leading unified data‑processing engine.

Apache Flink · Batch Processing · Checkpoint
0 likes · 23 min read

How NetEase Yanxuan Migrated from Lambda to Iceberg for Real‑Time Batch‑Stream Integration

This article details how NetEase Yanxuan transformed its data platform from a dual Lambda architecture to a unified batch‑stream solution built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation of stream‑batch pipelines, message ordering fixes, snapshot generation, and extensive table‑governance optimizations.

Apache Flink · Apache Spark · Batch-Stream Integration
0 likes · 14 min read
DataFunTalk
Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture
0 likes · 13 min read
vivo Internet Technology
Dec 28, 2022 · Big Data

Vivo Real-Time Computing Platform: Architecture, Practices, and Applications

The Vivo Real‑Time Computing Platform, built on Apache Flink, delivers a one‑stop data construction and governance solution that processes up to 5 PB daily. It offers high‑availability submission and control services, robust stability, rich SQL usability, efficient Kubernetes deployment, and strong security; it supports real‑time warehouses and short‑video recommendation, and it targets future elastic scaling and lakehouse unification.

Apache Flink · Data Platform · Real‑Time Computing
0 likes · 18 min read
Alibaba Cloud Big Data AI Platform
Nov 30, 2022 · Big Data

What’s New in Apache Flink 2022? Highlights from the Flink Forward Asia Summit

The 2022 Flink Forward Asia summit showcased Apache Flink’s rapid community growth, key technical breakthroughs such as distributed snapshot upgrades, cloud‑native state storage, hybrid shuffle, Flink CDC 2.0, and Flink ML 2.0, and real‑world deployments at companies like Midea, miHoYo and Disney.

Apache Flink · Big Data · Flink Forward Asia
0 likes · 25 min read
DataFunTalk
Nov 29, 2022 · Big Data

Summary of Flink Forward Asia 2022: Keynotes, Technical Innovations, and Industry Deployments of Apache Flink

The 2022 Flink Forward Asia conference highlighted Apache Flink’s rapid growth, showcased major technical advances such as upgraded checkpointing, cloud‑native state storage, Hybrid Shuffle, Flink CDC 2.0, and Flink ML 2.0, and presented real‑world deployments from Alibaba, Midea, miHoYo, and Disney.

Apache Flink · Data Integration · Real-time Streaming
0 likes · 25 min read
Alibaba Cloud Big Data AI Platform
Nov 29, 2022 · Big Data

How Flink’s Stream‑Batch Fusion Is Transforming Real‑Time Big Data

The article explores Apache Flink’s eight‑year journey to becoming a top‑level Apache project, Alibaba’s extensive contributions, the rise of stream‑batch unified computing, its impact on real‑time data integration, cloud‑native deployment, and the emerging Flink‑based data‑warehouse and serverless solutions.

Apache Flink · Big Data · Cloud Native
0 likes · 15 min read
JD Tech
Sep 6, 2022 · Big Data

Flink Streaming Job Tuning Guide: Memory Model, Network Stack, RocksDB, and More

This article presents a detailed guide for optimizing large‑scale Apache Flink streaming jobs on the JD Real‑Time Computing platform, covering TaskManager memory model tuning, network stack configuration, RocksDB state management, checkpoint strategies, and additional performance tips with practical examples and calculations.

Apache Flink · Checkpoint · Network Stack
0 likes · 22 min read
Zhengcaiyun Technology (政采云技术)
Aug 2, 2022 · Fundamentals

Understanding the Chandy‑Lamport Distributed Snapshot Algorithm

This article explains the Chandy‑Lamport algorithm for capturing consistent global snapshots in distributed systems, describes its assumptions and message‑marker rules, walks through a detailed example with three processes and channels, and relates it to Apache Flink's asynchronous checkpoint mechanism.

Apache Flink · Chandy-Lamport · Distributed Systems
0 likes · 14 min read
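
The marker rules mentioned in the summary above can be illustrated with a small simulation. The sketch below is not taken from the article: it uses two hypothetical processes (A and B) exchanging money transfers over FIFO channels rather than the article's three-process example, and the `Network` and `Process` classes and all of their method names are assumptions made purely for illustration.

```python
from collections import deque

MARKER = "MARKER"  # control message separating pre- and post-snapshot traffic


class Network:
    """FIFO channels between named processes (hypothetical helper)."""

    def __init__(self):
        self.procs = {}
        self.channels = {}  # (src, dst) -> deque of in-flight messages

    def add(self, proc):
        for other in self.procs:
            self.channels[(other, proc.name)] = deque()
            self.channels[(proc.name, other)] = deque()
        self.procs[proc.name] = proc

    def send(self, src, dst, msg):
        self.channels[(src, dst)].append(msg)

    def deliver(self, src, dst):
        msg = self.channels[(src, dst)].popleft()
        self.procs[dst].receive(self, src, msg)


class Process:
    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.recorded_state = None  # local state captured by the snapshot
        self.channel_state = {}     # src -> messages recorded as in flight
        self.recording = set()      # incoming channels still being recorded

    def transfer(self, net, dst, amount):
        self.balance -= amount
        net.send(self.name, dst, amount)

    def record(self, net, marker_from=None):
        # Record local state, then send a marker on every outgoing channel.
        self.recorded_state = self.balance
        others = [p for p in net.procs if p != self.name]
        self.channel_state = {src: [] for src in others}
        # The channel the first marker arrived on is recorded as empty.
        self.recording = {src for src in others if src != marker_from}
        for dst in others:
            net.send(self.name, dst, MARKER)

    def receive(self, net, src, msg):
        if msg == MARKER:
            if self.recorded_state is None:
                self.record(net, marker_from=src)
            else:
                self.recording.discard(src)  # channel from src is complete
        else:
            self.balance += msg
            if src in self.recording:
                self.channel_state[src].append(msg)


net = Network()
a, b = Process("A", 100), Process("B", 50)
net.add(a)
net.add(b)

a.transfer(net, "B", 10)  # 10 is now in flight on A -> B
b.record(net)             # B initiates the snapshot
net.deliver("A", "B")     # the transfer arrives after B recorded its state
net.deliver("B", "A")     # A sees its first marker and records its state
net.deliver("A", "B")     # A's marker closes the A -> B channel at B

in_flight = sum(sum(msgs) for p in (a, b) for msgs in p.channel_state.values())
snapshot_total = a.recorded_state + b.recorded_state + in_flight
```

In this run the recorded local states (90 and 50) plus the recorded in-flight transfer (10) add up to the 150 units that existed before the snapshot started, which is the consistency property the algorithm is meant to provide.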
DataFunTalk
May 19, 2022 · Big Data

SeaTunnel: Distributed Data Integration Platform and Its Application in Traffic Management

This article introduces Apache SeaTunnel, a distributed, high‑performance data integration platform built on Spark and Flink, outlines its technical features, workflow, and plugin ecosystem, and details a concrete traffic‑management use case involving incremental Oracle‑to‑warehouse data synchronization with Spark resources and scheduled shell scripts.

Apache Flink · Apache Spark · Big Data
0 likes · 12 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Batch Processing
0 likes · 20 min read
Big Data Technology & Architecture
Apr 20, 2022 · Big Data

Fine‑Grained Resource Management in Apache Flink: Scenarios, Mechanism, Efficiency, Allocation Strategies, and Limitations

This article explains Apache Flink's fine‑grained resource management, describing typical use cases, the slot‑based mechanism, how it improves resource efficiency, the default allocation strategy, current limitations, and provides example code for configuring slot sharing groups.

Apache Flink · Big Data · Fine-Grained Resource Management
0 likes · 12 min read
Alibaba Cloud Developer
Apr 2, 2022 · Big Data

What’s New in Flink CDC 2.2? A Deep Dive into Added Sources and Core Features

The article introduces Flink CDC 2.2, highlighting its expanded support for twelve data sources—including OceanBase, PolarDB‑X, SqlServer, and TiDB—while detailing core features such as the incremental snapshot framework, multi‑version Flink compatibility, dynamic table addition, and numerous bug fixes and performance improvements.

Apache Flink · Change Data Capture · Connector
0 likes · 9 min read
DataFunTalk
Jan 25, 2022 · Big Data

Summary of Flink Forward Asia 2021: Community Growth, Cloud‑Native Deployment, Streaming‑Batch Integration, and Machine Learning

The article provides a comprehensive English summary of the 2021 Flink Forward Asia conference, covering community statistics, cloud‑native deployment modes, fault‑tolerance checkpoint advances, the evolution of streaming‑batch integration, the introduction of Streaming Warehouse, Flink ML 2.0, real‑time use cases at ByteDance and ICBC, Pravega storage innovations, and concluding reflections on the future of real‑time big data processing.

Apache Flink · Big Data
0 likes · 25 min read
Big Data Technology & Architecture
Jan 12, 2022 · Big Data

Common Production Issues and Troubleshooting Guide for Apache Flink

This article compiles a comprehensive list of common production problems encountered with Apache Flink, covering cluster sizing, checkpoint failures, backpressure analysis, resource allocation, deployment errors, UDF definitions, data skew, Kafka configurations, and provides detailed troubleshooting steps and best‑practice recommendations.

Apache Flink · Checkpoint · Kafka
0 likes · 39 min read
DataFunTalk
Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table
0 likes · 16 min read
Programmer DD
Jan 8, 2022 · Big Data

How Flink’s Streaming Warehouse Is Redefining Real‑Time Data Lakes

This interview explores Apache Flink’s evolution toward a Streaming Warehouse, detailing its stream‑batch integration, new CDC‑based data integration, the Dynamic Table storage architecture, and how these innovations aim to simplify and accelerate real‑time big‑data analytics.

Apache Flink · Big Data · Dynamic Table
0 likes · 17 min read
Big Data Technology & Architecture
Dec 24, 2021 · Big Data

Key Updates and New Features in Apache Flink 1.14.2 Release

The Apache Flink 1.14.2 release, launched on December 16, fixes a critical Log4j vulnerability, resolves OOM issues with the Pulsar connector, introduces numerous Table API, DataStream API, connector, and checkpoint enhancements, deprecates several legacy APIs, and drops support for Apache Mesos, while also promoting related PDF resources.

Apache Flink · Big Data · Checkpoints
0 likes · 8 min read
Tencent Cloud Developer
Nov 9, 2021 · Big Data

Comprehensive Overview of Apache Flink Streaming Computation and Architecture

The article systematically introduces Apache Flink’s streaming computation model, contrasting batch and real‑time processing, detailing its unified architecture, managed and raw state with key groups, checkpointing and savepoints for fault tolerance, data exchange mechanisms, time semantics, windowing, side‑outputs, and a complete Java Kafka‑based example.

Apache Flink · Checkpoint · Flink Architecture
0 likes · 46 min read
Big Data Technology & Architecture
Oct 9, 2021 · Big Data

Apache Flink 1.7–1.14 Release Highlights and Feature Evolution

This article provides a comprehensive overview of Apache Flink's major releases from version 1.7 to 1.14, detailing new APIs, state management improvements, Kubernetes integration, SQL and Table API enhancements, checkpointing advances, and performance optimizations that together illustrate the platform's evolution for both streaming and batch processing workloads.

Apache Flink · Batch Processing · Checkpoint
0 likes · 78 min read
Big Data Technology & Architecture
Aug 25, 2021 · Big Data

Apache Flink Release History and Key Features from 1.7 to 1.12

This article provides a comprehensive overview of Apache Flink's major releases from version 1.7 through 1.12, detailing new functionalities such as Scala 2.12 support, state schema evolution, Blink planner integration, Kubernetes native deployment, Python (PyFlink) enhancements, and numerous performance and stability improvements for stream and batch processing.

Apache Flink · PyFlink · Table API
0 likes · 54 min read
Big Data Technology Architecture
Aug 10, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's practical experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points of traditional Lambda architectures, Iceberg's table format and capabilities, Flink‑Iceberg sink design, small‑file handling, and future roadmap for a unified streaming‑batch data lake.

Apache Flink · Apache Iceberg · Batch Processing
0 likes · 20 min read
dbaplus Community
Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data
0 likes · 14 min read
DataFunTalk
May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache Flink · Big Data · Data Streaming
0 likes · 18 min read
DataFunTalk
Mar 28, 2021 · Big Data

Flink Stream‑Batch Integration: Layered Architecture, Unified SDK, DAG Scheduler, Shuffle, and Fault‑Tolerance

This article explains how Apache Flink has evolved into a unified stream‑batch engine by introducing a three‑layer architecture, a unified DataStream SDK, a pipeline‑region‑based DAG scheduler, a common shuffle framework, and enhanced fault‑tolerance mechanisms to address efficiency, consistency, and resource‑utilisation challenges in real‑time big‑data processing.

Apache Flink · Batch Processing · DAG scheduler
0 likes · 25 min read
Big Data Technology & Architecture
Feb 19, 2021 · Big Data

A Comprehensive Guide to Learning Apache Flink: Background, Core Concepts, Modules, Source Code, and Industry Applications

This article provides a detailed learning roadmap for Apache Flink, covering its theoretical background, key research papers, fundamental concepts, core modules, source‑code exploration, real‑time data‑warehouse use cases, event‑driven applications, and emerging trends in the big‑data ecosystem.

Apache Flink · Event-driven · State Management
0 likes · 9 min read
Sohu Tech Products
Feb 17, 2021 · Big Data

Dynamic Broadcast State and Data Partitioning in an Apache Flink Fraud Detection Engine

This article demonstrates how to initialize, broadcast, and dynamically update rule sets in an Apache Flink fraud detection pipeline, using BroadcastProcessFunction and MapState to achieve runtime data partitioning without recompiling, and explains the underlying data exchange patterns such as forward, hash, rebalance, and broadcast.

Apache Flink · Broadcast State · Dynamic Key Function
0 likes · 11 min read
Sohu Tech Products
Feb 17, 2021 · Big Data

Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This article explains how to implement dynamic data partitioning in Apache Flink using a fraud‑detection demo, covering the system architecture, rule‑driven runtime reconfiguration, custom ProcessFunction code, and the underlying key‑by logic that enables flexible, real‑time stream processing.

Apache Flink · Dynamic Partitioning · KeyBy
0 likes · 11 min read
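
The rule-driven keying idea described in the summary above can be sketched outside Flink in a few lines. The rule format, the field names, and the `dynamic_keys` helper below are hypothetical; they only illustrate how a grouping key might be derived at runtime from a rule instead of being hard-coded, and they are neither the article's code nor a Flink API.

```python
def dynamic_keys(event, rules):
    """Yield one (rule_id, key) pair per active rule for a single event.

    Each rule names the event fields to group by, so changing the rule set
    changes the partitioning without touching this function.
    """
    for rule in rules:
        key = tuple(event[field] for field in rule["group_by"])
        yield rule["id"], key


# Hypothetical rules and event, for illustration only.
rules = [
    {"id": 1, "group_by": ["payer_id", "beneficiary_id"]},
    {"id": 2, "group_by": ["payer_id"]},
]
event = {"payer_id": 25, "beneficiary_id": 12, "amount": 300}

keyed = list(dynamic_keys(event, rules))
# keyed == [(1, (25, 12)), (2, (25,))]
```

Emitting the event once per rule, tagged with that rule's key, is what would let a downstream key-by step route the same event to different partitions for different rules.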
DataFunTalk
Feb 12, 2021 · Big Data

Apache Flink at Kuaishou: Past, Present, and Future

Zhao Jianbo, head of Kuaishou's big data architecture team, presents an in‑depth overview of Apache Flink's adoption at Kuaishou, covering reasons for selection, development history, business data flows, technical innovations such as the Slimbase state engine, stability improvements, and future roadmap.

Apache Flink · Big Data · Kuaishou
0 likes · 16 min read
DataFunTalk
Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data
0 likes · 21 min read
Alibaba Cloud Developer
Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

AI integration · Apache Flink · Big Data
0 likes · 21 min read
DataFunTalk
Jan 22, 2021 · Big Data

Practical Experience of Apache Flink at ByteDance: Architecture, Optimizations, and Future Directions

This article presents ByteDance's real‑world use of Apache Flink, covering the platform's overall architecture, SQL extensions, custom connectors, UI‑driven SQL platform, performance optimizations such as window mini‑batch and custom windows, dimension‑table enhancements, checkpoint recovery improvements, stream‑batch integration, and upcoming roadmap items.

Apache Flink · Big Data · ByteDance
0 likes · 15 min read
Byte Quality Assurance Team
Jan 6, 2021 · Big Data

Fundamentals of Stream Processing: Bounded vs. Unbounded Data, Time Domains, and Windowing Strategies

This article introduces stream processing fundamentals: it distinguishes bounded from unbounded datasets, explains the difference between event time and processing time, and surveys the windowing strategies that distributed systems use to process continuous data flows.

Apache Flink · Data Windowing · Event Time
0 likes · 13 min read
DataFunTalk
Jan 5, 2021 · Big Data

Highlights of Flink Forward Asia 2020: Stream‑Batch Integration, AI Fusion, and Cloud‑Native Advances

The 2020 Flink Forward Asia conference showcased Apache Flink's rapid growth, community milestones, industry adoption, and technical breakthroughs such as unaligned checkpoints, approximate failover, the Nexmark benchmark, stream‑batch unification, AI integration via PyFlink and Alink, and deep cloud‑native support on Kubernetes, illustrated through case studies from Alibaba, Meituan, Kuaishou, and Dell.

AI integration · Apache Flink · Cloud Native
0 likes · 20 min read
DataFunTalk
Dec 11, 2020 · Big Data

My Journey and Contributions in the Apache Flink Community

The author recounts his journey from first encountering Flink to becoming an Apache Flink Committer at ByteDance, covering community involvement, code contributions, bug fixes, lessons learned, and advice for newcomers. The piece closes with promotional offers for Flink services.

Apache Flink · Blink Planner · Open-source
0 likes · 12 min read
Big Data Technology Architecture
Nov 27, 2020 · Big Data

Integrating Apache Flink with Data Lakes Using Apache Iceberg: Architecture, Use Cases, and Future Roadmap

This article explains how Apache Flink combines with Apache Iceberg to build unified stream‑batch data lake solutions, covering data lake fundamentals, architectural layers, classic business scenarios, reasons for choosing Iceberg, streaming ingestion design, and upcoming community enhancements.

Apache Flink · Apache Iceberg · Table Format
0 likes · 13 min read
DataFunTalk
Jul 22, 2020 · Big Data

Building a Real-Time Computing Platform with Apache Flink at iQIYI: Architecture, Improvements, and Business Cases

iQIYI’s senior data engineer shares the evolution of its big‑data services from Hadoop to a Flink‑based real‑time computing platform, detailing architecture, monitoring improvements, StreamingSQL capabilities, business use cases like recommendation and deep‑learning data generation, and future plans for unified stream‑batch processing.

Apache Flink · Data Platform · Flink
0 likes · 11 min read
Alibaba Cloud Developer
Jul 13, 2020 · Big Data

What’s New in Apache Flink 1.11? A Deep Dive into Features and Performance

Apache Flink 1.11.0, released after four months of development, brings major ecosystem, usability, and stability improvements—including CDC support, a new JDBC catalog, real‑time Hive integration, a redesigned source API, PyFlink enhancements, application mode for Kubernetes, and checkpoint optimizations—while highlighting the growing contribution of Chinese developers.

Apache Flink · Checkpoint · Feature Release
0 likes · 20 min read
DataFunTalk
Jul 13, 2020 · Big Data

Design and Challenges of Netflix’s Keystone Real‑Time Data Platform

This article first outlines Netflix’s Keystone real‑time data platform, describing its background, functionalities, and the distributed‑system challenges and solutions such as ordering semantics and processing contracts, then shifts to announce the second‑edition Apache Flink Geek Challenge, detailing its theme, schedule, prizes, and registration instructions.

Apache Flink · big data competition · real-time data platform
0 likes · 18 min read
Big Data Technology Architecture
Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Quality · Flink
0 likes · 24 min read
Architect
Jun 11, 2020 · Big Data

Understanding Apache Flink Architecture, Data Transfer, Event‑Time Processing, State Management, and Checkpointing

This article explains Apache Flink's distributed system architecture—including JobManager, ResourceManager, TaskManager, and Dispatcher—covers session and job deployment modes, data transfer mechanisms, event‑time handling with watermarks, various state types and backends, scaling strategies, and the checkpoint/savepoint recovery process.

Apache Flink · Big Data · Event Time
0 likes · 15 min read
Big Data Technology Architecture
May 22, 2020 · Big Data

Apache Flink 1.11 New Features Overview

The article provides a comprehensive overview of Apache Flink 1.11, detailing enhancements in cluster deployment, resource management, source/sink APIs, state backends, Table & SQL improvements, DataStream extensions, PyFlink/ML support, and runtime optimizations, along with relevant code examples and references.

Apache Flink · Flink 1.11 · Table API
0 likes · 19 min read
Didi Tech
Apr 30, 2020 · Big Data

Didi’s Real‑Time Computing Practices with Apache Flink and StreamSQL

Didi has unified its real‑time computing on Apache Flink, creating an enhanced StreamSQL service with extended DDL, built‑in parsers and UDX, supporting thousands of nodes, millions of jobs, and trillions of daily records, while addressing state management, high availability, multi‑language UDFs, and pursuing real‑time ML and data‑warehouse integration.

Apache Flink · Big Data · Didi
0 likes · 13 min read
DataFunTalk
Apr 22, 2020 · Big Data

Didi's Real-Time Computing Practices with Apache Flink: Architecture, StreamSQL, and Operational Insights

Senior Didi technology expert Liang Li-yin shares how Didi leverages Apache Flink for large‑scale real‑time computing, covering service architecture, StreamSQL advantages, multi‑cluster management, task control, monitoring, meta‑store integration, challenges, and future plans such as high availability, real‑time ML, and unified batch‑stream processing.

Apache Flink · Big Data · Real‑Time Computing
0 likes · 14 min read
DataFunTalk
Apr 15, 2020 · Big Data

Apache Flink OLAP Engine: Architecture, Optimizations, and Use Cases

This article presents an in‑depth overview of Apache Flink's new OLAP engine, covering OLAP fundamentals, the three OLAP models, Flink's unified streaming‑batch‑OLAP architecture, performance optimizations, benchmark results, and future development directions.

Apache Flink · Big Data · OLAP
0 likes · 11 min read
HomeTech
Mar 11, 2020 · Big Data

Streaming SQL with Apache Flink: Theory, Platform Optimizations, and Real‑Time Use Cases

This article introduces Apache Flink's Streaming SQL, explains its theoretical foundations such as the table‑stream relationship and watermark semantics, describes the platform's practical enhancements—including source/sink wrappers, built‑in functions, and native Retract Stream support—and showcases several real‑time computation examples.

Apache Flink · DataStream · Real‑Time Computing
0 likes · 31 min read
DataFunTalk
Mar 6, 2020 · Artificial Intelligence

Advances in Apache Flink AI Ecosystem: ML Pipeline, AI Flow, and Mini‑Batch Streaming Iteration

This article reviews recent progress in Apache Flink's AI ecosystem, explaining how Flink unifies batch and stream processing for machine‑learning pipelines, introduces the Flink ML Pipeline and Alink library, describes the AI Flow framework for end‑to‑end ML workflows, and presents a novel mini‑batch streaming iteration mechanism to support both offline and online learning scenarios.

AI Flow · Apache Flink · Mini-batch Iteration
0 likes · 13 min read
Youzan Coder
Feb 28, 2020 · Big Data

Flink Checkpoint Principle Analysis and Failure Cause Investigation

The article thoroughly explains Apache Flink’s checkpoint mechanism—including state types, coordinator workflow, exactly‑once versus at‑least‑once semantics, common failure sources such as code exceptions, storage or network issues, and practical configuration tips like interval settings, local recovery and externalized checkpoints.

Apache Flink · Checkpoint · Exactly-Once
0 likes · 15 min read
dbaplus Community
Feb 25, 2020 · Backend Development

How to Merge Small Files in Flink Checkpoints to Reduce HDFS Load

This article explains a small‑file‑merging technique for Apache Flink checkpoints that reuses FSDataOutputStreams to combine multiple state files into a single HDFS file, detailing design considerations such as concurrent checkpoint support, reference‑counted deletion, space amplification reduction, fault handling, compatibility, and observed production performance gains.

Apache Flink · Checkpoint · HDFS
0 likes · 13 min read
Alibaba Cloud Developer
Feb 24, 2020 · Big Data

What’s New in Apache Flink 1.10? Deep Dive into Major Features and Enhancements

Apache Flink 1.10 introduces a major upgrade that merges the Blink engine, boosts performance and stability, adds native Kubernetes support, enhances SQL DDL, delivers production‑ready Hive batch compatibility, optimizes memory management, and expands Python UDF capabilities, with detailed feature breakdowns and code examples.

Apache Flink · Batch Processing · SQL DDL
0 likes · 8 min read
Big Data Technology & Architecture
Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Apache Flink · Event Time · Flink
0 likes · 10 min read
Big Data Technology Architecture
Feb 12, 2020 · Big Data

Apache Flink 1.10 Release: New Features, Optimizations, and Kubernetes Integration

Apache Flink 1.10 introduces major performance and stability improvements, unified memory configuration, native Kubernetes session mode, enhanced Table API/SQL with production‑ready Hive integration, expanded Python UDF support, and a host of important bug fixes and connector updates, marking the largest community‑driven release to date.

Apache Flink · Hive Integration · Python
0 likes · 17 min read
Big Data Technology & Architecture
Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-Once
0 likes · 11 min read
Alibaba Cloud Developer
Dec 5, 2019 · Big Data

What I Learned at Flink Forward Asia 2019: Stream Processing, AI, and Cloud‑Native Insights

The three‑day Flink Forward Asia 2019 conference in Beijing attracted over 2,000 attendees, showcased more than 45 talks from leading companies and researchers, and highlighted the evolution of Flink toward a unified engine, Stateful Functions, AI integration, cloud‑native deployment, and real‑time analytics at massive scale.

Apache Flink · Artificial Intelligence · Cloud Native
0 likes · 16 min read
Big Data Technology & Architecture
Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · Dataflow
0 likes · 22 min read
Big Data Technology & Architecture
Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Resources
0 likes · 10 min read