Tagged articles

Apache Flink

142 articles · Page 1 of 2

Jun 29, 2026 · Big Data

How Agentic Streaming Is Redefining Real‑Time AI at Flink Forward Asia 2026

The Flink Forward Asia 2026 conference in Shenzhen showcased Apache Flink's evolution to Agentic Streaming for AI, introduced the multimodal Agentic Lake built on Apache Paimon 2.0, announced Fluss 1.0 as a real‑time context layer, and highlighted performance gains over competing stacks such as Ray and Daft.

Agentic Streaming · Apache Flink · Apache Fluss

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Jun 27, 2026 · Industry Insights

Key Takeaways from Flink Forward Asia 2026: Agentic Streaming and AI‑Native Real‑Time Data

The Flink Forward Asia 2026 conference in Shenzhen gathered leading experts to discuss the evolution of Apache Flink toward agentic streaming, multimodal data processing, and AI‑native real‑time platforms, highlighting insights from Alibaba Cloud, NVIDIA, and independent researchers on the future of AI‑driven data pipelines.

AI · Agentic Streaming · Apache Flink

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Jun 26, 2026 · Big Data

Flink Forward Asia 2026 Launches in Shenzhen: Agentic Streaming for AI Opens a New Real-Time Intelligence Era

The Flink Forward Asia 2026 conference in Shenzhen announced the evolution of Apache Flink toward Agentic Streaming for AI, unveiled multimodal data lake projects like Apache Paimon 2.0 and Fluss, highlighted performance gains over competing stacks, and showcased collaborations with NVIDIA to accelerate real‑time AI workloads.

Agentic Streaming · Apache Flink · Apache Fluss

0 likes · 13 min read

Spring Full-Stack Practical Cases

Jun 2, 2026 · Big Data

Millisecond‑Level Real‑Time Sync from MySQL to Elasticsearch with Flink CDC

This guide walks through setting up a Spring Boot 3.5 environment, configuring Flink 1.20 and Flink CDC 3.5, preparing MySQL tables, and using both the Flink CDC CLI and SQL client to achieve near‑millisecond synchronization of data from MySQL to Elasticsearch, including custom sink programming and real‑time monitoring via the Flink Web UI.

Apache Flink · Elasticsearch · Flink CDC

0 likes · 14 min read

Amazon Cloud Developers

Feb 12, 2026 · Big Data

Stop Struggling with Flink Monitoring: Strands Agents Provide AI‑Driven Analysis & Optimization

The article explains how traditional Flink monitoring suffers from scattered metrics, manual root‑cause analysis, and lack of actionable advice, and introduces a cloud‑native system built on Strands Agents and Amazon Bedrock that automatically collects metrics, performs LLM‑powered analysis, generates optimization recommendations, and interacts with users via natural‑language dialogue and real‑time streaming output.

AI monitoring · Amazon Bedrock · Apache Flink

0 likes · 12 min read

DataFunSummit

Feb 7, 2026 · Big Data

How Flink Enables Real‑Time AI Inference and Agent Construction

This article explains Apache Flink’s stream processing fundamentals, introduces the open‑source Flink Agents framework for building event‑driven AI agents, details Alibaba Cloud’s Flink AI Function for real‑time LLM inference, and showcases demos, architecture, integration patterns, and practical use cases such as VOC analysis, live‑stream analytics, and intelligent operations.

Apache Flink · Big Data · Cloud Computing

0 likes · 24 min read

ByteDance Data Platform

Feb 2, 2026 · Big Data

How StreamShield Powers Production‑Grade Resilience for Apache Flink at Massive Scale

ByteDance’s StreamShield delivers a three‑layer resiliency framework—engine self‑healing, hybrid replication at the cluster level, and chaos‑tested releases—that enables over 70,000 concurrent Flink jobs on 11 million CPU cores to meet strict SLAs with second‑level startup and robust fault tolerance.

Apache Flink · ByteDance · Real-Time Computing

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Jan 8, 2026 · Big Data

How Gaode Maps Built a Real‑Time Lakehouse for Billion‑Scale Trajectory Data

This article details Gaode Maps' end‑to‑end lakehouse solution for massive, high‑frequency trajectory data, covering the challenges of real‑time visibility, query performance, and storage cost, and explaining how a hot‑warm‑cold tiering architecture built on Apache Flink, Paimon, StarRocks, Redis and Lindorm delivers millisecond‑level queries while cutting storage expenses.

Apache Flink · Apache Paimon · Data Tiering

0 likes · 19 min read

StarRocks

Jan 7, 2026 · Big Data

How Gaode Maps Built a Real‑Time Lakehouse for Billion‑Scale Trajectory Data

This article details Gaode Maps' end‑to‑end lakehouse solution for handling high‑frequency, high‑volume trajectory data, covering the challenges of real‑time visibility, multi‑scenario queries, storage cost, and data silos, and describing the layered storage architecture, performance validation, and future expansion plans.

Apache Flink · Data Tiering · Lakehouse

0 likes · 21 min read

Past Memory Big Data

Dec 12, 2025 · Big Data

How Uber Reduced Data Freshness from Hours to Minutes Using Flink Streaming

Uber rebuilt its data‑lake ingestion pipeline with Apache Flink, replacing batch jobs with a streaming architecture that shortens data latency from hours to minutes, lowers compute usage by 25%, and solves challenges like small‑file proliferation, partition skew, and checkpoint‑commit synchronization at petabyte scale.

Apache Flink · Apache Hudi · Data Freshness

0 likes · 10 min read

Big Data Technology & Architecture

Aug 25, 2025 · Artificial Intelligence

How to Deploy Real-Time AI Models with Apache Flink 2.1

This guide explains Apache Flink 2.1's new AI model DDL and ML_PREDICT function, showing step‑by‑step how to create, manage, and invoke AI models in streaming SQL, configure resources with Little's Law, and process risk‑assessment results in real time.

Apache Flink · ML_PREDICT · Real-time AI

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Aug 7, 2025 · Operations

How Alibaba Scales Flink to Millions of Cores: Real‑Time Ops Secrets

This article details Alibaba's decade‑long evolution of its real‑time computing platform, the massive operational challenges of managing Flink clusters at million‑core scale, and the comprehensive strategies—including SLA metrics, self‑healing services, cloud‑native redesign, and job‑level advisory tools—used to ensure stability, cost efficiency, and performance during peak events like Double‑11.

Apache Flink · Cluster Operations · Job Advisory

0 likes · 19 min read

Alibaba Cloud Big Data AI Platform

Jul 4, 2025 · Big Data

From Real-Time Data Analytics to Real-Time AI: Flink Forward Asia 2025 Highlights

The Flink Forward Asia 2025 conference in Singapore showcased Apache Flink's evolution with new AI‑driven projects such as Flink Agents, the integration of AI Functions in Flink 2.1, the disaggregated state management architecture of Flink 2.0, and complementary lakehouse technologies like Paimon and Fluss, underscoring the platform's role as the real‑time backbone for modern AI applications.

Apache Flink · Data Lakehouse · Disaggregated State Management

0 likes · 9 min read

DataFunTalk

Jul 4, 2025 · Big Data

How Flink Agents and Flink 2.0 Are Powering Real‑Time AI at Scale

The Flink Forward Asia 2025 conference in Singapore showcased Apache Flink’s latest advances—including Flink Agents for system‑triggered AI, the cloud‑native Flink 2.0 with disaggregated state management, the multi‑modal lakehouse Paimon, and the Fluss table storage system—highlighting the ecosystem’s shift toward real‑time AI integration.

Apache Flink · Data Lake · Flink 2.0

0 likes · 9 min read

Big Data Tech Team

Jun 2, 2025 · Big Data

Master Apache Flink: A Complete Learning Roadmap from Basics to Advanced Projects

This guide outlines a comprehensive Apache Flink learning path, covering prerequisite knowledge, core concepts, APIs, state management, performance tuning, hands‑on projects, advanced topics like SQL optimization and Kubernetes deployment, plus curated resources and community tips to help beginners and intermediate users become proficient.

Apache Flink · Flink Tutorial · learning roadmap

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Nov 29, 2024 · Big Data

How Fluss Redefines Real‑Time Stream Storage for Flink

Fluss, an open‑source real‑time stream storage project from Alibaba, integrates columnar formats and low‑latency updates with Apache Flink to address the limitations of traditional Kafka‑Flink pipelines, offering high throughput, low cost, and seamless lakehouse support for modern data analytics.

Apache Flink · Fluss · real-time storage

0 likes · 6 min read

Huolala Tech

Nov 7, 2024 · Big Data

How HuoLaLa Scaled Real‑Time Data Capture with Flink CDC: Architecture, Challenges, and Results

This article details HuoLaLa's logistics platform challenges with petabyte‑scale data, the selection of Apache Flink CDC for stable, compatible, and low‑latency data ingestion, the construction of a multi‑layer CDC capability, migration strategies, measurable performance gains, and future open‑source contributions.

Apache Flink · Flink CDC · Real-time Data

0 likes · 15 min read

Big Data Technology & Architecture

Aug 5, 2024 · Big Data

Key Features of Apache Flink 1.20: Materialized Tables, DISTRIBUTED BY, and State/Checkpoint Optimizations

The article reviews Apache Flink 1.20, highlighting the new Materialized Table concept, the DISTRIBUTED BY support for load‑balanced storage and join performance, and state/checkpoint file merging improvements, while providing code examples and practical insights for users.

Apache Flink · Big Data · Checkpoint Optimization

0 likes · 7 min read

DeWu Technology

Jul 31, 2024 · Big Data

Custom Flink Scheduler Enhancements: Resource Balancing, Task Migration, and TmRestart Strategy

The article details Dewu’s custom Flink scheduler, DwScheduler, which adds JSON‑based resource specifications, per‑TaskManager slot sharing for balanced CPU use, hot TaskManager migration callbacks, and a new TmRestart strategy for rapid pod‑process recovery, offering practical techniques to enhance real‑time stream processing stability and performance.

Apache Flink · Resource Management · Scheduler

0 likes · 9 min read

Tencent Cloud Developer

Jul 2, 2024 · Big Data

Apache Flink Deployment with Pulsar Connector: Setup, Demos, and Best Practices

This guide shows how to deploy Apache Flink 1.17 in Docker, configure off‑heap memory, connect it to Pulsar via the 4.1.0‑1.17 connector, run example jobs that copy topics and perform windowed word‑count, and provides Maven dependencies, custom serialization tips, batching settings, and version‑specific best‑practice notes.

Apache Flink · DataStream · Docker deployment

0 likes · 20 min read

DataFunTalk

Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint

0 likes · 11 min read

DataFunTalk

Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache Flink · Apache Paimon · Big Data

0 likes · 17 min read

DataFunTalk

Dec 12, 2023 · Big Data

Flink Forward Asia 2023 Recap: Keynote Highlights, Technical Advances, and Community Updates

The Flink Forward Asia 2023 conference recap highlights opening remarks, a keynote on Flink’s dominance in streaming compute, detailed 2023 technical advancements, case studies, the launch of Flink CDC 3.0, and a preview of Flink 2.0, along with links to photos and video recordings.

Apache Flink · Big Data · Flink 2.0

0 likes · 5 min read

Alibaba Cloud Big Data AI Platform

Nov 10, 2023 · Big Data

How Open‑Source Big Data 3.0 Is Redefining Real‑Time, Serverless, and AI‑Driven Analytics

The talk outlines Alibaba Cloud's open‑source big data platform evolution to version 3.0, highlighting the streaming lakehouse architecture, full serverless transformation, and AI‑enhanced operations that together enable real‑time analytics, higher performance, and smarter data management.

Apache Flink · Paimon · streaming lakehouse

0 likes · 15 min read

Volcano Engine Developer Services

Aug 21, 2023 · Big Data

From Contributor to Committer: Lessons from ByteDance’s Apache Flink Journey

ByteDance’s streaming computing team members Fang Yong and Hu Weihua share their path from early Flink adopters to Apache Flink Committers, detailing their contributions to Runtime Coordinator and Streaming Warehouse, the challenges of open‑source involvement, and practical advice for developers seeking to engage with the Flink community.

Apache Flink · Committer · Runtime Coordinator

0 likes · 10 min read

WeiLi Technology Team

Aug 2, 2023 · Big Data

How to Build a Real-Time Data Warehouse: Architectures, Challenges, and Industry Practices

This article examines the growing demand for real‑time data warehouses, compares mature streaming frameworks, evaluates Lambda, Kappa and hybrid architectures, reviews industry implementations from Didi and OPPO, and proposes a standard‑layer + stream + data‑lake solution with Apache Paimon, Hudi, and Iceberg.

Apache Flink · Kappa architecture · Lambda architecture

0 likes · 27 min read

360 Tech Engineering

Apr 10, 2023 · Big Data

Performance Tuning and Stability Analysis of Large Offline Apache Flink Jobs

This article examines how to run large offline Apache Flink jobs stably by analyzing task slot and resource configurations, CPU‑to‑slot ratios, and memory usage, offering practical recommendations to improve speed, reduce resource consumption, and avoid Hadoop‑related failures.

Apache Flink · Big Data · Resource Tuning

0 likes · 10 min read

Big Data Technology & Architecture

Mar 27, 2023 · Big Data

Key Updates in Apache Flink 1.17: Batch and Streaming Enhancements

The article reviews Apache Flink 1.17's major batch and streaming improvements, including new Delete/Update APIs, performance boosts, SQL client gateway, checkpoint and watermark enhancements, StateBackend upgrades, and practical use‑case scenarios for data engineers.

Apache Flink · Big Data · Checkpoint

0 likes · 7 min read

Baidu Geek Talk

Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data

0 likes · 14 min read

ITPUB

Mar 24, 2023 · Big Data

What’s New in Apache Flink 1.17? Key Features, Performance Gains, and Streaming Warehouse Advances

Apache Flink 1.17 introduces a suite of batch and streaming enhancements—including a new Streaming Warehouse API, significant TPC‑DS performance boosts, adaptive batch scheduling, improved checkpointing, expanded SQL capabilities, Hive connector upgrades, and broader filesystem support—while also delivering upgrades to FRocksDB, Calcite, and the token framework to strengthen its position as a leading unified data‑processing engine.

Apache Flink · Checkpoint · Data Warehouse

0 likes · 23 min read

NetEase Yanxuan Technology Product Team

Feb 27, 2023 · Big Data

How NetEase Yanxuan Migrated from Lambda to Iceberg for Real‑Time Batch‑Stream Integration

This article details how NetEase Yanxuan transformed its data platform from a dual Lambda architecture to a unified batch‑stream solution built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation of stream‑batch pipelines, message ordering fixes, snapshot generation, and extensive table‑governance optimizations.

Apache Flink · Apache Spark · Batch-Stream Integration

0 likes · 14 min read

ByteDance Cloud Native

Feb 17, 2023 · Big Data

From First PR to PMC: My Journey Contributing to Apache Calcite

ByteDance engineer Li Benchao shares his ten‑month evolution from a curious newcomer to a PMC member of Apache Calcite, describing how his work on Flink SQL led to deep involvement in the open‑source community, technical growth, and mentorship.

Apache Calcite · Apache Flink · community contribution

0 likes · 8 min read

StarRing Big Data Open Lab

Feb 10, 2023 · Big Data

Why Impala, Flink, and Slipstream Are Shaping Real‑Time Interactive Analytics

This article explores the evolution of real‑time computing and compares three interactive analytics engines—Impala, Apache Flink, and Slipstream—detailing their architectures, key features, deployment considerations, and why they matter for modern big‑data stream processing.

Apache Flink · Impala · Slipstream

0 likes · 13 min read

DataFunTalk

Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture

0 likes · 13 min read

vivo Internet Technology

Dec 28, 2022 · Big Data

Vivo Real-Time Computing Platform: Architecture, Practices, and Applications

The Vivo Real‑Time Computing Platform, built on Apache Flink, delivers a one‑stop data construction and governance solution that processes up to 5 PB daily. It offers high‑availability submission and control services, robust stability, rich SQL usability, efficient Kubernetes deployment, and strong security; it supports real‑time warehouses and short‑video recommendation, and it targets future elastic scaling and lake‑house unification.

Apache Flink · Data Platform · Real-Time Computing

0 likes · 18 min read

Alibaba Cloud Big Data AI Platform

Nov 30, 2022 · Big Data

What’s New in Apache Flink 2022? Highlights from the Flink Forward Asia Summit

The 2022 Flink Forward Asia summit showcased Apache Flink’s rapid community growth, key technical breakthroughs such as distributed snapshot upgrades, cloud‑native state storage, hybrid shuffle, Flink CDC 2.0, and Flink ML 2.0, and real‑world deployments at companies like Midea, miHoYo and Disney.

Apache Flink · Big Data · Flink Forward Asia

0 likes · 25 min read

DataFunTalk

Nov 29, 2022 · Big Data

Summary of Flink Forward Asia 2022: Keynotes, Technical Innovations, and Industry Deployments of Apache Flink

The 2022 Flink Forward Asia conference highlighted Apache Flink’s rapid growth, showcased major technical advances such as upgraded checkpointing, cloud‑native state storage, Hybrid Shuffle, Flink CDC 2.0, and Flink ML 2.0, and presented real‑world deployments from Alibaba, Midea, miHoYo, and Disney.

Apache Flink · Data Integration · Real-time Streaming

0 likes · 25 min read

Alibaba Cloud Big Data AI Platform

Nov 29, 2022 · Big Data

How Flink’s Stream‑Batch Fusion Is Transforming Real‑Time Big Data

The article explores Apache Flink’s eight‑year journey to becoming a top‑level Apache project, Alibaba’s extensive contributions, the rise of stream‑batch unified computing, its impact on real‑time data integration, cloud‑native deployment, and the emerging Flink‑based data‑warehouse and serverless solutions.

Apache Flink · Big Data · Data Integration

0 likes · 15 min read

Past Memory Big Data

Nov 26, 2022 · Big Data

Is Apache Flink Truly Powerful Enough After Hundreds of Engineers and Multiple Double‑11 Deployments?

The interview with Alibaba researcher Wang Feng reviews Flink's eight‑year journey to a top Apache project, its massive scale at Double 11, the push toward unified stream‑batch computing, emerging storage challenges, and the roadmap for cloud‑native, real‑time data warehousing.

Apache Flink · CDC · Data Integration

0 likes · 16 min read

Programmer DD

Nov 26, 2022 · Big Data

How Flink Became the Real‑Time Big Data Standard – Insights from Alibaba’s Wang Feng

This interview with Alibaba researcher Wang Feng (aka Mo Wen) explores Apache Flink’s eight‑year journey to top‑level Apache status, its unified stream‑batch architecture, the rise of Flink Table Store and CDC, and how cloud‑native deployments are reshaping real‑time big data processing.

Apache Flink · Big Data · Data Integration

0 likes · 16 min read

JD Tech

Sep 6, 2022 · Big Data

Flink Streaming Job Tuning Guide: Memory Model, Network Stack, RocksDB, and More

This article presents a detailed guide for optimizing large‑scale Apache Flink streaming jobs on the JD Real‑Time Computing platform, covering TaskManager memory model tuning, network stack configuration, RocksDB state management, checkpoint strategies, and additional performance tips with practical examples and calculations.

Apache Flink · Checkpoint · Network Stack

0 likes · 22 min read

Zhengcaiyun Technology (政采云技术)

Aug 2, 2022 · Fundamentals

Understanding the Chandy‑Lamport Distributed Snapshot Algorithm

This article explains the Chandy‑Lamport algorithm for capturing consistent global snapshots in distributed systems, describes its assumptions and message‑marker rules, walks through a detailed example with three processes and channels, and relates it to Apache Flink's asynchronous checkpoint mechanism.

Apache Flink · Chandy-Lamport · Failure Recovery

0 likes · 14 min read

DataFunTalk

May 19, 2022 · Big Data

SeaTunnel: Distributed Data Integration Platform and Its Application in Traffic Management

This article introduces Apache SeaTunnel, a distributed, high‑performance data integration platform built on Spark and Flink, outlines its technical features, workflow, and plugin ecosystem, and details a concrete traffic‑management use case involving incremental Oracle‑to‑warehouse data synchronization with Spark resources and scheduled shell scripts.

Apache Flink · Apache Spark · Big Data

0 likes · 12 min read

Shopee Tech Team

Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Big Data Architecture

0 likes · 20 min read

Big Data Technology & Architecture

Apr 20, 2022 · Big Data

Fine‑Grained Resource Management in Apache Flink: Scenarios, Mechanism, Efficiency, Allocation Strategies, and Limitations

This article explains Apache Flink's fine‑grained resource management, describing typical use cases, the slot‑based mechanism, how it improves resource efficiency, the default allocation strategy, current limitations, and provides example code for configuring slot sharing groups.

Apache Flink · Big Data · Fine-Grained Resource Management

0 likes · 12 min read

Alibaba Cloud Developer

Apr 2, 2022 · Big Data

What’s New in Flink CDC 2.2? A Deep Dive into Added Sources and Core Features

The article introduces Flink CDC 2.2, highlighting its expanded support for twelve data sources—including OceanBase, PolarDB‑X, SqlServer, and TiDB—while detailing core features such as the incremental snapshot framework, multi‑version Flink compatibility, dynamic table addition, and numerous bug fixes and performance improvements.

Apache Flink · Change Data Capture · Connector

0 likes · 9 min read

Big Data Technology & Architecture

Mar 8, 2022 · Big Data

Flink CDC 2.0: Concepts, Architecture, and Hands‑On Implementation

This article introduces the fundamentals of Flink CDC, explains its application scenarios and underlying technologies, compares query‑based and log‑based CDC, showcases open‑source solutions, and provides detailed Java and SQL examples for building real‑time ETL pipelines with MySQL and Flink.

Apache Flink · Change Data Capture · ETL

0 likes · 24 min read

Big Data Technology & Architecture

Feb 19, 2022 · Big Data

Apache Flink 1.13.6 Release: Bug Fixes, Improvements, and Updated Maven Dependencies

Apache Flink 1.13.6, the latest patch release, addresses 99 bugs and vulnerabilities, upgrades Log4j to 2.17.1, provides new Maven dependencies, and introduces numerous fixes and enhancements across SQL, checkpointing, state backend, and Kubernetes integration, urging users to upgrade promptly.

Apache Flink · Big Data · Bug Fixes

0 likes · 10 min read

DataFunTalk

Jan 25, 2022 · Big Data

Summary of Flink Forward Asia 2021: Community Growth, Cloud‑Native Deployment, Streaming‑Batch Integration, and Machine Learning

The article provides a comprehensive English summary of the 2021 Flink Forward Asia conference, covering community statistics, cloud‑native deployment modes, fault‑tolerance checkpoint advances, the evolution of streaming‑batch integration, the introduction of Streaming Warehouse, Flink ML 2.0, real‑time use cases at ByteDance and ICBC, Pravega storage innovations, and concluding reflections on the future of real‑time big data processing.

Apache Flink · Big Data

0 likes · 25 min read

Big Data Technology & Architecture

Jan 12, 2022 · Big Data

Common Production Issues and Troubleshooting Guide for Apache Flink

This article compiles a comprehensive list of common production problems encountered with Apache Flink, covering cluster sizing, checkpoint failures, backpressure analysis, resource allocation, deployment errors, UDF definitions, data skew, Kafka configurations, and provides detailed troubleshooting steps and best‑practice recommendations.

Apache Flink · Checkpoint · Kafka

0 likes · 39 min read

DataFunTalk

Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table

0 likes · 16 min read

Programmer DD

Jan 8, 2022 · Big Data

How Flink’s Streaming Warehouse Is Redefining Real‑Time Data Lakes

This interview explores Apache Flink’s evolution toward a Streaming Warehouse, detailing its stream‑batch integration, new CDC‑based data integration, the Dynamic Table storage architecture, and how these innovations aim to simplify and accelerate real‑time big‑data analytics.

Apache Flink · Big Data · Dynamic Table

0 likes · 17 min read

Big Data Technology & Architecture

Dec 24, 2021 · Big Data

Key Updates and New Features in Apache Flink 1.14.2 Release

The Apache Flink 1.14.2 release, launched on December 16, fixes a critical Log4j vulnerability, resolves OOM issues with the Pulsar connector, introduces numerous Table API, DataStream API, connector, and checkpoint enhancements, deprecates several legacy APIs, and drops support for Apache Mesos, while also promoting related PDF resources.

Apache Flink · Big Data · Checkpoints

0 likes · 8 min read

Tencent Cloud Developer

Nov 9, 2021 · Big Data

Comprehensive Overview of Apache Flink Streaming Computation and Architecture

The article systematically introduces Apache Flink’s streaming computation model, contrasting batch and real‑time processing, detailing its unified architecture, managed and raw state with key groups, checkpointing and savepoints for fault tolerance, data exchange mechanisms, time semantics, windowing, side‑outputs, and a complete Java Kafka‑based example.

Apache Flink · Checkpoint · Flink Architecture

0 likes · 46 min read

Big Data Technology & Architecture

Oct 9, 2021 · Big Data

Apache Flink 1.7–1.14 Release Highlights and Feature Evolution

This article provides a comprehensive overview of Apache Flink's major releases from version 1.7 to 1.14, detailing new APIs, state management improvements, Kubernetes integration, SQL and Table API enhancements, checkpointing advances, and performance optimizations that together illustrate the platform's evolution for both streaming and batch processing workloads.

Apache Flink · Checkpoint · Kubernetes

0 likes · 78 min read

Big Data Technology & Architecture

Sep 4, 2021 · Big Data

Understanding Time Semantics, Windows, and Process Functions in Apache Flink

This article explains how to define time characteristics, assign timestamps and watermarks, use Flink's window API, implement custom process functions, side outputs, and triggers, and handle late events, providing both Scala and Java code examples for real‑time stream processing.

Apache Flink · ProcessFunction · Time Semantics

0 likes · 52 min read

Big Data Technology & Architecture

Aug 25, 2021 · Big Data

Apache Flink Release History and Key Features from 1.7 to 1.12

This article provides a comprehensive overview of Apache Flink's major releases from version 1.7 through 1.12, detailing new functionalities such as Scala 2.12 support, state schema evolution, Blink planner integration, Kubernetes native deployment, Python (PyFlink) enhancements, and numerous performance and stability improvements for stream and batch processing.

Apache Flink · Kubernetes · PyFlink

0 likes · 54 min read

Big Data Technology Architecture

Aug 10, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's practical experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points of traditional Lambda architectures, Iceberg's table format and capabilities, Flink‑Iceberg sink design, small‑file handling, and future roadmap for a unified streaming‑batch data lake.

Apache Flink · Apache Iceberg · Real-Time Data Warehouse

0 likes · 20 min read

dbaplus Community

Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data

0 likes · 14 min read

DataFunTalk

May 4, 2021 · Big Data

Design and Implementation of a Real-Time Data Transmission Platform Based on Apache Flink at AutoHome

This article presents the background, requirements, architectural design, component interaction, and implementation details of AutoHome's real‑time data transmission platform built on Apache Flink, highlighting its high availability, exactly‑once semantics, scalability, DDL handling, and integration with existing streaming services.

Apache Flink · Big Data · Data Streaming

0 likes · 18 min read

DataFunTalk

Mar 28, 2021 · Big Data

Flink Stream‑Batch Integration: Layered Architecture, Unified SDK, DAG Scheduler, Shuffle, and Fault‑Tolerance

This article explains how Apache Flink has evolved into a unified stream‑batch engine by introducing a three‑layer architecture, a unified DataStream SDK, a pipeline‑region‑based DAG scheduler, a common shuffle framework, and enhanced fault‑tolerance mechanisms to address efficiency, consistency, and resource‑utilisation challenges in real‑time big‑data processing.

Apache Flink · DAG scheduler · Shuffle Architecture

0 likes · 25 min read

Big Data Technology & Architecture

Feb 19, 2021 · Big Data

A Comprehensive Guide to Learning Apache Flink: Background, Core Concepts, Modules, Source Code, and Industry Applications

This article provides a detailed learning roadmap for Apache Flink, covering its theoretical background, key research papers, fundamental concepts, core modules, source‑code exploration, real‑time data‑warehouse use cases, event‑driven applications, and emerging trends in the big‑data ecosystem.

Apache Flink · State Management · event-driven

0 likes · 9 min read

Sohu Tech Products

Feb 17, 2021 · Big Data

Dynamic Broadcast State and Data Partitioning in an Apache Flink Fraud Detection Engine

This article demonstrates how to initialize, broadcast, and dynamically update rule sets in an Apache Flink fraud detection pipeline, using BroadcastProcessFunction and MapState to achieve runtime data partitioning without recompiling, and explains the underlying data exchange patterns such as forward, hash, rebalance, and broadcast.

Apache Flink · Broadcast State · Dynamic Key Function

0 likes · 11 min read

Sohu Tech Products

Feb 17, 2021 · Big Data

Dynamic Data Partitioning in Apache Flink: A Fraud Detection Demo

This article explains how to implement dynamic data partitioning in Apache Flink using a fraud‑detection demo, covering the system architecture, rule‑driven runtime reconfiguration, custom ProcessFunction code, and the underlying key‑by logic that enables flexible, real‑time stream processing.

Apache Flink · Dynamic Partitioning · KeyBy

0 likes · 11 min read

DataFunTalk

Feb 12, 2021 · Big Data

Apache Flink at Kuaishou: Past, Present, and Future

Zhao Jianbo, head of Kuaishou's big data architecture team, presents an in‑depth overview of Apache Flink's adoption at Kuaishou, covering reasons for selection, development history, business data flows, technical innovations such as the Slimbase state engine, stability improvements, and future roadmap.

Apache Flink · Big Data · Kuaishou

0 likes · 16 min read

DataFunTalk

Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data

0 likes · 21 min read

Big Data Technology & Architecture

Feb 1, 2021 · Big Data

Deploying Apache Flink 1.12 on Kubernetes: High‑Availability Architecture and DataStream Batch Execution

This article explains how Flink 1.12 introduces production‑grade Kubernetes high‑availability, details the underlying architecture and deployment steps, and shows how the DataStream API can run in batch mode using runtime‑mode configuration and example commands.

Apache Flink · DataStream · High Availability

0 likes · 12 min read

Alibaba Cloud Developer

Jan 25, 2021 · Big Data

Why 2020 Was the Breakthrough Year for Apache Flink’s Ecosystem

In 2020, Apache Flink surged to become the most active Apache project, releasing three major versions that advanced its unified stream‑batch engine, introduced cloud‑native K8s support, expanded AI capabilities with PyFlink, and fostered a thriving Chinese community, solidifying its role as the de‑facto standard for real‑time computing.

AI integration · Apache Flink · Big Data

0 likes · 21 min read

DataFunTalk

Jan 22, 2021 · Big Data

Practical Experience of Apache Flink at ByteDance: Architecture, Optimizations, and Future Directions

This article presents ByteDance's real‑world use of Apache Flink, covering the platform's overall architecture, SQL extensions, custom connectors, UI‑driven SQL platform, performance optimizations such as window mini‑batch and custom windows, dimension‑table enhancements, checkpoint recovery improvements, stream‑batch integration, and upcoming roadmap items.

Apache Flink · Big Data · ByteDance

0 likes · 15 min read

Byte Quality Assurance Team

Jan 6, 2021 · Big Data

Fundamentals of Stream Processing: Bounded vs. Unbounded Data, Time Domains, and Windowing Strategies

This article provides a comprehensive introduction to stream processing fundamentals by distinguishing between bounded and unbounded datasets, clarifying the critical differences between event time and processing time, and exploring various windowing strategies to demonstrate how modern distributed systems efficiently handle continuous data flows.

Apache Flink · Data Windowing · Event Time

0 likes · 13 min read

DataFunTalk

Jan 5, 2021 · Big Data

Highlights of Flink Forward Asia 2020: Stream‑Batch Integration, AI Fusion, and Cloud‑Native Advances

The 2020 Flink Forward Asia conference showcased Apache Flink's rapid growth, community milestones, industry adoption, and technical breakthroughs such as unaligned checkpoints, approximate failover, the Nexmark benchmark, stream‑batch unification, AI integration via PyFlink and Alink, and deep cloud‑native support on Kubernetes, illustrated through case studies from Alibaba, Meituan, Kuaishou, and Dell.

AI integration · Apache Flink · cloud-native

0 likes · 20 min read

Alibaba Terminal Technology

Dec 23, 2020 · Frontend Development

Why Reactive Programming Beats MVVM and Redux in Frontend and Real‑Time Computing

This article frames frontend development as, at its core, views reacting to events, compares reactive programming with MVVM and Redux, demonstrates a complete RxJS implementation for a news app, and shows how the same data‑flow concepts extend to real‑time computing with Apache Flink.

Apache Flink · MVVM · Reactive Programming

0 likes · 14 min read

DataFunTalk

Dec 11, 2020 · Big Data

My Journey and Contributions in the Apache Flink Community

The author shares his personal journey from first encountering Flink to becoming an Apache Flink Committer at ByteDance, detailing community involvement, code contributions, bug fixes, lessons learned, advice for newcomers, and concluding with promotional offers for Flink services.

Apache Flink · Blink Planner · SQL

0 likes · 12 min read

Big Data Technology Architecture

Nov 27, 2020 · Big Data

Integrating Apache Flink with Data Lakes Using Apache Iceberg: Architecture, Use Cases, and Future Roadmap

This article explains how Apache Flink combines with Apache Iceberg to build unified stream‑batch data lake solutions, covering data lake fundamentals, architectural layers, classic business scenarios, reasons for choosing Iceberg, streaming ingestion design, and upcoming community enhancements.

Apache Flink · Apache Iceberg · table format

0 likes · 13 min read

DataFunTalk

Oct 25, 2020 · Big Data

Bilibili's Saber Real-Time Computing Platform: Architecture, Challenges, and AI Integration

Zheng Zhisheng from Bilibili presents the Saber real-time computing platform, detailing its pain points, evolution, Apache Flink‑based architecture, SQL‑centric BSQL programming, DAG drag‑and‑drop design, AI use cases, and future development plans to improve scalability, operability, and AI integration.

AI integration · Apache Flink · BSQL

0 likes · 19 min read

Big Data Technology & Architecture

Oct 13, 2020 · Big Data

Understanding Stateful Functions: API, Runtime, and Stream Processing with Apache Flink

This article explains the open‑source Stateful Functions framework, its API and Flink‑based runtime, and how it simplifies building distributed stateful applications by combining serverless concepts with robust state management for event‑driven architectures.

Apache Flink · Big Data · Event-Driven Architecture

0 likes · 8 min read

Big Data Technology & Architecture

Aug 3, 2020 · Big Data

Custom Count Trigger with Timeout for Apache Flink Windows

This article explains how to create a custom Apache Flink trigger that fires a window either when a specified element count is reached or when a time limit expires, and includes the full Java implementation and a usage example with a 10‑second timeout and a 1000‑element threshold.

Apache Flink · Count Window · Custom Trigger

0 likes · 5 min read

DataFunTalk

Jul 22, 2020 · Big Data

Building a Real-Time Computing Platform with Apache Flink at iQIYI: Architecture, Improvements, and Business Cases

iQIYI’s senior data engineer shares the evolution of its big‑data services from Hadoop to a Flink‑based real‑time computing platform, detailing architecture, monitoring improvements, StreamingSQL capabilities, business use cases like recommendation and deep‑learning data generation, and future plans for unified stream‑batch processing.

Apache Flink · Data Platform · Flink

0 likes · 11 min read

Alibaba Cloud Developer

Jul 13, 2020 · Big Data

What’s New in Apache Flink 1.11? A Deep Dive into Features and Performance

Apache Flink 1.11.0, released after four months of development, brings major ecosystem, usability, and stability improvements—including CDC support, a new JDBC catalog, real‑time Hive integration, a redesigned source API, PyFlink enhancements, application mode for Kubernetes, and checkpoint optimizations—while highlighting the growing contribution of Chinese developers.

Apache Flink · Checkpoint · Feature Release

0 likes · 20 min read

DataFunTalk

Jul 13, 2020 · Big Data

Design and Challenges of Netflix’s Keystone Real‑Time Data Platform

This article first outlines Netflix’s Keystone real‑time data platform, describing its background, functionalities, and the distributed‑system challenges and solutions such as ordering semantics and processing contracts, then shifts to announce the second‑edition Apache Flink Geek Challenge, detailing its theme, schedule, prizes, and registration instructions.

Apache Flink · big data competition · real-time data platform

0 likes · 18 min read

Big Data Technology Architecture

Jun 29, 2020 · Big Data

Real‑time Data Warehouse Construction: Goals, Architecture, and Best Practices with Apache Flink

This article summarizes the objectives, design principles, application scenarios, layer‑by‑layer construction methods, quality assurance mechanisms, and supporting tools for building a real‑time data warehouse using Apache Flink, providing practical guidance for data engineers and architects.

Apache Flink · Data Engineering · Data Quality

0 likes · 24 min read

Architect

Jun 11, 2020 · Big Data

Understanding Apache Flink Architecture, Data Transfer, Event‑Time Processing, State Management, and Checkpointing

This article explains Apache Flink's distributed system architecture—including JobManager, ResourceManager, TaskManager, and Dispatcher—covers session and job deployment modes, data transfer mechanisms, event‑time handling with watermarks, various state types and backends, scaling strategies, and the checkpoint/savepoint recovery process.

Apache Flink · Big Data · Event Time

0 likes · 15 min read

Big Data Technology Architecture

May 22, 2020 · Big Data

Apache Flink 1.11 New Features Overview

The article provides a comprehensive overview of Apache Flink 1.11, detailing enhancements in cluster deployment, resource management, source/sink APIs, state backends, Table & SQL improvements, DataStream extensions, PyFlink/ML support, and runtime optimizations, along with relevant code examples and references.

Apache Flink · Flink 1.11 · Table API

0 likes · 19 min read

Didi Tech

Apr 30, 2020 · Big Data

Didi’s Real‑Time Computing Practices with Apache Flink and StreamSQL

Didi has unified its real‑time computing on Apache Flink, creating an enhanced StreamSQL service with extended DDL, built‑in parsers and UDX, supporting thousands of nodes, millions of jobs, and trillions of daily records, while addressing state management, high availability, multi‑language UDFs, and pursuing real‑time ML and data‑warehouse integration.

Apache Flink · Big Data · Didi

0 likes · 13 min read

DataFunTalk

Apr 22, 2020 · Big Data

Didi's Real-Time Computing Practices with Apache Flink: Architecture, StreamSQL, and Operational Insights

Senior Didi technology expert Liang Li-yin shares how Didi leverages Apache Flink for large‑scale real‑time computing, covering service architecture, StreamSQL advantages, multi‑cluster management, task control, monitoring, meta‑store integration, challenges, and future plans such as high availability, real‑time ML, and unified batch‑stream processing.

Apache Flink · Big Data · Data Engineering

0 likes · 14 min read

DataFunTalk

Apr 15, 2020 · Big Data

Apache Flink OLAP Engine: Architecture, Optimizations, and Use Cases

This article presents an in‑depth overview of Apache Flink's new OLAP engine, covering OLAP fundamentals, the three OLAP models, Flink's unified streaming‑batch‑OLAP architecture, performance optimizations, benchmark results, and future development directions.

Apache Flink · Big Data · OLAP

0 likes · 11 min read

HomeTech

Mar 11, 2020 · Big Data

Streaming SQL with Apache Flink: Theory, Platform Optimizations, and Real‑Time Use Cases

This article introduces Apache Flink's Streaming SQL, explains its theoretical foundations such as the table‑stream relationship and watermark semantics, describes the platform's practical enhancements—including source/sink wrappers, built‑in functions, and native Retract Stream support—and showcases several real‑time computation examples.

Apache Flink · DataStream · Real-Time Computing

0 likes · 31 min read

Architecture Digest

Mar 11, 2020 · Big Data

Apache Flink: Unified Stream and Batch Processing Architecture and Core Concepts

This article provides a comprehensive overview of Apache Flink, explaining how it unifies stream and batch processing on a single runtime, detailing its key features, APIs, libraries, architectural components, fault‑tolerance mechanisms, scheduling, iterative processing, and back‑pressure monitoring.

Apache Flink · Distributed Computing · backpressure

0 likes · 20 min read

DataFunTalk

Mar 6, 2020 · Artificial Intelligence

Advances in Apache Flink AI Ecosystem: ML Pipeline, AI Flow, and Mini‑Batch Streaming Iteration

This article reviews recent progress in Apache Flink's AI ecosystem, explaining how Flink unifies batch and stream processing for machine‑learning pipelines, introduces the Flink ML Pipeline and Alink library, describes the AI Flow framework for end‑to‑end ML workflows, and presents a novel mini‑batch streaming iteration mechanism to support both offline and online learning scenarios.

AI Flow · Apache Flink · Mini-batch Iteration

0 likes · 13 min read

Youzan Coder

Feb 28, 2020 · Big Data

Flink Checkpoint Principle Analysis and Failure Cause Investigation

The article explains Apache Flink’s checkpoint mechanism, including state types, the coordinator workflow, and exactly‑once versus at‑least‑once semantics. It then covers common failure sources such as code exceptions and storage or network issues, and gives practical configuration tips on interval settings, local recovery, and externalized checkpoints.

Apache Flink · Checkpoint · Exactly-once

0 likes · 15 min read

Beike Product & Technology

Feb 27, 2020 · Big Data

Real‑Time Computing with Apache Flink at Beike Zhaofang: Hermes Platform Overview and Future Plans

This article presents the evolution, architecture, and operational metrics of Beike Zhaofang's Hermes real‑time computing platform built on Apache Flink, detailing its business scale, SQL editors, task growth, monitoring, use cases, and future development directions.

Apache Flink · Big Data · Data Engineering

0 likes · 10 min read

dbaplus Community

Feb 25, 2020 · Backend Development

How to Merge Small Files in Flink Checkpoints to Reduce HDFS Load

This article explains a small‑file‑merging technique for Apache Flink checkpoints that reuses FSDataOutputStreams to combine multiple state files into a single HDFS file, detailing design considerations such as concurrent checkpoint support, reference‑counted deletion, space amplification reduction, fault handling, compatibility, and observed production performance gains.

Apache Flink · Checkpoint · HDFS

0 likes · 13 min read

Alibaba Cloud Developer

Feb 24, 2020 · Big Data

What’s New in Apache Flink 1.10? Deep Dive into Major Features and Enhancements

Apache Flink 1.10 introduces a major upgrade that merges the Blink engine, boosts performance and stability, adds native Kubernetes support, enhances SQL DDL, delivers production‑ready Hive batch compatibility, optimizes memory management, and expands Python UDF capabilities, with detailed feature breakdowns and code examples.

Apache Flink · Kubernetes · SQL DDL

0 likes · 8 min read

Big Data Technology & Architecture

Feb 15, 2020 · Big Data

Understanding Event Time and Watermarks in Apache Flink

This article explains how Apache Flink uses event‑time timestamps and watermarks to handle out‑of‑order and late data, describes the assignTimestampsAndWatermarks API with periodic and punctuated watermark assigners, and provides practical code examples for window lateness and side‑output handling.

Apache Flink · Event Time · Flink

0 likes · 10 min read

Big Data Technology Architecture

Feb 12, 2020 · Big Data

Apache Flink 1.10 Release: New Features, Optimizations, and Kubernetes Integration

Apache Flink 1.10 introduces major performance and stability improvements, unified memory configuration, native Kubernetes session mode, enhanced Table API/SQL with production‑ready Hive integration, expanded Python UDF support, and a host of important bug fixes and connector updates, marking the largest community‑driven release to date.

Apache Flink · Hive Integration · Kubernetes

0 likes · 17 min read

Big Data Technology Architecture

Feb 11, 2020 · Big Data

Building Bilibili's Real-Time Streaming Platform with Apache Flink and AI

The presentation by Bilibili's real‑time platform lead details the design and implementation of a Flink‑based streaming data platform, explains how AI workloads are integrated, shares architectural decisions and operational insights, and provides the full slide deck for knowledge dissemination.

AI integration · Apache Flink · Bilibili

0 likes · 2 min read

Big Data Technology Architecture

Feb 1, 2020 · Big Data

Beike's Hermes Real‑Time Computing Platform: Architecture, Scale, and Future Roadmap

The article presents a comprehensive case study of Beike's Hermes real‑time computing platform, detailing its business evolution, Hermes architecture, SQL V1/V2 editors built on Spark and Flink, large‑scale deployment statistics, monitoring, diverse business use cases, and planned future enhancements.

Apache Flink · Beike · Big Data

0 likes · 11 min read

Alibaba Cloud Developer

Dec 16, 2019 · Big Data

Why Apache Flink Became the Fastest‑Growing Open‑Source Big Data Engine in 2019

Apache Flink, the open‑source stream‑and‑batch processing engine, has surged to become one of the most active Apache projects, with rapid community growth in China, unified SQL capabilities, AI‑focused extensions, Kubernetes integration, and benchmark results that outperform Hive by up to seven times.

AI · Apache Flink · Big Data

0 likes · 14 min read

Big Data Technology & Architecture

Dec 9, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink: Kafka to HDFS with Exactly‑Once Guarantees

This article explains how to develop a real‑time ETL application using Apache Flink that reads events from Kafka, partitions them by event time into HDFS directories, and achieves exactly‑once processing through checkpointing, custom bucket assigners, and proper state backend configuration.

Apache Flink · Big Data · Exactly-once

0 likes · 11 min read

Alibaba Cloud Developer

Dec 5, 2019 · Big Data

What I Learned at Flink Forward Asia 2019: Stream Processing, AI, and Cloud‑Native Insights

The three‑day Flink Forward Asia 2019 conference in Beijing attracted over 2,000 attendees, showcased more than 45 talks from leading companies and researchers, and highlighted the evolution of Flink toward a unified engine, Stateful Functions, AI integration, cloud‑native deployment, and real‑time analytics at massive scale.

Apache Flink · Artificial Intelligence · Stateful Functions

0 likes · 16 min read
