Tag

Apache Flink


DeWu Technology
Jul 31, 2024 · Big Data

Custom Flink Scheduler Enhancements: Resource Balancing, Task Migration, and TmRestart Strategy

The article details DeWu’s custom Flink scheduler, DwScheduler, which adds JSON‑based resource specifications, per‑TaskManager slot sharing for balanced CPU use, hot TaskManager migration callbacks, and a new TmRestart strategy for rapid pod‑process recovery, offering practical techniques to enhance real‑time stream processing stability and performance.

Apache Flink · Scheduler · Streaming
9 min read
Tencent Cloud Developer
Jul 2, 2024 · Big Data

Apache Flink Deployment with Pulsar Connector: Setup, Demos, and Best Practices

This guide shows how to deploy Apache Flink 1.17 in Docker, configure off‑heap memory, connect it to Pulsar via the 4.1.0‑1.17 connector, run example jobs that copy topics and perform windowed word‑count, and provides Maven dependencies, custom serialization tips, batching settings, and version‑specific best‑practice notes.

Apache Flink · DataStream · Docker Deployment
20 min read
DataFunTalk
Dec 27, 2023 · Big Data

Apache Flink 2023: Core Technical Achievements and Future Directions

The article reviews Apache Flink's rapid development over the past decade, highlighting its 2023 community growth, SIGMOD award, major releases, streaming SQL enhancements, incremental checkpointing, batch maturity, cloud‑native scaling, and integration with the emerging Lakehouse architecture.

Apache Flink · Big Data · Checkpoint
11 min read
DataFunTalk
Dec 15, 2023 · Big Data

Flink Forward Asia 2023: New Flink Releases, Apache Paimon, and Flink CDC 3.0

The Flink Forward Asia 2023 conference showcased major updates to Apache Flink (versions 1.17 and 1.18), introduced the Apache Paimon lakehouse project, announced Flink CDC 3.0, and highlighted community growth, cloud‑native deployments, and real‑time data‑warehouse use cases across industry leaders.

Apache Flink · Apache Paimon · Big Data
17 min read
DataFunTalk
Dec 12, 2023 · Big Data

Flink Forward Asia 2023 Recap: Keynote Highlights, Technical Advances, and Community Updates

The Flink Forward Asia 2023 conference recap highlights opening remarks, a keynote on Flink’s dominance in streaming compute, detailed 2023 technical advancements, case studies, the launch of Flink CDC 3.0, and a preview of Flink 2.0, along with links to photos and video recordings.

Apache Flink · Big Data · Flink 2.0
5 min read
WeiLi Technology Team
Aug 2, 2023 · Big Data

How to Build a Real-Time Data Warehouse: Architectures, Challenges, and Industry Practices

This article examines the growing demand for real‑time data warehouses, compares mature streaming frameworks, evaluates Lambda, Kappa and hybrid architectures, reviews industry implementations from Didi and OPPO, and proposes a standard‑layer + stream + data‑lake solution with Apache Paimon, Hudi, and Iceberg.

Apache Flink · Big Data · Kappa architecture
27 min read
360 Tech Engineering
Apr 10, 2023 · Big Data

Performance Tuning and Stability Analysis of Large Offline Apache Flink Jobs

This article examines how to run large offline Apache Flink jobs stably by analyzing task slot and resource configurations, CPU‑to‑slot ratios, and memory usage, offering practical recommendations to improve speed, reduce resource consumption, and avoid Hadoop‑related failures.

Apache Flink · Big Data · Resource Tuning
10 min read
Baidu Geek Talk
Mar 27, 2023 · Big Data

Precise Watermark Design and Implementation in Baidu's Unified Streaming-Batch Data Warehouse

The article details Baidu's precise watermark design for its unified streaming‑batch data warehouse, describing how a centralized watermark server and client ensure end‑to‑end data completeness, align real‑time and batch windows with 99.9‑99.99% precision, and support accurate anti‑fraud calculations within the broader big‑data ecosystem.

Apache Flink · Baidu · Big Data
14 min read
ByteDance Cloud Native
Feb 17, 2023 · Big Data

From First PR to PMC: My Journey Contributing to Apache Calcite

ByteDance engineer Li Benchao shares his ten‑month evolution from a curious newcomer to a PMC member of Apache Calcite, describing how his work on Flink SQL led to deep involvement in the open‑source community, technical growth, and mentorship.

Apache Calcite · Apache Flink · Community Contribution
8 min read
DataFunTalk
Jan 20, 2023 · Big Data

Introduction to Flink CDC: Incremental Snapshot Algorithm and Framework

This article introduces Flink CDC, explains its incremental snapshot algorithm and the 2.0 framework design, compares it with traditional CDC pipelines, discusses the core API and dialect concept, and outlines community growth and future plans, providing a comprehensive technical overview for data engineers.

Apache Flink · Big Data · Change Data Capture
13 min read
vivo Internet Technology
Dec 28, 2022 · Big Data

Vivo Real-Time Computing Platform: Architecture, Practices, and Applications

The vivo Real‑Time Computing Platform, built on Apache Flink, is a one‑stop data construction and governance solution that processes up to 5 PB daily. It offers high‑availability submission and control services, robust stability, rich SQL usability, efficient Kubernetes deployment, and strong security; it supports real‑time data warehouses and short‑video recommendation, and its future targets are elastic scaling and lakehouse unification.

Apache Flink · Big Data · cloud-native
18 min read
DataFunTalk
Nov 29, 2022 · Big Data

Summary of Flink Forward Asia 2022: Keynotes, Technical Innovations, and Industry Deployments of Apache Flink

The 2022 Flink Forward Asia conference highlighted Apache Flink’s rapid growth, showcased major technical advances such as upgraded checkpointing, cloud‑native state storage, Hybrid Shuffle, Flink CDC 2.0, and Flink ML 2.0, and presented real‑world deployments from Alibaba, Midea, miHoYo, and Disney.

Apache Flink · Big Data · Real-time Streaming
25 min read
JD Tech
Sep 6, 2022 · Big Data

Flink Streaming Job Tuning Guide: Memory Model, Network Stack, RocksDB, and More

This article presents a detailed guide for optimizing large‑scale Apache Flink streaming jobs on the JD Real‑Time Computing platform, covering TaskManager memory model tuning, network stack configuration, RocksDB state management, checkpoint strategies, and additional performance tips with practical examples and calculations.

Apache Flink · Checkpoint · Performance Tuning
22 min read
Zhengcaiyun Technology (政采云技术)
Aug 2, 2022 · Fundamentals

Understanding the Chandy‑Lamport Distributed Snapshot Algorithm

This article explains the Chandy‑Lamport algorithm for capturing consistent global snapshots in distributed systems, describes its assumptions and message‑marker rules, walks through a detailed example with three processes and channels, and relates it to Apache Flink's asynchronous checkpoint mechanism.

Apache Flink · Chandy-Lamport · Failure Recovery
14 min read
DataFunTalk
May 19, 2022 · Big Data

SeaTunnel: Distributed Data Integration Platform and Its Application in Traffic Management

This article introduces Apache SeaTunnel, a distributed, high‑performance data integration platform built on Spark and Flink, outlines its technical features, workflow, and plugin ecosystem, and details a concrete traffic‑management use case involving incremental Oracle‑to‑warehouse data synchronization with Spark resources and scheduled shell scripts.

Apache Flink · Apache Spark · Big Data
12 min read
Shopee Tech Team
Apr 28, 2022 · Big Data

Building Real-Time Data Warehouse with Flink + Hudi at Shopee

Shopee replaced its hourly Hive pipeline with a hybrid Flink‑Hudi real‑time data warehouse that groups Kafka topics, applies lightweight stream ETL, uses partial‑update MOR tables for multi‑stream joins and COW tables for versioned batches, cutting latency from about 90 minutes to 2–30 minutes and halving resource usage.

Apache Flink · Apache Hudi · Data Lakehouse
20 min read
DataFunTalk
Jan 25, 2022 · Big Data

Summary of Flink Forward Asia 2021: Community Growth, Cloud‑Native Deployment, Streaming‑Batch Integration, and Machine Learning

The article summarizes the 2021 Flink Forward Asia conference, covering community statistics, cloud‑native deployment modes, fault‑tolerance checkpoint advances, the evolution of streaming‑batch integration, the introduction of Streaming Warehouse, Flink ML 2.0, real‑time use cases at ByteDance and ICBC, Pravega storage innovations, and closing reflections on the future of real‑time big data processing.

Apache Flink · Big Data · Streaming
25 min read
DataFunTalk
Jan 11, 2022 · Big Data

Interview with Wang Feng (Mo Wen): The Future of Apache Flink and Streaming Warehouses

In an exclusive InfoQ interview, Apache Flink community leader Wang Feng (aka Mo Wen) outlines the evolution of Flink toward a Streaming Warehouse, detailing recent technical advances, use‑case scenarios, and the upcoming Dynamic Table storage that aim to unify stream and batch processing for real‑time data‑warehouse workloads.

Apache Flink · Big Data · Dynamic Table
16 min read
Tencent Cloud Developer
Nov 9, 2021 · Big Data

Comprehensive Overview of Apache Flink Streaming Computation and Architecture

The article systematically introduces Apache Flink’s streaming computation model, contrasting batch and real‑time processing, detailing its unified architecture, managed and raw state with key groups, checkpointing and savepoints for fault tolerance, data exchange mechanisms, time semantics, windowing, side‑outputs, and a complete Java Kafka‑based example.

Apache Flink · Checkpoint · Flink Architecture
46 min read
Big Data Technology Architecture
Aug 10, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's practical experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points of traditional Lambda architectures, Iceberg's table format and capabilities, Flink‑Iceberg sink design, small‑file handling, and future roadmap for a unified streaming‑batch data lake.

Apache Flink · Apache Iceberg · Big Data
20 min read