Tagged articles

Apache Flink

142 articles · Page 2 of 2

Sep 19, 2019 · Big Data

Building a Real‑Time ETL Pipeline with Apache Flink and Ensuring Exactly‑once Semantics

This article demonstrates how to develop a real‑time ETL job using Apache Flink, covering project setup, Kafka as a source, custom bucket assigners for HDFS, checkpointing, savepoints, and deployment on YARN to achieve exactly‑once processing guarantees.

Apache Flink · Big Data · Exactly-once

0 likes · 11 min read

Big Data Technology & Architecture

Sep 16, 2019 · Big Data

Comprehensive Flink Interview Guide: Architecture, APIs, Operators, and Advanced Topics

This guide provides a detailed overview of Apache Flink covering its core streaming engine, APIs (DataSet, DataStream, Table), architectural components, comparison with Spark Streaming, partitioning, parallelism, restart strategies, state backends, time semantics, watermarks, SQL processing, fault‑tolerance mechanisms, memory management, serialization, RPC framework, back‑pressure handling, operator chaining, and practical tips for interview preparation.

Apache Flink · Big Data · DataFlow

0 likes · 22 min read

Big Data Technology & Architecture

Sep 15, 2019 · Big Data

Flink Interview Guide: Concepts, Basics, Advanced Topics, and Source Code

This article presents a comprehensive collection of Flink interview questions covering fundamental concepts, advanced topics, and source‑code details to help candidates prepare effectively for Flink‑related technical interviews.

Apache Flink · Big Data · Flink

0 likes · 6 min read

Big Data Technology & Architecture

Aug 26, 2019 · Big Data

Comprehensive Collection of Apache Flink Learning Resources

This article compiles a curated list of the most reliable and official Apache Flink learning materials—including beginner tutorials, source‑code walkthroughs, advanced topics, community articles, real‑world case studies, and downloadable resources—providing a one‑stop reference for developers and researchers interested in stream processing and big‑data analytics.

Apache Flink · Big Data · Data Engineering

0 likes · 10 min read

Alibaba Cloud Developer

Aug 23, 2019 · Big Data

What’s New in Apache Flink 1.9.0? Deep Dive into Architecture, Table API & Hive Integration

Apache Flink 1.9.0, released on August 22, merges Alibaba's Blink engine, introduces a major architecture overhaul, enriches Table API & SQL, adds batch and stream processing enhancements, and integrates tightly with Hive, marking a significant milestone for large‑scale data processing.

Apache Flink · Hive Integration · Table API

0 likes · 14 min read

Big Data Technology & Architecture

Jul 22, 2019 · Big Data

Deep Dive into Stream SQL Principles and Incremental Query Execution in Apache Flink

This article provides an in‑depth analysis of Stream SQL theory, incremental query algorithms, materialized view maintenance, optimizer cost models, time handling, windowing, and the practical capabilities and limitations of Apache Flink’s streaming SQL engine.

Apache Flink · Incremental Query · Stream SQL

0 likes · 30 min read

Big Data Technology & Architecture

Jul 20, 2019 · Big Data

Registering UDF, UDTF, and UDAF Functions in Apache Flink – Common Pitfalls and Solutions

This article explains how to register scalar UDFs, table‑valued UDTFs, and aggregate UDAFs in Apache Flink, illustrates typical compilation and runtime pitfalls with concrete Scala code examples, and provides corrected implementations and best‑practice tips for reliable function registration.

Apache Flink · Big Data · Scala

0 likes · 13 min read

NetEase Game Operations Platform

Jul 13, 2019 · Big Data

Understanding Watermarks in Real-Time Stream Processing with Apache Flink

This article explains the concept of Watermarks in stream processing, detailing their background, theoretical foundations from the Dataflow model, practical implementation in Apache Flink with code examples, and discusses trade‑offs between latency and accuracy for real‑time analytics.

Apache Flink · Event Time · watermark

0 likes · 16 min read

Big Data Technology & Architecture

Jul 2, 2019 · Big Data

Integrating Apache Flink with Apache Pulsar for Scalable Elastic Data Processing

This article explains how Apache Pulsar and Apache Flink can be combined to provide a unified, scalable, and fault‑tolerant data processing platform, covering Pulsar's architecture, its differences from other messaging systems, various integration patterns, and concrete code examples for stream and batch workloads.

Apache Flink · Apache Pulsar · Big Data

0 likes · 13 min read

Big Data Technology & Architecture

Jun 29, 2019 · Big Data

Apache Flink 1.9 Feature Overview – Beijing Meetup (June 29)

On June 29, the Apache Flink Beijing Meetup presented a comprehensive analysis of Flink 1.9’s major architectural changes, new Table API & SQL capabilities, runtime and core enhancements, and future roadmap, with slides and resources made available for download.

Apache Flink · Big Data · Flink 1.9

0 likes · 2 min read

Big Data Technology & Architecture

Jun 20, 2019 · Big Data

Comprehensive Guide to Flink SQL: Background, New Features, Programming Model, Operators, Functions, and a Practical NBA Scoring Leader Example

This article provides an in‑depth overview of Flink SQL, covering its origins, the latest 1.7.0 and 1.8.0 enhancements, the underlying programming model, common operators and built‑in functions, and a complete end‑to‑end example that analyzes NBA scoring‑leader data using Flink SQL.

Apache Flink · Big Data · Data Engineering

0 likes · 27 min read

Big Data Technology & Architecture

May 25, 2019 · Big Data

Understanding State TTL and Continuous Cleanup in Apache Flink 1.8.0

This article explains how Apache Flink's State TTL feature works, demonstrates configuring TTL for state size control and automatic cleanup, and details the continuous cleanup mechanisms introduced in Flink 1.8.0 for both heap and RocksDB state backends.

Apache Flink · Continuous Cleanup · Java

0 likes · 16 min read

Big Data Technology & Architecture

May 22, 2019 · Big Data

Key Changes and New Features in Apache Flink 1.8.0 Release

Apache Flink 1.8.0 introduces incremental state cleanup with TTL, updates Hadoop support, deprecates TableEnvironment static methods, adds new Kafka deserialization schema, modifies Maven dependencies, and provides several configuration and Table API enhancements for better stream‑processing performance and compatibility.

Apache Flink · Hadoop · Maven

0 likes · 7 min read

Big Data Technology & Architecture

May 19, 2019 · Big Data

Implementing End-to-End Exactly-Once Semantics in Apache Flink with Apache Kafka Using Two-Phase Commit Sink

This article explains how Apache Flink’s TwoPhaseCommitSinkFunction, introduced in version 1.4, enables end-to-end exactly-once semantics when integrated with Apache Kafka, detailing the checkpoint mechanism and the two-phase commit protocol that ensures reliable data processing.

Apache Flink · Apache Kafka · Big Data

0 likes · 4 min read

Big Data Technology & Architecture

May 15, 2019 · Backend Development

How ByteDance Uses Kafka – Presentation by Gong Yunfei at the 2019 Apache Flink x Apache Kafka Conference

This article presents Gong Yunfei's 2019 talk from the Beijing Apache Flink x Apache Kafka conference, detailing how ByteDance leverages Kafka for its large‑scale streaming and data processing needs.

Apache Flink · Backend Development · ByteDance

0 likes · 2 min read

G7 EasyFlow Tech Circle

Apr 23, 2019 · Big Data

How We Scaled Fatigue Event Processing to 45K TPS with Apache Flink

By iteratively redesigning the fatigue‑event detection pipeline and leveraging Apache Flink’s stateful stream processing, the team reduced network overhead, cut resource usage to a third, and achieved a stable 45,000 TPS throughput on six containers with 20 GB memory, while outlining three optimization phases and practical lessons.

Apache Flink · Fatigue Detection · IoT

0 likes · 13 min read

Big Data Technology Architecture

Apr 22, 2019 · Big Data

Comparison of Apache Spark and Apache Flink: Programming Models, Streaming, State Management, and Exactly-Once Semantics

This article compares Apache Spark and Apache Flink, outlining their programming models, streaming mechanisms, state management, time semantics, and exactly‑once guarantees, and highlights the strengths and differences of each framework for batch and real‑time big‑data processing.

Apache Flink · Apache Spark · Exactly-once

0 likes · 8 min read

Big Data Technology & Architecture

Mar 29, 2019 · Big Data

Weekly Knowledge Digest: Apache Flink Deep Dives on JOIN LATERAL, TimeInterval, Temporal Table, and State Management

This week's digest shares a personal anecdote and a series of technical deep‑dives into Apache Flink, covering JOIN LATERAL, TimeInterval JOIN, Temporal Table JOIN, state management, and related code examples, while also previewing upcoming work schedules and recommended Flink reference articles.

Apache Flink · Big Data · SQL Join

0 likes · 5 min read

ITPUB

Mar 28, 2019 · Big Data

Why Pravega Matters: Native Stream Storage for Low‑Latency, Exactly‑Once Data Pipelines

Pravega, Dell’s native stream storage project, addresses the challenges of modern low‑latency, exactly‑once stream processing by combining tiered storage, Apache BookKeeper, and seamless Flink integration, offering a unified solution that reduces development, storage, and operational costs compared to traditional message systems like Kafka.

Apache Flink · Exactly-once · Kafka Comparison

0 likes · 10 min read

Big Data Technology & Architecture

Mar 27, 2019 · Big Data

Understanding State Management and Scaling in Apache Flink

This article explains how Apache Flink uses state for incremental stream processing, describes the different state backends, details the persistence mechanism, and shows how both OperatorState and KeyedState are redistributed during scaling using partition and key‑group algorithms.

Apache Flink · KeyedState · OperatorState

0 likes · 14 min read

Big Data Technology & Architecture

Mar 26, 2019 · Databases

Understanding Temporal Table JOIN in SQL and Apache Flink

This article explains the concept of Temporal Table JOIN, its implementation in SQL Server and Apache Flink, provides DDL/DML examples, compares it with other join types, and discusses improvements to align Flink with ANSI‑SQL standards.

Apache Flink · SQL · Temporal Join

0 likes · 16 min read

Big Data Technology & Architecture

Mar 25, 2019 · Big Data

Understanding Apache Flink Interval Join: Syntax, Semantics, and Implementation

This article explains how Apache Flink's Interval Join solves time‑bounded join requirements more efficiently than unbounded joins, covering its syntax, semantics, state‑management considerations, and providing a complete Scala example with code and execution results.

Apache Flink · Big Data · Interval Join

0 likes · 11 min read

Big Data Technology & Architecture

Mar 24, 2019 · Databases

Understanding JOIN LATERAL: From Traditional Databases to Apache Flink

This article explains the special JOIN LATERAL operator, compares it with INNER JOIN and correlated subqueries, shows how SQL Server implements it via CROSS APPLY, and demonstrates its support in Apache Flink using Calcite and user‑defined table functions with concrete code examples.

Apache Flink · LATERAL · SQL

0 likes · 12 min read

Big Data Technology & Architecture

Mar 22, 2019 · Big Data

Weekly Knowledge Points: Apache Flink Continuous Queries, Kafka Connectors, SQL Overview, JOIN Operator, and Table API

This weekly briefing introduces Apache Flink's continuous query mechanism, demonstrates how to integrate Kafka as a DataStream connector, provides an overview of Flink SQL features, explains the implementation and optimization of dual‑stream JOIN operators, and showcases the Table API with end‑to‑end examples.

Apache Flink · Big Data · SQL

0 likes · 3 min read

Big Data Technology & Architecture

Mar 20, 2019 · Databases

Understanding JOIN Operators: From Traditional Databases to Apache Flink Streaming

This article explains the purpose and types of SQL JOIN operators, demonstrates their syntax and semantics with examples, compares traditional database joins to Apache Flink's streaming two‑stream join implementation, and discusses optimization techniques such as state management, shuffle handling, and join reordering.

Apache Flink · SQL · State Management

0 likes · 22 min read

Big Data Technology & Architecture

Mar 19, 2019 · Big Data

Comprehensive Overview of SQL and Apache Flink SQL Features with Practical Code Examples

This article provides an in-depth introduction to SQL, its history and ANSI standards, then details Apache Flink's SQL capabilities—including SELECT, WHERE, GROUP BY, UNION, JOIN, window functions, and user-defined functions—accompanied by extensive code examples and a complete end‑to‑end Flink job implementation.

Apache Flink · Big Data · SQL

0 likes · 34 min read

Big Data Technology & Architecture

Mar 18, 2019 · Big Data

Introduction to Apache Kafka and Its Integration with Apache Flink

This article provides a step‑by‑step guide on installing Apache Kafka, creating topics, producing and consuming messages via command line, and demonstrates how to integrate Kafka with Apache Flink using the Flink‑Kafka connector, custom serialization schemas, and event‑time window processing.

Apache Flink · Event Time · Java

0 likes · 23 min read

Big Data Technology & Architecture

Mar 17, 2019 · Big Data

Understanding Continuous Queries in Apache Flink: From Static Queries to Dynamic Tables and Trigger Simulations

This article explains how Apache Flink implements continuous queries for unbounded stream processing, compares static and continuous query semantics, demonstrates how MySQL triggers can simulate continuous queries in append‑only and update scenarios, and discusses Flink's connector, source, sink, and retraction mechanisms for correct incremental computation.

Apache Flink · Big Data · Continuous Query

0 likes · 18 min read

Big Data Technology & Architecture

Mar 13, 2019 · Big Data

Understanding Fault Tolerance and Exactly-Once Semantics in Apache Flink

This article explains Apache Flink's fault‑tolerance mechanisms, including checkpointing, barrier alignment, the differences between At‑Least‑Once and Exactly‑Once semantics, configuration options, incremental checkpointing, and the requirements for external sources and sinks to achieve end‑to‑end exactly‑once processing.

Apache Flink · Big Data · Exactly-once

0 likes · 15 min read

Big Data Technology & Architecture

Mar 12, 2019 · Big Data

Understanding Apache Flink’s Core Design: “Batch Is a Special Case of Stream” and Its Architecture

This article explains Apache Flink’s fundamental design principle that treats batch as a special case of stream, compares native streaming with micro‑batching, describes its deployment modes, fault‑tolerance mechanisms, unified data and scheduling layers, and outlines Alibaba’s architectural optimizations for the platform.

Apache Flink · batch processing · native streaming

0 likes · 15 min read

Big Data Technology & Architecture

Mar 1, 2019 · Big Data

Understanding Watermarks in Apache Flink for Handling Out-of-Order Events

This article explains how Apache Flink uses Watermarks to manage event‑time windows, describes the three time semantics, details periodic and punctuated Watermark generation methods with their Java interfaces, and shows practical DDL examples for handling late and out‑of‑order data in stream processing.

Apache Flink · Big Data · EventTime

0 likes · 11 min read

Big Data Technology & Architecture

Jan 3, 2019 · Big Data

Deploying Apache Flink on YARN and Running Flink Jobs

This tutorial explains how to deploy Apache Flink on a Hadoop YARN cluster, covering both YARN session mode and direct job submission, and demonstrates running the built‑in WordCount example with command‑line options for input, output, and resource configuration.

Apache Flink · Big Data · Flink Deployment

0 likes · 8 min read

Alibaba Cloud Developer

Jan 3, 2019 · Big Data

How Apache Flink Powers Real‑Time Big Data at Alibaba and Beyond

The 2018 Flink Forward China conference in Beijing showcased Apache Flink’s evolution, Alibaba’s massive contributions—including the Blink fork, real‑time BI, online learning and city‑level analytics—and highlighted how industry leaders like Alibaba, Didi and others leverage Flink for scalable, low‑latency big‑data processing across diverse use cases.

Apache Flink · Batch-Stream Fusion · Streaming

0 likes · 19 min read

Alibaba Cloud Developer

Nov 29, 2018 · Big Data

Why Apache Flink Became the Fastest‑Growing Big Data Engine in 2018

This article introduces Apache Flink’s rapid rise as the leading open‑source big data engine, explains its role in batch, stream, and interactive analytics, showcases real‑world use cases from Alibaba, Didi, and ByteDance, and outlines how Flink powers both big data and AI workloads.

AI · Apache Flink · Big Data

0 likes · 8 min read

Alibaba Cloud Developer

Nov 22, 2018 · Big Data

How Alibaba’s Blink Testing Platform Guarantees Real‑Time Big Data Reliability

This article explains how Alibaba built a comprehensive Blink testing platform—including code‑quality checks, functional, performance, stability, and pre‑release testing—to ensure the reliability and scalability of its real‑time big‑data processing engine during massive workloads like Double 11.

Apache Flink · Big Data · Testing framework

0 likes · 13 min read

Qunar Tech Salon

Oct 25, 2018 · Big Data

Why Alibaba Chose Apache Flink: Architecture, Scale, and Future Directions

This article explains how Alibaba adopted Apache Flink as a unified, low‑latency, high‑throughput big‑data engine, detailing its stream‑first design, state management, checkpointing, massive production deployment, community contributions, and upcoming plans for a unified API, SQL layer, broader language support, and AI integration.

Alibaba · Apache Flink · Big Data

0 likes · 13 min read

Alibaba Cloud Developer

Oct 15, 2018 · Big Data

Why Alibaba Chose Apache Flink: A Deep Dive into Its Big Data Journey

This article explains how Alibaba adopted Apache Flink as a unified, low‑latency, high‑throughput big data engine, covering its origins, technical advantages over Spark, large‑scale deployment, state management, checkpointing, API unification, and future directions in streaming and batch processing.

Alibaba · Apache Flink · Unified Engine

0 likes · 14 min read

JD Tech Talk

Aug 2, 2018 · Big Data

Real-Time Order Statistics with Apache Flink in a Data Aggregation Platform

This article explains how the data aggregation platform adopts Apache Flink for high‑throughput, low‑latency stream processing, covering the complete workflow from data source integration, transformation operations, windowing and time concepts, to a concrete order‑count example with custom aggregation logic.

Apache Flink · Event Time · Flink

0 likes · 10 min read

Meituan Technology Team

Nov 16, 2017 · Big Data

Performance Comparison of Apache Flink and Apache Storm for Real-Time Stream Processing

The study benchmarks Apache Flink against Apache Storm on a shared cluster, showing Flink delivering three‑to‑five times higher throughput and roughly half the latency across simple, sleep‑induced, and windowed workloads, with modest throughput loss for exactly‑once semantics, leading to a recommendation of Flink for high‑performance, stateful real‑time stream processing.

Apache Flink · Apache Storm · Exactly-once

0 likes · 19 min read

Baixing.com Technical Team

Sep 4, 2017 · Big Data

How Flink SQL Simplifies Real-Time Data Cleaning Compared to Storm

This article introduces Flink’s background, architecture, and ecosystem, then demonstrates a step‑by‑step tutorial on using Flink SQL to clean and transform streaming data from Kafka, highlighting its advantages over Storm for real‑time ETL.

Apache Flink · Data Engineering · Flink

0 likes · 12 min read

Alibaba Cloud Developer

Jun 8, 2017 · Big Data

Flink Forward 2017: Stream Processing Insights from Alibaba, Uber & Netflix

The article recounts the 2017 Flink Forward conference in San Francisco, highlighting key sessions from DataArtisans, Uber, Netflix and Alibaba, and discusses real‑time stream processing use cases, large‑scale deployments, runtime and TableAPI/SQL improvements, and the growing adoption of Flink in the industry.

Apache Flink · Big Data · Flink

0 likes · 16 min read

Suning Technology

May 18, 2017 · Big Data

Why Apache Flink Beats Spark and Storm in Stream Processing

This article examines Apache Flink's stream‑processing architecture, compares its native streaming model, fault‑tolerance, performance and SQL capabilities with Spark and Storm, and concludes that Flink offers a more powerful and efficient solution despite some maturity gaps.

Apache Flink · Spark · Storm

0 likes · 12 min read
