
Apache Flink: Unified Stream and Batch Processing Architecture and Core Concepts

This article provides a comprehensive overview of Apache Flink, explaining how it unifies stream and batch processing on a single runtime, detailing its key features, APIs, libraries, architectural components, fault‑tolerance mechanisms, scheduling, iterative processing, and back‑pressure monitoring.


Apache Flink is an open‑source platform for distributed stream and batch data processing that uses a single runtime to support both paradigms, treating batch jobs as bounded streams.

Basic Features

Flink offers high‑throughput, low‑latency streaming, event‑time windows, exactly‑once stateful computation, flexible windowing (time, count, session, data‑driven), back‑pressure handling, lightweight snapshot‑based fault tolerance, iterative computation, and automatic program optimizations.

API Support

For streaming applications, Flink provides the DataStream API, while batch jobs use the DataSet API; both are available in Java and Scala.

Library Support

Additional libraries include FlinkML for machine learning, Gelly for graph processing, Table API for relational queries, and CEP for complex event processing.

Integration

Flink integrates with YARN, HDFS, Kafka, HBase, Hadoop, Tachyon, Elasticsearch, RabbitMQ, Storm, S3, and XtreemFS.

Core Concepts

A Flink program consists of Streams (intermediate data) and Transformations (operators) that together form a Streaming Dataflow, represented as a DAG of source, transformation, and sink operators.
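The source → transformation → sink shape can be sketched with plain Java (no Flink dependency; all names here are illustrative stand-ins, not real Flink operators):

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch of a streaming dataflow's three stages,
// modeled as plain Java methods rather than real Flink operators.
public class DataflowSketch {

    // Source operator: emits a bounded stream of records.
    static List<String> source() {
        return List.of("flink", "stream", "batch");
    }

    // Transformation operator: analogous to a map() on a DataStream.
    static List<String> transform(List<String> in) {
        return in.stream().map(String::toUpperCase).collect(Collectors.toList());
    }

    // A real sink would write to Kafka, HDFS, etc.; here we just return the result.
    static List<String> run() {
        return transform(source());
    }

    public static void main(String[] args) {
        System.out.println(run()); // [FLINK, STREAM, BATCH]
    }
}
```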

Parallelism is achieved by splitting streams into stream partitions and operators into parallel subtasks; data flows between operators either one‑to‑one (preserving partitioning and order) or in a redistributing exchange (e.g., after a keyBy), and consecutive operators can be fused into an Operator Chain that executes within a single TaskManager thread.
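A hedged sketch of how a redistributing exchange (such as a keyBy) might route records to downstream parallel subtasks by key hash; the routing function is an assumption for illustration, not Flink's internal partitioner:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of key-hash redistribution across parallel subtasks.
public class RedistributionSketch {

    // Route a record to one of `parallelism` downstream subtasks by key hash.
    static int subtaskFor(String key, int parallelism) {
        return Math.floorMod(key.hashCode(), parallelism);
    }

    // Group a stream of keys by target subtask index: all records with the
    // same key always land on the same subtask.
    static Map<Integer, List<String>> partition(List<String> keys, int parallelism) {
        Map<Integer, List<String>> bySubtask = new TreeMap<>();
        for (String k : keys) {
            bySubtask.computeIfAbsent(subtaskFor(k, parallelism), i -> new ArrayList<>()).add(k);
        }
        return bySubtask;
    }

    public static void main(String[] args) {
        System.out.println(partition(List.of("a", "b", "a", "c"), 2));
    }
}
```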

Flink supports time‑based and data‑driven windows, handling Event Time, Ingestion Time, and Processing Time, with the trade‑off that Event‑Time processing may increase latency due to out‑of‑order handling.
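As an illustration of event‑time windowing, the following minimal sketch assigns timestamped events to tumbling windows and counts them. The window‑start arithmetic follows the usual `ts - (ts mod size)` rule; the code is a plain‑Java simulation, not the DataStream windowing API:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: assign events (by event timestamp) to tumbling windows and count per window.
public class TumblingWindowSketch {

    // Start of the tumbling window that contains the given event timestamp.
    static long windowStart(long timestampMs, long windowSizeMs) {
        return timestampMs - Math.floorMod(timestampMs, windowSizeMs);
    }

    // Count events per window. Out-of-order events still land in the correct
    // window because assignment uses the event timestamp, not arrival order.
    static Map<Long, Integer> countPerWindow(List<Long> eventTimestamps, long windowSizeMs) {
        Map<Long, Integer> counts = new TreeMap<>();
        for (long ts : eventTimestamps) {
            counts.merge(windowStart(ts, windowSizeMs), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Events arrive out of order: 12 arrives before 3.
        System.out.println(countPerWindow(List.of(1L, 12L, 3L, 14L), 10L)); // {0=2, 10=2}
    }
}
```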

Architecture

Flink follows a master‑slave model with a JobManager (coordinator) and one or more TaskManagers (workers). The client submits a JobGraph, which the JobManager translates into an ExecutionGraph for scheduling.

Key components include the Deployment layer (local, standalone, YARN, cloud), Runtime layer (core execution engine), API layer (DataStream/DataSet), and Libraries layer (ML, graph, CEP, Table).

Internal Mechanisms

Fault tolerance is achieved via lightweight distributed snapshots (checkpoints) using barriers that travel with the data stream; snapshots capture both stream state and operator state.
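To make the barrier idea concrete, here is a toy single‑operator simulation (an assumption‑laden sketch, not Flink's actual checkpointing code): records update operator state, and whenever a barrier arrives in the stream the current state is snapshotted:

```java
import java.util.ArrayList;
import java.util.List;

// Toy simulation of snapshot barriers flowing with the data stream.
public class BarrierSketch {

    // Marker interleaved with records in the stream.
    static final Object BARRIER = new Object();

    // Operator state: a running sum of the Integer records seen so far.
    // Each time a barrier arrives, the current state is snapshotted.
    static List<Integer> runAndSnapshot(List<Object> stream) {
        List<Integer> snapshots = new ArrayList<>();
        int state = 0;
        for (Object element : stream) {
            if (element == BARRIER) {
                snapshots.add(state); // checkpoint: persist current operator state
            } else {
                state += (Integer) element; // normal record processing
            }
        }
        return snapshots;
    }

    public static void main(String[] args) {
        List<Object> stream = List.of(1, 2, BARRIER, 3, BARRIER);
        System.out.println(runAndSnapshot(stream)); // [3, 6]
    }
}
```

On failure, an operator would restore the last snapshot and replay the records after the corresponding barrier, which is what makes the snapshots lightweight compared to stopping the whole pipeline.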

Scheduling maps the JobGraph to an ExecutionGraph, allocating tasks to TaskManager slots based on parallelism settings.

Iterative processing is supported through Iterate and Delta‑Iterate operators, enabling machine‑learning and graph algorithms.
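The feedback‑loop semantics of an Iterate operator can be sketched as re‑feeding a partial result into the same step function until a termination criterion holds (a simplified stand‑in, not Flink's iteration API):

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch of iterate semantics: apply a step function to the whole dataset
// and feed the result back until a convergence criterion is met.
public class IterateSketch {

    // One iteration step: halve every value (integer division).
    static List<Integer> step(List<Integer> data) {
        return data.stream().map(v -> v / 2).collect(Collectors.toList());
    }

    // Iterate until every value has converged to 0, counting the rounds —
    // the loop plays the role of the feedback edge in an Iterate operator.
    static int iterationsToConverge(List<Integer> data) {
        int rounds = 0;
        while (data.stream().anyMatch(v -> v > 0)) {
            data = step(data);
            rounds++;
        }
        return rounds;
    }

    public static void main(String[] args) {
        System.out.println(iterationsToConverge(List.of(8, 3))); // 4
    }
}
```

A Delta‑Iterate variant would additionally track a shrinking workset so each round touches only the elements that are still changing.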

Back‑pressure monitoring samples task stack traces to compute a ratio indicating the level of pressure (OK, LOW, HIGH) and provides configurable parameters to tune the monitoring behavior.
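The mapping from sampled ratio to status can be sketched as follows; the cut‑offs used here (OK up to 0.10, LOW up to 0.5, HIGH above) follow the thresholds documented for Flink's web UI, while the helper itself is only an illustration:

```java
// Sketch of how a sampled back-pressure ratio maps to a status label.
public class BackPressureSketch {

    // ratio = (samples in which the task was blocked waiting for output buffers)
    //         / (total stack-trace samples taken)
    static String classify(int blockedSamples, int totalSamples) {
        double ratio = (double) blockedSamples / totalSamples;
        if (ratio <= 0.10) return "OK";   // thresholds as documented for Flink's web UI
        if (ratio <= 0.50) return "LOW";
        return "HIGH";
    }

    public static void main(String[] args) {
        System.out.println(classify(8, 100));  // OK
        System.out.println(classify(30, 100)); // LOW
        System.out.println(classify(70, 100)); // HIGH
    }
}
```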

Tags: stream processing, batch processing, Apache Flink, fault tolerance, distributed computing, backpressure, checkpointing
Written by Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.