Understanding Back Pressure in Flink and Its Implementation

The article explains what back pressure is in Flink streaming jobs and why it occurs when data generation outpaces downstream consumption. It then describes how Flink monitors back pressure through stack-trace sampling, the parameters that control sampling, and how the Web UI visualizes the results, and compares this approach with Spark Streaming's back pressure mechanism.

Big Data Technology & Architecture

Back pressure occurs when data is generated faster than downstream operators can consume it. In a Flink job, the affected tasks are then flagged with a back pressure level such as HIGH.

Typical causes include garbage collection (GC) pauses, peak traffic at the source, and insufficient downstream processing capacity. Left unhandled, back pressure can exhaust resources or lead to data loss.

Consider a simple Source→Sink pipeline that normally processes 5 million elements per second. If the source suddenly produces twice as fast (10 million elements per second), the sink cannot keep up, and back pressure builds.
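The arithmetic behind this example can be sketched in a few lines. This is a minimal illustration, not Flink code: `BacklogMath` is a hypothetical helper, and the 10-million-element buffer capacity is an assumed figure chosen only to show how quickly a surplus fills any finite buffer.

```java
final class BacklogMath {
    // Elements per second that pile up when the source outruns the sink (0 if it does not).
    static long surplusPerSecond(long sourceRate, long sinkRate) {
        return Math.max(0L, sourceRate - sinkRate);
    }

    // Seconds until a buffer of the given capacity fills; infinite when there is no surplus.
    static double secondsUntilFull(long bufferCapacity, long sourceRate, long sinkRate) {
        long surplus = surplusPerSecond(sourceRate, sinkRate);
        return surplus == 0 ? Double.POSITIVE_INFINITY : (double) bufferCapacity / surplus;
    }

    public static void main(String[] args) {
        long sinkRate = 5_000_000L;        // normal throughput from the example
        long sourceRate = 2 * sinkRate;    // the source spikes to double speed
        long bufferCapacity = 10_000_000L; // hypothetical buffer size, for illustration only
        System.out.println(surplusPerSecond(sourceRate, sinkRate));                 // 5000000
        System.out.println(secondsUntilFull(bufferCapacity, sourceRate, sinkRate)); // 2.0
    }
}
```

Whatever the buffer size, a sustained surplus of 5 million elements per second fills it in finite time, which is why the pipeline must eventually either drop data or slow the source.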

There are two ways to cope: drop the excess elements, which is often unacceptable, or buffer messages persistently and signal the sender to slow down, which ensures that no data is lost.
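The second strategy can be illustrated with a bounded buffer between a fast producer and a slow consumer. This is a generic sketch using `java.util.concurrent.ArrayBlockingQueue`, not Flink's network stack; the class name, buffer capacity, and 1 ms consumer delay are assumptions for illustration. When the buffer is full, `offer` fails (the back pressure signal) and the producer waits in `put` instead of discarding the element.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

final class BoundedBufferDemo {
    // Pushes `total` elements from a fast producer to a slow consumer through a buffer
    // with `capacity` slots, and returns how many elements the consumer received.
    static int run(int total, int capacity) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        AtomicInteger received = new AtomicInteger();
        AtomicInteger blockedPuts = new AtomicInteger();

        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    buffer.take();   // waits while the buffer is empty
                    Thread.sleep(1); // simulate a slow downstream operator
                    received.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        try {
            for (int i = 0; i < total; i++) {
                if (!buffer.offer(i)) {  // buffer full: the producer is being back-pressured
                    blockedPuts.incrementAndGet();
                    buffer.put(i);       // wait for free space instead of dropping the element
                }
            }
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        System.out.println("producer had to wait " + blockedPuts.get() + " times");
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println(run(100, 8)); // 100: every element arrives, none is dropped
    }
}
```

The buffer absorbs short bursts, and once it is full the slowdown propagates to the producer, trading throughput for completeness.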

Flink monitors back pressure by periodically sampling the stack traces of running tasks with Thread.getStackTrace(). If the samples show that a task thread is stuck in an internal method call (requesting buffers from the network stack), the task is considered back-pressured.
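The sampling idea can be reproduced with plain Java threads. This is a minimal sketch, not Flink's sampler: `requestBuffer` is a hypothetical stand-in for the internal buffer-request call Flink looks for, and the sample count and delay are arbitrary. The sampler calls `Thread.getStackTrace()` on another thread repeatedly and counts how often that method appears in the trace.

```java
import java.util.concurrent.CountDownLatch;

final class StackSampler {
    // Fraction of `numSamples` stack traces of `thread` that contain a frame for `methodName`.
    static double blockedRatio(Thread thread, String methodName, int numSamples, long delayMs)
            throws InterruptedException {
        int blocked = 0;
        for (int i = 0; i < numSamples; i++) {
            for (StackTraceElement frame : thread.getStackTrace()) {
                if (frame.getMethodName().equals(methodName)) {
                    blocked++;
                    break;
                }
            }
            Thread.sleep(delayMs); // pause before taking the next sample
        }
        return (double) blocked / numSamples;
    }

    // Hypothetical stand-in for a task waiting for a free output buffer.
    static void requestBuffer(CountDownLatch entered, CountDownLatch release) throws InterruptedException {
        entered.countDown();
        release.await();
    }

    // Starts a "task" thread that waits either inside requestBuffer or elsewhere,
    // samples it, then releases it and returns the observed ratio.
    static double demo(boolean waitInRequestBuffer) {
        CountDownLatch entered = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread task = new Thread(() -> {
            try {
                if (waitInRequestBuffer) {
                    requestBuffer(entered, release);
                } else {
                    entered.countDown();
                    release.await(); // waiting, but not inside requestBuffer
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        task.start();
        try {
            entered.await(); // make sure the task has reached its waiting point
            return blockedRatio(task, "requestBuffer", 20, 5);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            release.countDown(); // let the task thread finish
        }
    }

    public static void main(String[] args) {
        System.out.println(demo(true));  // 1.0
        System.out.println(demo(false)); // 0.0
    }
}
```

A task that merely waits is not flagged; only waiting inside the specific method counts, which is what lets the ratio distinguish back pressure from ordinary idleness.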

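By default, the JobManager triggers 100 stack trace samples per task, taken 50 ms apart. The ratio of samples stuck in that call indicates severity: OK for 0 to 0.10, LOW for above 0.10 up to 0.5, and HIGH for above 0.5 up to 1. For example, a ratio of 0.01 means that 1 of the 100 samples was stuck. To limit the load on the TaskManagers, the sampled results are refreshed only every 60 seconds.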
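The mapping from a sampled ratio to a severity level is a simple threshold check. This sketch assumes the boundaries described above (0.10 counts as OK, 0.5 counts as LOW); `BackPressureClassifier` and its `Level` enum are hypothetical names, not Flink classes.

```java
final class BackPressureClassifier {
    enum Level { OK, LOW, HIGH }

    // Fraction of samples in which the task thread was blocked.
    static double ratio(int blockedSamples, int totalSamples) {
        if (totalSamples <= 0 || blockedSamples < 0 || blockedSamples > totalSamples) {
            throw new IllegalArgumentException("invalid sample counts");
        }
        return (double) blockedSamples / totalSamples;
    }

    // OK: 0 <= r <= 0.10, LOW: 0.10 < r <= 0.5, HIGH: 0.5 < r <= 1.
    static Level classify(double r) {
        if (r < 0.0 || r > 1.0) {
            throw new IllegalArgumentException("ratio out of range: " + r);
        }
        if (r <= 0.10) return Level.OK;
        if (r <= 0.5) return Level.LOW;
        return Level.HIGH;
    }

    public static void main(String[] args) {
        System.out.println(classify(ratio(1, 100)));  // OK   (ratio 0.01)
        System.out.println(classify(ratio(30, 100))); // LOW  (ratio 0.30)
        System.out.println(classify(ratio(80, 100))); // HIGH (ratio 0.80)
    }
}
```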

Three configuration parameters tune the sampling: the refresh interval after which cached statistics are considered outdated and resampled (web.backpressure.refresh-interval), the number of samples per task (web.backpressure.num-samples), and the delay between samples (web.backpressure.delay-between-samples).
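How the three parameters interact can be sketched as a small value class. This is a hypothetical model, not Flink's configuration API: the defaults mirror the figures given above (60 s refresh, 100 samples, 50 ms delay), and the staleness rule is an assumed simplification. It makes one trade-off concrete: with the defaults, sampling a single task spans at least 99 gaps of 50 ms, about 5 seconds.

```java
final class BackPressureSamplingSettings {
    // Defaults matching the values described in the text (verify against your Flink version).
    static final long DEFAULT_REFRESH_INTERVAL_MS = 60_000L;     // web.backpressure.refresh-interval
    static final int DEFAULT_NUM_SAMPLES = 100;                  // web.backpressure.num-samples
    static final long DEFAULT_DELAY_BETWEEN_SAMPLES_MS = 50L;    // web.backpressure.delay-between-samples

    final long refreshIntervalMs;
    final int numSamples;
    final long delayBetweenSamplesMs;

    BackPressureSamplingSettings(long refreshIntervalMs, int numSamples, long delayBetweenSamplesMs) {
        if (refreshIntervalMs <= 0 || numSamples <= 0 || delayBetweenSamplesMs < 0) {
            throw new IllegalArgumentException("invalid back pressure sampling settings");
        }
        this.refreshIntervalMs = refreshIntervalMs;
        this.numSamples = numSamples;
        this.delayBetweenSamplesMs = delayBetweenSamplesMs;
    }

    static BackPressureSamplingSettings defaults() {
        return new BackPressureSamplingSettings(
                DEFAULT_REFRESH_INTERVAL_MS, DEFAULT_NUM_SAMPLES, DEFAULT_DELAY_BETWEEN_SAMPLES_MS);
    }

    // Lower bound on the time needed to sample one task: (numSamples - 1) gaps between samples.
    long minSamplingSpanMs() {
        return (numSamples - 1) * delayBetweenSamplesMs;
    }

    // Assumed rule: cached stats at least one refresh interval old are resampled on the next request.
    boolean isOutdated(long sampledAtMs, long nowMs) {
        return nowMs - sampledAtMs >= refreshIntervalMs;
    }

    public static void main(String[] args) {
        BackPressureSamplingSettings s = defaults();
        System.out.println(s.minSamplingSpanMs());     // 4950
        System.out.println(s.isOutdated(0L, 30_000L)); // false
        System.out.println(s.isOutdated(0L, 60_000L)); // true
    }
}
```

Raising the sample count or the delay gives a steadier ratio but makes each sampling round longer; the refresh interval bounds how often that cost is paid.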

The Back Pressure page of the Flink Web UI shows the sampling status, the back pressure ratio of each subtask, and a status indicator of OK, LOW, or HIGH.

Spark Streaming introduced back pressure in version 1.5. It adjusts the ingestion rate automatically, driven by the onBatchCompleted event and each batch's processing delay, scheduling delay, and record count. Flink, by contrast, detects back pressure by sampling stack traces and measuring how often tasks are blocked.
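Spark's rate adjustment (enabled with `spark.streaming.backpressure.enabled`) uses a PID-style estimator fed by these per-batch statistics. The sketch below is a simplified controller loosely modeled on that idea, not Spark's implementation: it keeps only the proportional and integral-like terms, drops the derivative term, and the class name, gains, and minimum rate are assumptions chosen for illustration.

```java
import java.util.OptionalDouble;

final class PidRateController {
    private final double batchIntervalMs;
    private final double proportional;
    private final double integral;
    private final double minRate;
    private double latestRate = -1.0; // negative until the first usable batch is observed

    PidRateController(double batchIntervalMs, double proportional, double integral, double minRate) {
        this.batchIntervalMs = batchIntervalMs;
        this.proportional = proportional;
        this.integral = integral;
        this.minRate = minRate;
    }

    // Called after each completed batch; returns the ingestion rate (records/s) for the
    // next batch, or empty if the batch carries no usable information.
    OptionalDouble onBatchCompleted(long numRecords, long processingDelayMs, long schedulingDelayMs) {
        if (numRecords <= 0 || processingDelayMs <= 0) {
            return OptionalDouble.empty();
        }
        // Rate the job actually sustained on this batch.
        double processingRate = numRecords * 1000.0 / processingDelayMs;
        if (latestRate < 0) {
            latestRate = processingRate; // first batch: adopt the observed rate
            return OptionalDouble.of(latestRate);
        }
        // Proportional term: how far the current limit exceeds what was actually processed.
        double error = latestRate - processingRate;
        // Integral-like term: backlog implied by the scheduling delay, expressed as a rate.
        double historicalError = schedulingDelayMs * processingRate / batchIntervalMs;
        latestRate = Math.max(latestRate - proportional * error - integral * historicalError, minRate);
        return OptionalDouble.of(latestRate);
    }

    public static void main(String[] args) {
        PidRateController c = new PidRateController(1000, 1.0, 0.2, 100);
        System.out.println(c.onBatchCompleted(5000, 1000, 0).getAsDouble());   // 5000.0
        System.out.println(c.onBatchCompleted(5000, 2000, 500).getAsDouble()); // about 2250
    }
}
```

When a batch takes longer to process and queues up behind earlier ones, the controller lowers the next batch's rate below the observed processing rate, so the backlog can drain.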

Overall, Flink's sampling-based detection pinpoints back pressure at the level of individual tasks, while Spark's offers a simpler rate-control strategy.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Wang Zhiwu (Big Data Technology & Architecture), a big data expert dedicated to sharing big data technology.
