Understanding Back Pressure in Flink and Its Implementation
This article explains what back pressure is in Flink streaming jobs, why it occurs when data generation outpaces downstream consumption, how Flink monitors it via stack-trace sampling, which parameters configure that sampling, and how the Web UI visualizes the result. It also compares the approach with Spark Streaming's back pressure mechanism.
Back pressure occurs when data is produced faster than downstream operators can consume it. Flink reports the condition for a job as a back pressure status, for example HIGH.
Typical causes include GC pauses, traffic spikes at the source, and insufficient downstream processing capacity. Left unhandled, back pressure can lead to resource exhaustion or data loss.
In a simple Source→Sink pipeline, both ends normally process 5 million elements per second. When the source spikes to double that rate, the sink cannot keep up, and back pressure results.
To mitigate back pressure, one can drop elements (often unacceptable) or buffer messages persistently and signal the sender to slow down, ensuring no data loss.
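The second option can be illustrated with a bounded buffer between a fast producer and a slow consumer. This is a minimal in-memory sketch, not Flink's implementation: the class name BoundedBufferDemo, the buffer capacity, and the 1 ms sink delay are all assumptions chosen for illustration. A full buffer blocks the sender instead of dropping elements.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

public class BoundedBufferDemo {

    /**
     * Pushes `total` elements from a fast producer to a slow consumer through
     * a buffer holding at most `capacity` elements. Returns the number of
     * elements the consumer received.
     */
    static int run(int total, int capacity) {
        BlockingQueue<Integer> buffer = new ArrayBlockingQueue<>(capacity);
        AtomicInteger received = new AtomicInteger();

        Thread sink = new Thread(() -> {
            try {
                for (int i = 0; i < total; i++) {
                    buffer.take();              // waits while the buffer is empty
                    received.incrementAndGet();
                    Thread.sleep(1);            // simulate a slow sink (illustrative value)
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        sink.start();

        try {
            for (int i = 0; i < total; i++) {
                // put() blocks while the buffer is full: this is the
                // "slow down" signal reaching the sender.
                buffer.put(i);
            }
            sink.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while sending", e);
        }
        return received.get();
    }

    public static void main(String[] args) {
        System.out.println("received " + run(200, 8) + " of 200 elements");
    }
}
```

Because the producer waits rather than discarding, every element eventually reaches the sink; the cost is that the producer runs at the sink's pace.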
Flink implements back pressure monitoring by repeatedly sampling the stack traces of running tasks using Thread.getStackTrace(). If the samples show a task thread stuck in certain internal method calls (requesting buffers from the network stack), the task is considered back pressured.
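The sampling idea can be sketched in plain Java. This is an illustration under stated assumptions, not Flink's code: the class StackTraceSampler, the helper demo(), and the method requestBufferBlocking are hypothetical stand-ins, with the latter simulating a task that waits for a buffer. Only Thread.getStackTrace() comes from the text above.

```java
import java.util.concurrent.CountDownLatch;

public class StackTraceSampler {

    /** Fraction of samples in which the target's stack contains methodName. */
    static double blockedRatio(Thread target, String methodName,
                               int numSamples, long delayMillis) {
        int blocked = 0;
        for (int i = 0; i < numSamples; i++) {
            for (StackTraceElement frame : target.getStackTrace()) {
                if (frame.getMethodName().equals(methodName)) {
                    blocked++;
                    break;
                }
            }
            try {
                Thread.sleep(delayMillis);      // delay between samples
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                break;
            }
        }
        return (double) blocked / numSamples;
    }

    /** Hypothetical stand-in for a task waiting on a network buffer. */
    static void requestBufferBlocking(CountDownLatch bufferAvailable) {
        try {
            bufferAvailable.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    /** Starts a permanently blocked task, samples it, then releases it. */
    static double demo() {
        CountDownLatch bufferAvailable = new CountDownLatch(1);
        Thread task = new Thread(() -> requestBufferBlocking(bufferAvailable), "task");
        task.setDaemon(true);
        task.start();
        // Wait until the task is actually parked inside the blocking call.
        while (task.getState() != Thread.State.WAITING) {
            Thread.yield();
        }
        double ratio = blockedRatio(task, "requestBufferBlocking", 20, 5);
        bufferAvailable.countDown();            // let the task finish
        return ratio;
    }

    public static void main(String[] args) {
        System.out.println("blocked ratio: " + demo());
    }
}
```

A task that never blocks would yield a ratio near 0, while the simulated task here is blocked in every sample.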
By default, the JobManager triggers 100 stack trace samples per task, taken 50 ms apart. The ratio of samples in which the task is blocked indicates severity: OK (0 ≤ ratio ≤ 0.10), LOW (0.10 < ratio ≤ 0.5), and HIGH (0.5 < ratio ≤ 1). To limit overhead, sampling results are refreshed only every 60 seconds.
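As a worked example of the thresholds, the following sketch maps a blocked-sample ratio to a status. The class and enum names are hypothetical, and assigning the boundary values 0.10 and 0.5 to the lower band is an assumption of this sketch.

```java
public class BackPressureLevel {

    enum Level { OK, LOW, HIGH }

    /** Ratio of blocked samples to total samples, e.g. 1 of 100 gives 0.01. */
    static double ratio(int blockedSamples, int totalSamples) {
        if (totalSamples <= 0 || blockedSamples < 0 || blockedSamples > totalSamples) {
            throw new IllegalArgumentException("invalid sample counts");
        }
        return (double) blockedSamples / totalSamples;
    }

    /** Maps a ratio in [0, 1] to a status using the thresholds 0.10 and 0.5. */
    static Level classify(double ratio) {
        if (!(ratio >= 0.0 && ratio <= 1.0)) {  // also rejects NaN
            throw new IllegalArgumentException("ratio must be in [0, 1]: " + ratio);
        }
        if (ratio <= 0.10) {
            return Level.OK;    // boundary 0.10 assumed to count as OK
        }
        if (ratio <= 0.5) {
            return Level.LOW;   // boundary 0.5 assumed to count as LOW
        }
        return Level.HIGH;
    }

    public static void main(String[] args) {
        // With the default 100 samples: 1 blocked, 30 blocked, 80 blocked.
        System.out.println(classify(ratio(1, 100)));
        System.out.println(classify(ratio(30, 100)));
        System.out.println(classify(ratio(80, 100)));
    }
}
```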
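Three configuration parameters tune the sampling: web.backpressure.refresh-interval sets how long collected statistics remain valid before they are refreshed (default 60 seconds), web.backpressure.num-samples sets the number of stack trace samples (default 100), and web.backpressure.delay-between-samples sets the delay between samples (default 50 ms).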
The Flink Web UI displays a Back Pressure page showing sampling status, back pressure ratios, and visual indicators of normal, low, or high pressure.
Spark Streaming introduced back pressure in version 1.5. It adjusts the ingestion rate automatically: on each onBatchCompleted event, it uses the batch's processingDelay, schedulingDelay, and record count to estimate a new rate. Flink's monitoring, by contrast, relies on stack-trace sampling to measure how often a task is blocked.
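To make the rate-adjustment idea concrete, here is a simplified sketch of a feedback-style rate estimate built from the quantities named above. It is loosely modeled on a proportional-integral controller and is not Spark's API: the class name, method names, and the gain values kp and ki are illustrative assumptions.

```java
public class RateEstimateSketch {

    /** Records per second the last batch actually sustained. */
    static double processingRate(long numRecords, long processingDelayMs) {
        return numRecords * 1000.0 / processingDelayMs;
    }

    /**
     * Estimates the next ingestion rate (records per second).
     * kp weights the gap between the current rate and the sustained rate;
     * ki weights the backlog implied by the scheduling delay.
     */
    static double nextRate(double latestRate, long numRecords,
                           long processingDelayMs, long schedulingDelayMs,
                           long batchIntervalMs, double kp, double ki,
                           double minRate) {
        double sustained = processingRate(numRecords, processingDelayMs);
        double error = latestRate - sustained;
        // Records that piled up while the batch waited to be scheduled,
        // expressed as a rate over one batch interval.
        double backlog = schedulingDelayMs * sustained / batchIntervalMs;
        double candidate = latestRate - kp * error - ki * backlog;
        return Math.max(candidate, minRate);
    }

    public static void main(String[] args) {
        // 1000 records took 2000 ms to process, so the sustained rate is
        // 500 records/s and the estimate drops from 1000 toward it.
        System.out.println(nextRate(1000.0, 1000, 2000, 0, 1000, 1.0, 0.2, 100.0));
    }
}
```

The design point is that the sender's rate is derived from observed batch timings, whereas the Flink monitoring described earlier only measures how often tasks are blocked.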
Overall, Flink’s back pressure mechanism provides a more granular detection method, while Spark’s offers a simpler rate‑control strategy.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
