Big Data 16 min read

Understanding Network Flow Control and Flink's Backpressure Mechanisms

This article explains the concepts and background of network flow control, compares static rate limiting with dynamic feedback backpressure, describes TCP's sliding‑window mechanism, and details how Flink implements both TCP‑based and credit‑based backpressure to handle mismatched upstream‑downstream speeds in streaming applications.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Understanding Network Flow Control and Flink's Backpressure Mechanisms

Network flow control is essential when the producer’s sending rate exceeds the consumer’s processing capacity, as illustrated by a simple diagram where a 2 MB/s producer feeds a 1 MB/s consumer, leading to buffer overflow or data loss if unchecked.

Traditional static rate limiting (e.g., a RateLimiter on the producer) can align the sending speed with the consumer’s capacity, but it suffers from two major drawbacks: the consumer’s maximum sustainable rate is unknown in advance, and the consumer’s capacity often fluctuates dynamically.

Dynamic feedback, or automatic backpressure, addresses these issues by allowing the consumer to continuously inform the producer of the acceptable rate. Negative feedback reduces the producer’s speed when the consumer is slower, while positive feedback increases it when the consumer can handle more data.

Flink’s network transmission architecture builds on Netty and a three‑level buffer hierarchy: a Send Buffer on the producer side, a Netty ChannelOutbound Buffer, and a Receive Buffer on the consumer side. TCP’s built‑in flow control (sliding window) is used by Flink versions before 1.5 to propagate feedback.

The TCP sliding‑window mechanism assigns a sequence number and an ACK number to each packet and uses the Window Size field to tell the sender how much more data the receiver can accept. An example shows a sender transmitting three packets while the receiver can only process one, causing the receiver to send ACK=4 and a reduced window, which throttles the sender’s rate.

Before Flink 1.5, backpressure relied on TCP flow control combined with bounded buffers. The article walks through a WindowWordCount example, showing how the StreamGraph is compiled into a JobGraph, then into an ExecutionGraph, and how data flows through InputGate, ResultPartition, and ResultSubPartition.

TCP‑based backpressure has two main drawbacks: a single congested task can block the shared socket used by many tasks on the same TaskManager, and the reliance on low‑level TCP makes the feedback path long, increasing latency.

Starting with Flink 1.5, a credit‑based backpressure mechanism was introduced, mimicking TCP’s window at the application layer. Credits represent the number of buffers the downstream can provide; the upstream sends a backlog size, and the downstream returns a credit indicating how many more records may be sent.

An example demonstrates a producer sending at speed 2 while the consumer processes at speed 1, causing the ResultSubPartition to accumulate a backlog of two messages. The downstream allocates buffers from its Local and Network BufferPools; when those pools are exhausted, it returns Credit = 0, effectively throttling the upstream without involving the TCP layer.

In summary, network flow control prevents downstream overload, can be implemented via static rate limiting or dynamic backpressure, Flink < 1.5 used TCP‑based backpressure, and Flink ≥ 1.5 uses a credit‑based mechanism that reduces latency and avoids socket blockage, though static limiting may still be needed for sinks that cannot propagate backpressure.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

FlinkTCPbackpressureCredit-basedNetwork Flow Control
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.