Big Data 10 min read

Understanding Flink’s Data Flow: Buffer Pools, Network Transfer, and Credit‑Based Flow Control

This article explains Flink’s internal data abstraction and transfer mechanisms, detailing how data moves between operators via network buffers, the role of ByteBuffer and NetworkBufferPool, the serialization process, Netty integration, and credit‑based flow control to handle backpressure.

Big Data Technology & Architecture

Apr 4, 2023

Understanding Flink’s Data Flow: Buffer Pools, Network Transfer, and Credit‑Based Flow Control

1. Data Flow – Flink’s Data Abstraction and Exchange

Flink defines data using internal abstractions that are passed between operators via network buffers. The first constructor’s checkBufferAndGetAddress() obtains the direct buffer’s memory address, enabling off‑heap memory operations.

1.2 ByteBuffer and NetworkBufferPool

NetworkBufferPool allocates a pool of off‑heap ByteBuffer segments. Example code shows the abstract reference‑counted byte buffer and the constructor of NetworkBufferPool which creates an ArrayBlockingQueue of memory segments and logs allocation details.

Methods such as redistributeBuffers() and setNumBuffers() manage buffer distribution, handling backpressure by reallocating memory segments among buffer pools.

2. Data Transfer Process

Data is emitted from operators through RecordWriter, which selects target channels via a channel selector and serializes records. The serialization loop writes full buffers, requests new buffers, and flushes when necessary.

Netty is used for network transmission. ResultPartition.flushAll() triggers subpartition flushes, and PartitionRequestQueue.notifyReaderNonEmpty() schedules a Netty event to push data.

Netty’s ChannelHandlerContext.fireUserEventTriggered propagates events, optionally executing in the event loop or via a one‑time task.

The actual write occurs in writeAndFlushNextMessageIfPossible, which retrieves buffers from readers, handles end‑of‑partition events, and writes BufferResponse messages to the channel.

On the receiving side, CreditBasedPartitionRequestClientHandler.decodeMsg processes BufferResponse messages, routing them to the appropriate RemoteInputChannel. StreamElementSerializer.deserialize reconstructs stream elements such as records, watermarks, and latency markers.

3. Credit‑Based Flow Control

Flink uses a credit‑based mechanism to manage backpressure. Downstream nodes send credit messages indicating available buffer capacity; upstream nodes send data respecting this credit, decrementing the credit balance as buffers are consumed.

Illustrations compare connection‑oriented flow control (allowing each upstream node to buffer data) with end‑to‑end flow control (where only the downstream node buffers), highlighting the advantages of credit‑based design.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data Flink Data Flow backpressure Credit-based Flow Control Network Buffers

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.