Understanding Flink’s Data Flow: Buffer Pools, Network Transfer, and Credit‑Based Flow Control
This article explains Flink’s internal data abstraction and transfer mechanisms, detailing how data moves between operators via network buffers, the role of ByteBuffer and NetworkBufferPool, the serialization process, Netty integration, and credit‑based flow control to handle backpressure.
1. Data Flow – Flink’s Data Abstraction and Exchange
Flink defines data using internal abstractions that are passed between operators via network buffers. The first constructor’s checkBufferAndGetAddress() obtains the direct buffer’s memory address, enabling off‑heap memory operations.
1.2 ByteBuffer and NetworkBufferPool
NetworkBufferPool allocates a pool of off‑heap ByteBuffer segments. Example code shows the abstract reference‑counted byte buffer and the constructor of NetworkBufferPool which creates an ArrayBlockingQueue of memory segments and logs allocation details.
Methods such as redistributeBuffers() and setNumBuffers() manage buffer distribution, handling backpressure by reallocating memory segments among buffer pools.
2. Data Transfer Process
Data is emitted from operators through RecordWriter, which selects target channels via a channel selector and serializes records. The serialization loop writes full buffers, requests new buffers, and flushes when necessary.
Netty is used for network transmission. ResultPartition.flushAll() triggers subpartition flushes, and PartitionRequestQueue.notifyReaderNonEmpty() schedules a Netty event to push data.
Netty’s ChannelHandlerContext.fireUserEventTriggered propagates events, optionally executing in the event loop or via a one‑time task.
The actual write occurs in writeAndFlushNextMessageIfPossible, which retrieves buffers from readers, handles end‑of‑partition events, and writes BufferResponse messages to the channel.
On the receiving side, CreditBasedPartitionRequestClientHandler.decodeMsg processes BufferResponse messages, routing them to the appropriate RemoteInputChannel. StreamElementSerializer.deserialize reconstructs stream elements such as records, watermarks, and latency markers.
3. Credit‑Based Flow Control
Flink uses a credit‑based mechanism to manage backpressure. Downstream nodes send credit messages indicating available buffer capacity; upstream nodes send data respecting this credit, decrementing the credit balance as buffers are consumed.
Illustrations compare connection‑oriented flow control (allowing each upstream node to buffer data) with end‑to‑end flow control (where only the downstream node buffers), highlighting the advantages of credit‑based design.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
