
Deep Dive into Kafka Broker Network Architecture and Request Processing Flow

This article thoroughly analyzes Kafka broker's high‑throughput network architecture, tracing its evolution from simple sequential handling to a multi‑selector Reactor model, detailing Acceptor and Processor thread implementations, request‑handling pipelines, and practical tuning parameters for optimal performance.

Wukong Talks Architecture

Kafka brokers achieve throughput on the order of millions of messages per second through a carefully designed network architecture that evolves from a basic sequential request loop to an advanced multi‑selector Reactor pattern, leveraging Java NIO for non‑blocking I/O.

The article begins with a simple while‑loop model that accepts connections and processes them sequentially, highlighting its two major drawbacks: request blocking and poor throughput.
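The request-blocking drawback is easiest to see in code. Below is a minimal sketch of such a sequential loop in Java (illustrative, not Kafka source; the `SequentialServer` name and `handler` parameter are my own):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

// Minimal sequential server: one connection is handled at a time. While the
// handler runs for one client, every other client waits in the accept backlog,
// which is exactly the request-blocking drawback of the while-loop model.
public class SequentialServer {
    public static void serve(ServerSocket server, Consumer<Socket> handler) throws IOException {
        while (!server.isClosed()) {
            Socket socket = server.accept();   // blocks until a client connects
            try {
                handler.accept(socket);        // blocks every other client until done
            } finally {
                socket.close();
            }
        }
    }
}
```

Because accept and handling share one thread, a single slow request stalls the entire broker, which is why this model caps out at very low throughput.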

It then introduces an independent‑thread asynchronous model (connection‑per‑thread), which improves throughput but incurs high thread‑creation overhead, making it unsuitable for ultra‑high concurrency.
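A connection-per-thread variant of the same loop might look like this (again an illustrative sketch, not Kafka source):

```java
import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;
import java.util.function.Consumer;

// Connection-per-thread: each accepted socket gets its own handler thread, so a
// slow client no longer blocks the accept loop. The cost is one OS thread
// (stack memory, context switches) per connection, which is why this model
// breaks down under very high concurrency.
public class ThreadPerConnectionServer {
    public static void serve(ServerSocket server, Consumer<Socket> handler) throws IOException {
        while (!server.isClosed()) {
            Socket socket = server.accept();
            Thread worker = new Thread(() -> {
                try {
                    handler.accept(socket);
                } finally {
                    try { socket.close(); } catch (IOException ignored) { }
                }
            });
            worker.start();   // the accept loop moves on immediately
        }
    }
}
```

Throughput improves because handling is asynchronous, but with tens of thousands of connections the per-thread stacks and scheduler overhead dominate, motivating the event-driven design that follows.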

To overcome these limits, the design adopts an event‑driven approach using multiplexing (Selector) and the Reactor pattern, where an Acceptor thread registers OP_ACCEPT events and dispatches connections to a pool of Processor threads.
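To make the multiplexing idea concrete, here is a minimal single-selector Reactor echo loop in Java NIO (an illustrative sketch; Kafka splits the same idea across threads, with the Acceptor owning OP_ACCEPT and each Processor running its own Selector for client I/O):

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.util.Iterator;

// One thread multiplexes OP_ACCEPT and OP_READ events across all connections
// via a single Selector, echoing back whatever it reads.
public class ReactorEcho {
    // Run one iteration of the event loop, dispatching every ready event.
    public static void pollOnce(Selector selector) throws IOException {
        if (selector.select(100) == 0) return;   // wait up to 100 ms for events
        Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
        while (keys.hasNext()) {
            SelectionKey key = keys.next();
            keys.remove();
            if (key.isAcceptable()) {
                // New connection: register it for reads on the same selector.
                SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                client.configureBlocking(false);
                client.register(selector, SelectionKey.OP_READ);
            } else if (key.isReadable()) {
                SocketChannel ch = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(256);
                int n = ch.read(buf);
                if (n < 0) {                     // peer closed the connection
                    key.cancel();
                    ch.close();
                } else {
                    buf.flip();                  // echo the bytes back
                    while (buf.hasRemaining()) ch.write(buf);
                }
            }
        }
    }
}
```

One thread can now serve thousands of connections because it only does work when the kernel reports a channel as ready, rather than blocking per connection.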

Acceptor Thread implementation (excerpt):

```scala
/**
 * Thread that accepts and configures new connections. There is one of these per endpoint.
 */
private[kafka] class Acceptor(val endPoint: EndPoint,
                              val sendBufferSize: Int,
                              val recvBufferSize: Int,
                              brokerId: Int,
                              connectionQuotas: ConnectionQuotas,
                              metricPrefix: String)
  extends AbstractServerThread(connectionQuotas) with KafkaMetricsGroup {

  private val nioSelector = NSelector.open()
  val serverChannel = openServerSocket(endPoint.host, endPoint.port)
  private val processors = new ArrayBuffer[Processor]()

  def run(): Unit = {
    serverChannel.register(nioSelector, SelectionKey.OP_ACCEPT)
    startupComplete()
    var currentProcessorIndex = 0
    while (isRunning) {
      val ready = nioSelector.select(500)
      if (ready > 0) {
        // accept connections and assign to processors(currentProcessorIndex)
        currentProcessorIndex = (currentProcessorIndex + 1) % processors.size
      }
    }
  }
}
```

Processor Thread implementation (excerpt):

```scala
private val newConnections = new ArrayBlockingQueue[SocketChannel](connectionQueueSize)
private val inflightResponses = mutable.Map[String, RequestChannel.Response]()
private val responseQueue = new LinkedBlockingDeque[RequestChannel.Response]()

override def run(): Unit = {
  startupComplete()
  try {
    while (isRunning) {
      configureNewConnections()   // register queued sockets with this processor's selector
      processNewResponses()       // pick up responses produced by handler threads
      poll()                      // selector poll for read/write readiness
      processCompletedReceives()  // turn completed receives into Request objects
    }
  } finally {
    /* cleanup */
  }
}
```

The core request‑handling flow is as follows:

1. Clients initiate connections, which are received by the Acceptor thread.

2. The Acceptor creates a NIO Selector, registers for OP_ACCEPT events, and dispatches connections to a configurable pool of Processor threads (sized by num.network.threads).

3. Accepted connections are handed off round-robin into each Processor's newConnections queue.

4. Processors poll for OP_READ/OP_WRITE events, assemble complete Request objects, and enqueue them in the RequestChannel request queue.

5. KafkaRequestHandler threads (pool size governed by num.io.threads) dequeue requests, invoke KafkaApis.handle, and perform the actual business logic, such as persisting data to disk.

6. Responses are placed into the owning Processor's response queue and written back to clients.
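The hand-off between Processor threads and handler threads can be modeled with a bounded blocking queue. The sketch below is a simplified stand-in for Kafka's RequestChannel, written in Java; the class and record names are illustrative, not Kafka's actual API:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Simplified stand-in for RequestChannel: Processor threads enqueue fully
// assembled requests, and handler threads dequeue them with a timeout --
// mirroring the receiveRequest(300) pattern in the KafkaRequestHandler excerpt.
public class MiniRequestChannel {
    public record Request(String connectionId, String payload) { }

    private final ArrayBlockingQueue<Request> requestQueue;

    public MiniRequestChannel(int capacity) {
        // Bounded, like queued.max.requests: full queues apply backpressure
        // to the network threads instead of growing without limit.
        this.requestQueue = new ArrayBlockingQueue<>(capacity);
    }

    // Called by a Processor thread after assembling a complete request.
    public void sendRequest(Request req) throws InterruptedException {
        requestQueue.put(req);
    }

    // Called by a handler thread; returns null if no request arrives in time.
    public Request receiveRequest(long timeoutMs) throws InterruptedException {
        return requestQueue.poll(timeoutMs, TimeUnit.MILLISECONDS);
    }
}
```

The queue decouples network I/O from business logic, letting the two thread pools be sized independently via num.network.threads and num.io.threads.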

KafkaRequestHandler implementation (excerpt):

```scala
class KafkaRequestHandler(id: Int,
                          brokerId: Int,
                          aggregateIdleMeter: Meter,
                          totalHandlerThreads: AtomicInteger,
                          requestChannel: RequestChannel,
                          apis: KafkaApis,
                          time: Time) extends Runnable with Logging {

  def run(): Unit = {
    while (!stopped) {
      val req = requestChannel.receiveRequest(300)
      req match {
        case RequestChannel.ShutdownRequest =>
          /* shutdown */
        case request: RequestChannel.Request =>
          try {
            request.requestDequeueTimeNanos = time.nanoseconds
            apis.handle(request)
          } finally {
            request.releaseBuffer()
          }
        case null => // no request within the timeout; continue polling
      }
    }
    shutdownComplete.countDown()
  }
}
```

Finally, the article provides practical tuning recommendations for broker performance, such as setting num.network.threads to CPU cores × 2, num.io.threads to disk count × 2, using lz4 compression, and keeping queue sizes high (e.g., queued.max.requests ≥ 500) to avoid bottlenecks.
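Put together as a server.properties fragment, the recommendations might look like the following (the concrete values assume a hypothetical 8-core, 4-disk broker; always validate against your own hardware and load tests):

```properties
# Network threads: CPU cores x 2 (assuming 8 cores)
num.network.threads=16
# I/O threads: disk count x 2 (assuming 4 disks)
num.io.threads=8
# lz4 balances compression ratio against CPU cost
compression.type=lz4
# Keep the request queue large enough to absorb bursts
queued.max.requests=500
```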

In summary, mastering Kafka’s Reactor‑based network stack, understanding the roles of Acceptor and Processor threads, and applying the suggested configuration tweaks are essential for achieving high‑throughput, low‑latency Kafka deployments.

Tags: network architecture, Kafka, Performance Tuning, Broker, Reactor Pattern, Java NIO
Written by Wukong Talks Architecture

Explaining distributed systems and architecture through stories. Author of the "JVM Performance Tuning in Practice" column, open-source author of "Spring Cloud in Practice PassJava", and independently developed a PMP practice quiz mini-program.
