Unlocking Kafka’s Million‑Message‑Per‑Second Performance: Key Techniques Explained
This article explains how Kafka attains extremely high throughput by using batch sending, message compression, sequential disk I/O, PageCache, zero‑copy transfers, and memory‑mapped index files, detailing the underlying code paths and trade‑offs that enable millions of messages per second.
Hello, I'm Su San.
Kafka is a high‑performance message queue capable of handling tens of millions of messages per second; this article explores the technical principles behind its high performance.
1. Batch Sending
Kafka processes sending and receiving of messages in batches. The producer’s code shows that messages are cached and only sent when the batch size reaches the configured limit.
private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
TopicPartition tp = null;
try {
// ... omitted earlier code
Callback interceptCallback = new InterceptorCallback<>(callback, this.interceptors, tp);
// Append message to the current batch
RecordAccumulator.RecordAppendResult result = accumulator.append(tp, timestamp, serializedKey,
serializedValue, headers, interceptCallback, remainingWaitMs);
// If batch is full or a new batch is created, wake up the sender
if (result.batchIsFull || result.newBatchCreated) {
log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
this.sender.wakeup();
}
return result.future;
} catch (/* omitted catch code */) { }
}The producer does not send each message immediately; it buffers them and sends the batch once the configured size is reached. All messages in a batch belong to the same topic and partition.
When the broker receives a batch, it writes the whole batch to disk without parsing individual messages, and replicates the batch to other replicas. Consumers also fetch messages in batches, then split them into individual records for processing. Batch processing reduces client‑broker interactions and boosts broker throughput.
2. Message Compression
If message payloads are large, network bandwidth becomes a bottleneck. Kafka mitigates this by compressing messages. Setting the compression.type property enables compression, for example:
public static void main(String[] args) {
Properties props = new Properties();
props.put("bootstrap.servers", "localhost:9092");
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
// Enable compression
props.put("compression.type", "gzip");
Producer<String, String> producer = new KafkaProducer<>(props);
ProducerRecord<String, String> record = new ProducerRecord<>("my_topic", "key1", "value1");
producer.send(record, new Callback() {
@Override
public void onCompletion(RecordMetadata metadata, Exception exception) {
if (exception != null) {
logger.error("sending message error: ", e);
} else {
logger.info("sending message successful, Offset: ", metadata.offset());
}
}
});
producer.close();
}When compression.type is set to none, compression is disabled. Compression occurs just before a buffered batch is sent. The underlying append method eventually creates a MemoryRecordsBuilder that wraps the output stream with the selected compressor:
public RecordAppendResult append(TopicPartition tp,
long timestamp,
byte[] key,
byte[] value,
Header[] headers,
Callback callback,
long maxTimeToBlock) throws InterruptedException {
// ... omitted code
buffer = free.allocate(size, maxTimeToBlock);
synchronized (dq) {
RecordAppendResult appendResult = tryAppend(timestamp, key, value, headers, callback, dq);
if (appendResult != null) {
return appendResult;
}
// Batch is full, perform compression
MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
FutureRecordMetadata future = Utils.notNull(batch.tryAppend(timestamp, key, value, headers, callback, time.milliseconds()));
dq.addLast(batch);
incomplete.add(batch);
buffer = null;
return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
}
} finally {
if (buffer != null)
free.deallocate(buffer);
appendsInProgress.decrementAndGet();
}The MemoryRecordsBuilder constructor selects the compressor (gzip, snappy, lz4, or Zstandard from version 2.1.0) based on the configuration:
public MemoryRecordsBuilder(ByteBufferOutputStream bufferStream,
byte magic,
CompressionType compressionType,
TimestampType timestampType,
long baseOffset,
long logAppendTime,
long producerId,
short producerEpoch,
int baseSequence,
boolean isTransactional,
boolean isControlBatch,
int partitionLeaderEpoch,
int writeLimit) {
// ... omitted code
this.appendStream = new DataOutputStream(compressionType.wrapForOutput(this.bufferStream, magic));
}On the broker side, headers are decompressed for validation, but the message body remains compressed until the consumer decompresses it. Because compression and decompression consume CPU, enabling compression requires consideration of producer and consumer CPU resources.
3. Sequential Disk I/O
Sequential reads and writes eliminate seek time; a single seek allows continuous data transfer. On SSDs, sequential I/O is several times faster than random I/O, and on HDDs the difference is even larger.
Kafka’s broker creates a separate file for each partition and appends data sequentially. When a file fills, a new file is created, reducing seek overhead and improving throughput.
4. PageCache
In Linux, all file I/O passes through PageCache, a memory cache of disk pages. Applications write to PageCache, and the OS periodically flushes it to disk.
When reading, the system first checks PageCache; if the data is present, it is returned directly, otherwise the disk is accessed and the data is cached.
Kafka leverages PageCache so that when producer and consumer rates are balanced, messages can be transferred without ever being flushed to disk.
5. Zero‑Copy
When a broker sends a batch to a consumer, the data must move from PageCache to the socket buffer. Kafka uses zero‑copy to transfer data directly from PageCache to the socket, avoiding copying into user‑space memory and reducing CPU load.
In Java, zero‑copy is achieved with FileChannel.transferTo(), which internally calls the Linux sendfile system call.
6. Memory‑Mapped Index Files (mmap)
Kafka stores logs in .log files and indexes in .index files. To speed up index reads, the broker memory‑maps the index files, allowing the process to read index entries directly from memory without disk I/O.
7. Summary
This article introduced the key techniques Kafka employs to achieve high performance: batch processing, message compression, sequential disk writes, PageCache utilization, zero‑copy transfers, and memory‑mapped index files. Understanding these mechanisms can help developers design and tune high‑throughput systems.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
