
How Kafka Achieves High Performance: Batch Sending, Compression, Sequential Disk I/O, PageCache, Zero‑Copy, and mmap

This article explains the key techniques behind Kafka's high‑throughput performance, including batch sending, message compression, sequential disk reads/writes, efficient use of PageCache, zero‑copy transfers, and memory‑mapped index files, with code examples illustrating each mechanism.


Kafka is a high‑performance message queue capable of handling millions of messages per second. This article explores the technical principles that enable that performance.

1 Batch Sending

Kafka processes messages in batches. The producer accumulates messages in per‑partition batches and sends a batch when it fills up (batch.size) or when the configured linger time (linger.ms) expires.

private Future<RecordMetadata> doSend(ProducerRecord<K, V> record, Callback callback) {
    // ... serialization and partitioning omitted ...
    RecordAccumulator.RecordAppendResult result = accumulator.append(...);
    if (result.batchIsFull || result.newBatchCreated) {
        log.trace("Waking up the sender since topic {} partition {} is either full or getting a new batch", record.topic(), partition);
        this.sender.wakeup();
    }
    return result.future;
}

The producer stores messages in a batch; once the batch is full, it wakes the sender to transmit the data.
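Batching behavior is tunable on the producer. The sketch below (broker address and values are placeholders, not recommendations) shows the two settings that control when a batch is considered ready:

```java
import java.util.Properties;

public class BatchTuningSketch {
    // Producer properties that govern batching; localhost:9092 is a placeholder.
    static Properties batchingProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        // Accumulate up to 32 KB per partition before the batch counts as full
        props.put("batch.size", "32768");
        // Wait up to 10 ms for more records even if the batch is not yet full
        props.put("linger.ms", "10");
        return props;
    }

    public static void main(String[] args) {
        Properties props = batchingProps();
        System.out.println("batch.size=" + props.getProperty("batch.size")
                + " linger.ms=" + props.getProperty("linger.ms"));
    }
}
```

Larger batch.size and a non‑zero linger.ms trade a little latency for fewer, bigger requests, which is exactly the throughput lever batching provides.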

2 Message Compression

To reduce network bandwidth usage, Kafka can compress message batches before sending. Setting compression.type on the producer enables compression (e.g., gzip, snappy, lz4, zstd).

public static void main(String[] args) {
    Properties props = new Properties();
    props.put("bootstrap.servers", "localhost:9092");
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
    // Enable compression
    props.put("compression.type", "gzip");
    Producer<String, String> producer = new KafkaProducer<>(props);
    // ... send record ...
    producer.close();
}

Compression occurs when the producer assembles a batch, before the batch is transmitted.

public RecordAppendResult append(TopicPartition tp, long timestamp, byte[] key, byte[] value, Header[] headers, Callback callback, long maxTimeToBlock) throws InterruptedException {
    // ... allocate buffer ...
    if (appendResult != null) {
        return appendResult;
    }
    // Create a new batch; the records builder applies the configured
    // compression as records are appended to the batch
    MemoryRecordsBuilder recordsBuilder = recordsBuilder(buffer, maxUsableMagic);
    ProducerBatch batch = new ProducerBatch(tp, recordsBuilder, time.milliseconds());
    // ... add to queue ...
    return new RecordAppendResult(future, dq.size() > 1 || batch.isFull(), true);
}

Kafka supports gzip, snappy, lz4, and since version 2.1.0, Zstandard.

3 Sequential Disk Read/Write

Kafka appends messages sequentially to per‑partition log segment files, minimizing seek overhead and maximizing throughput on both HDDs and SSDs.
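Conceptually, each partition's active segment is an append‑only file. The following minimal sketch (the file layout and length‑prefixed framing are simplified illustrations, not Kafka's actual on‑disk format) shows why every write is sequential:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class SequentialLogSketch {
    // Append a length-prefixed record at the current end of the segment file.
    // Every write lands right after the previous one, so the storage device
    // never has to seek backwards.
    static long append(Path segment, byte[] record) throws IOException {
        try (FileChannel ch = FileChannel.open(segment,
                StandardOpenOption.CREATE, StandardOpenOption.APPEND)) {
            long offset = ch.size(); // byte position of this record in the log
            ByteBuffer buf = ByteBuffer.allocate(4 + record.length);
            buf.putInt(record.length).put(record).flip();
            ch.write(buf); // sequential write at end of file
            return offset;
        }
    }

    public static void main(String[] args) throws IOException {
        Path segment = Files.createTempFile("segment", ".log");
        long first = append(segment, "m1".getBytes(StandardCharsets.UTF_8));
        long second = append(segment, "m2".getBytes(StandardCharsets.UTF_8));
        System.out.println(first + " " + second); // offsets grow monotonically
    }
}
```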

4 PageCache

Linux PageCache caches file data in memory. Kafka's reads and writes both go through PageCache, so when consumers keep pace with producers, most data is served from memory without touching the disk at all.
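A key consequence is that a write() returns as soon as the bytes are in the kernel's page cache; flushing to the physical device happens asynchronously. The sketch below (a simplified illustration, not broker code) contrasts the cheap cached write with an explicit fsync:

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PageCacheWriteSketch {
    // write() completes once the data sits in the OS page cache; the kernel
    // flushes it to disk later. force() (fsync) is the expensive call that
    // Kafka avoids issuing per message, relying on replication and periodic
    // flushes for durability instead.
    static void write(FileChannel ch, byte[] data, boolean fsync) throws IOException {
        ch.write(ByteBuffer.wrap(data)); // lands in the page cache
        if (fsync) {
            ch.force(false); // flush to the physical device only when required
        }
    }

    public static void main(String[] args) throws IOException {
        Path log = Files.createTempFile("pagecache", ".log");
        try (FileChannel ch = FileChannel.open(log, StandardOpenOption.WRITE)) {
            write(ch, "cached write".getBytes(), false); // fast path: no fsync
        }
        System.out.println(Files.size(log));
    }
}
```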

5 Zero‑Copy

When a broker sends data to a consumer, it uses zero‑copy (e.g., via FileChannel.transferTo(), which maps to sendfile on Linux) to move bytes directly from PageCache to the socket, avoiding user‑space copies and the associated CPU overhead.
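A minimal sketch of the transferTo pattern, using a file as the destination channel for demonstration (Kafka's broker transfers to a socket channel; whether the kernel uses sendfile depends on the OS and channel types):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopySketch {
    // Move file bytes to another channel without pulling them into user space.
    // On Linux, FileChannel.transferTo can map to sendfile(2), which pushes
    // data from the page cache straight to the destination.
    static long transfer(Path source, FileChannel dest) throws IOException {
        try (FileChannel src = FileChannel.open(source, StandardOpenOption.READ)) {
            long size = src.size();
            long sent = 0;
            // transferTo may move fewer bytes than requested, so loop
            while (sent < size) {
                sent += src.transferTo(sent, size - sent, dest);
            }
            return sent;
        }
    }

    public static void main(String[] args) throws IOException {
        Path in = Files.createTempFile("log", ".seg");
        Files.write(in, "hello kafka".getBytes(StandardCharsets.UTF_8));
        Path out = Files.createTempFile("copy", ".seg");
        try (FileChannel dest = FileChannel.open(out, StandardOpenOption.WRITE)) {
            System.out.println(transfer(in, dest)); // 11 bytes moved
        }
    }
}
```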

6 mmap

Kafka memory‑maps its index files (.index) using mmap, so offset lookups become in‑memory binary searches rather than extra disk reads.
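A toy illustration of why this helps (the entry layout here is invented for the example, not Kafka's actual index format): once the index is mapped, it is directly addressable memory, and a lookup is a binary search over fixed‑size entries with no read() syscalls.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class MmapIndexSketch {
    // Toy offset index: fixed-size entries of (relativeOffset, filePosition).
    static final int ENTRY = 8; // two 4-byte ints per entry

    // Binary search for the file position of the last entry whose offset is
    // <= relativeOffset; works on any ByteBuffer, including a mapped one.
    static int positionFor(ByteBuffer index, int entries, int relativeOffset) {
        int lo = 0, hi = entries - 1, best = -1;
        while (lo <= hi) {
            int mid = (lo + hi) >>> 1;
            int off = index.getInt(mid * ENTRY);
            if (off <= relativeOffset) {
                best = index.getInt(mid * ENTRY + 4);
                lo = mid + 1;
            } else {
                hi = mid - 1;
            }
        }
        return best;
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("toy", ".index");
        try (FileChannel ch = FileChannel.open(file,
                StandardOpenOption.READ, StandardOpenOption.WRITE)) {
            // Map three entries directly into memory
            MappedByteBuffer index = ch.map(FileChannel.MapMode.READ_WRITE, 0, 3 * ENTRY);
            int[][] entries = {{0, 0}, {10, 4096}, {20, 9000}};
            for (int[] e : entries) {
                index.putInt(e[0]).putInt(e[1]);
            }
            // Lookup is a plain memory read served from the page cache
            System.out.println(positionFor(index, 3, 15)); // 4096
        }
    }
}
```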

7 Summary

This article presented the core techniques behind Kafka's high throughput: batching, compression, sequential I/O, PageCache utilization, zero‑copy transfers, and memory‑mapped indexes. Together they offer useful reference points for developers and architects building high‑performance systems.

Tags: batch processing, Kafka, mmap, zero‑copy, PageCache, high performance, compression
Written by Architect

A professional architect sharing high‑quality architecture insights on high‑availability, high‑performance, and high‑stability systems, big data, machine learning, Java, distributed architecture, AI, and practical large‑scale case studies.
