
Why Kafka’s __consumer_offsets Topic Can Fill Your Disk and How to Fix It

The article explains Kafka’s default consumer offset storage mechanism, why the __consumer_offsets system topic can consume massive disk space due to frequent synchronous commits and misconfigured cleanup, and outlines practical steps to reduce offset data and enable proper log compaction.

dbaplus Community

Background

Kafka is a high-throughput, distributed publish-subscribe system that stores messages in partitioned logs. Since version 0.9, consumers store their committed offsets by default in an internal topic named __consumer_offsets instead of in ZooKeeper.

Problem: Disk Usage Explosion

A real‑world incident showed a Kafka broker (pc‑xxx01) with over 80% filesystem usage; the __consumer_offsets topic’s partition 24 alone occupied 952 GB, accounting for 41% of total disk consumption.
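To make the same kind of measurement on another broker, you can total the file sizes under each partition directory in the broker's log directory. Kafka names these directories `<topic>-<partition>`, so partition 24 of the offsets topic lives in `__consumer_offsets-24`. The Python sketch below assumes a single log directory, and its function names are hypothetical. Percentages are relative to the log directory's total, not to the whole filesystem.

```python
import os


def partition_sizes(log_dir):
    """Sum the file sizes inside each partition directory of a Kafka log directory."""
    sizes = {}
    for entry in os.scandir(log_dir):
        # Skip top-level files such as meta.properties and the checkpoint files.
        if not entry.is_dir():
            continue
        total = 0
        for root, _dirs, files in os.walk(entry.path):
            for name in files:
                total += os.path.getsize(os.path.join(root, name))
        sizes[entry.name] = total
    return sizes


def largest_partitions(log_dir, top=5):
    """Return (directory, bytes, percent of log_dir total) for the `top` largest partitions."""
    sizes = partition_sizes(log_dir)
    grand_total = sum(sizes.values()) or 1  # avoid division by zero on an empty directory
    ranked = sorted(sizes.items(), key=lambda kv: kv[1], reverse=True)[:top]
    return [(name, size, 100.0 * size / grand_total) for name, size in ranked]
```

For example, `largest_partitions("/var/kafka-logs")` (a hypothetical path; use the broker's actual `log.dirs` value) lists the five biggest partition directories with their share of that directory.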

Understanding __consumer_offsets

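Each record in this topic has the form

[consumer group, topic name, partition]::[offset metadata, commit time, expiration time]

The record key is the [consumer group, topic name, partition] triple. However, the __consumer_offsets partition that a record is written to is chosen from the consumer group ID alone, so all commits from the same group are hashed to the same partition.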
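Because the partition is derived from the group ID alone, a single group that commits far more often than others grows a single partition. This fits the pattern of one partition (24) dominating in the incident above. As far as we know, the broker computes the partition as the absolute value of the group ID's Java `String.hashCode()` modulo `offsets.topic.num.partitions` (50 by default), with `Integer.MIN_VALUE` mapped to 0. The Python sketch below reproduces that calculation so you can check which partition a given group writes to. The function names are hypothetical, and the broker source for your Kafka version remains the authority.

```python
def java_string_hash(s: str) -> int:
    """Reproduce Java's String.hashCode(): h = 31*h + c over UTF-16 code units, as a signed 32-bit int."""
    h = 0
    data = s.encode("utf-16-be")
    for i in range(0, len(data), 2):
        code_unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + code_unit) & 0xFFFFFFFF  # keep 32 bits, like Java int overflow
    # Reinterpret the unsigned 32-bit value as Java's signed int.
    return h - (1 << 32) if h >= (1 << 31) else h


def java_abs(n: int) -> int:
    """Absolute value that maps Integer.MIN_VALUE to 0 instead of leaving it negative."""
    return 0 if n == -(1 << 31) else abs(n)


def offsets_partition_for(group_id: str, num_partitions: int = 50) -> int:
    """__consumer_offsets partition for a group, assuming the default of 50 partitions."""
    return java_abs(java_string_hash(group_id)) % num_partitions


print(offsets_partition_for("hello"))  # prints 22
```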

Commit Frequency and Retention

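Consumers commit offsets by sending an OffsetCommitRequest to the group coordinator. With auto-commit enabled, this happens every auto.commit.interval.ms (5 s by default in the Java consumer introduced in 0.9; the older Scala consumer defaulted to 60 s). Code that instead commits synchronously after every message turns each consumed message into a separate offset record.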
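A back-of-the-envelope estimate shows why the commit strategy matters. The sketch below rests on two simplifying assumptions. Per-message commits write one offset record per consumed message. Periodic commits write roughly one record per assigned partition per interval. The throughput and partition counts are illustrative and are not figures from the incident.

```python
SECONDS_PER_DAY = 86_400


def records_per_day_commit_each_message(messages_per_second):
    """Committing after every message writes one offset record per consumed message."""
    return int(messages_per_second * SECONDS_PER_DAY)


def records_per_day_periodic_commit(assigned_partitions, interval_ms):
    """Periodic commits write roughly one offset record per assigned partition per interval."""
    commits_per_day = SECONDS_PER_DAY * 1000 // interval_ms
    return assigned_partitions * commits_per_day


# Illustrative load: 2,000 messages/s spread over 12 partitions.
per_message = records_per_day_commit_each_message(2_000)  # 172,800,000 records/day
periodic = records_per_day_periodic_commit(12, 5_000)     # 207,360 records/day
```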

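The __consumer_offsets topic uses cleanup.policy=compact. Committed offsets expire after offsets.retention.minutes, which defaulted to 1440 minutes (24 hours) before Kafka 2.0 and is 7 days from 2.0 on. Compaction itself is performed by the broker's log cleaner. If log.cleaner.enable is false, nothing is compacted and the topic retains every commit it has ever received. False was the default before Kafka 0.9.0, and the setting is often left in place.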
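Compaction is what keeps this topic small. For each key, only the most recent record has to survive, and expired offsets are removed through tombstone records (a null value). The Python model below simplifies the broker's cleaner in two ways. It ignores segments, although Kafka never compacts the active segment. It also drops tombstones immediately, whereas Kafka keeps them for delete.retention.ms. The group and topic names are hypothetical. The model shows how thousands of commits for one key collapse to a single record once the cleaner runs.

```python
def compact(log):
    """Return the records a fully compacted log would keep, in log order.

    `log` is a list of (key, value) pairs in offset order; a value of None is a
    tombstone. Simplified: tombstones are dropped at once and segments are ignored.
    """
    latest = {}
    for offset, (key, value) in enumerate(log):
        latest[key] = (offset, value)  # later records overwrite earlier ones
    survivors = sorted(
        (offset, key, value) for key, (offset, value) in latest.items() if value is not None
    )
    return [(key, value) for _offset, key, value in survivors]


# Hypothetical log: 100,000 commits for a single (group, topic, partition) key.
commits = [(("billing-app", "orders", 0), offset) for offset in range(100_000)]
print(len(commits), "->", len(compact(commits)))  # prints: 100000 -> 1
```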

Root Causes of Excessive Data

Log cleaning is disabled on the broker, so the compact policy on __consumer_offsets never takes effect and superseded or expired offset records are never removed.

Some applications use a naïve synchronous commit strategy, generating a commit record for every consumed message.

Remediation Steps

Refactor application code to reduce commit frequency; after optimization the daily increase of commit data dropped from 37 GB to 1.5 GB.
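A common way to cut commit volume is to commit after either a batch of messages or a time limit, whichever comes first. The class below is a hypothetical Python helper that illustrates this rule. It is not part of any Kafka client, and its thresholds are assumptions to tune per application. In a Java consumer loop, the same decision would gate a call to `KafkaConsumer.commitSync()` or `commitAsync()`.

```python
import time


class BatchedCommitPolicy:
    """Hypothetical helper: commit every `max_records` messages or every
    `max_interval_s` seconds, whichever comes first, instead of after each message."""

    def __init__(self, max_records=500, max_interval_s=5.0, clock=time.monotonic):
        self.max_records = max_records
        self.max_interval_s = max_interval_s
        self.clock = clock  # injectable so the policy can be tested without sleeping
        self.pending = 0
        self.last_commit = clock()

    def record_processed(self) -> bool:
        """Call once per processed message; returns True when the caller should commit now."""
        self.pending += 1
        now = self.clock()
        due = self.pending >= self.max_records or now - self.last_commit >= self.max_interval_s
        if due:
            # The caller commits now, so reset the counters for the next batch.
            self.pending = 0
            self.last_commit = now
        return due
```

Committing less often widens the window of messages that are redelivered after a crash or rebalance. The thresholds therefore trade offset-topic growth against duplicate processing.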

Enable the log cleaner by setting log.cleaner.enable=true in the broker configuration (server.properties). Then restart the broker so the cleaner starts and compacts __consumer_offsets.
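Before restarting, it helps to confirm what the broker actually has configured. If log.cleaner.enable is absent from server.properties, the default applies: false before Kafka 0.9.0 and true from 0.9.0 on. The Python sketch below only illustrates this check. It uses a deliberately minimal properties parser that does not handle line continuations, escapes, or `:` separators, and its function names are hypothetical.

```python
def parse_properties(text: str) -> dict:
    """Minimal .properties reader: key=value lines plus # and ! comments only."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line[0] in "#!":
            continue
        key, sep, value = line.partition("=")
        if sep:
            props[key.strip()] = value.strip()
    return props


def log_cleaner_enabled(props: dict, kafka_version: tuple) -> bool:
    """Effective log.cleaner.enable: explicit value if set, else the version's default."""
    default = kafka_version >= (0, 9, 0)  # default changed from false to true in 0.9.0
    value = props.get("log.cleaner.enable")
    if value is None:
        return default
    return value.strip().lower() == "true"
```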

After applying these measures, the size of __consumer_offsets shrank from roughly 900 GB to 2 GB.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by dbaplus Community, an enterprise-level professional community for Database, BigData, and AIOps. It offers daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences, delivered by industry experts.
