How to Diagnose and Fix Kafka Message Backlog in High‑Concurrency Environments
In high‑concurrency systems, a Kafka message backlog occurs when producers outpace consumers, leaving unprocessed messages that threaten stability and real‑time performance. This article explains the root causes and presents practical producer‑side and consumer‑side optimization techniques to resolve the issue.
Kafka Message Backlog in High‑Concurrency Systems
When the rate at which producers publish messages exceeds the rate at which consumers fetch and process them, the number of unconsumed records in a Kafka topic grows continuously. Persistent backlog degrades system stability and real‑time guarantees.
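This backlog can be quantified as per‑partition consumer lag: the partition's log‑end offset minus the consumer group's committed offset. A minimal sketch of that arithmetic in plain Java (the offset maps and class name here are illustrative; in production these numbers would come from the AdminClient or consumer metrics):

```java
import java.util.HashMap;
import java.util.Map;

public class LagCheck {
    // Backlog per partition = log-end offset minus the group's committed offset.
    static Map<Integer, Long> computeLag(Map<Integer, Long> endOffsets,
                                         Map<Integer, Long> committedOffsets) {
        Map<Integer, Long> lag = new HashMap<>();
        for (Map.Entry<Integer, Long> e : endOffsets.entrySet()) {
            long committed = committedOffsets.getOrDefault(e.getKey(), 0L);
            lag.put(e.getKey(), e.getValue() - committed);
        }
        return lag;
    }

    public static void main(String[] args) {
        // Hypothetical offsets for two partitions of one topic.
        Map<Integer, Long> end = Map.of(0, 1_500_000L, 1, 900_000L);
        Map<Integer, Long> committed = Map.of(0, 1_200_000L, 1, 900_000L);
        System.out.println(computeLag(end, committed)); // partition 0 lags by 300,000
    }
}
```

A lag that grows monotonically across polling intervals, rather than a momentarily large one, is the signal of genuine backlog.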
Typical Causes
Producer overload: Sudden traffic spikes (e.g., a flash sale) generate a burst of produce requests that temporarily exceeds the cluster’s throughput.
Insufficient consumer capacity: Complex business logic, long‑running external calls, or inefficient processing slows down consumption.
Producer‑Side Mitigation Strategies
Rate limiting at the entry point – Apply throttling in API gateways or HTTP servers. Example: limit QPS per client IP.
Token‑bucket / leaky‑bucket algorithms – Implement in the producer library to cap the number of messages sent per second.
Back‑pressure propagation – When downstream services reject requests, pause or slow the producer.
Batching – Let the producer accumulate records and send them in larger batches. In the Java client this is governed by the batch.size and linger.ms settings rather than manual buffering; producer.flush() can be called to force out any records still buffered.
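To make the token‑bucket idea above concrete, here is a minimal, self‑contained sketch in plain Java; the class name and the capacity/rate figures are illustrative, not part of any Kafka API. A producer would call tryAcquire() before each send and buffer, retry, or reject the record when it returns false:

```java
// Minimal token-bucket rate limiter (illustrative names and figures).
public class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerNano;   // tokens added per nanosecond
    private double tokens;
    private long lastRefill;

    public TokenBucket(long capacity, long tokensPerSecond) {
        this.capacity = capacity;
        this.refillPerNano = tokensPerSecond / 1e9;
        this.tokens = capacity;
        this.lastRefill = System.nanoTime();
    }

    // Returns true if a permit was available; the caller should
    // buffer, retry, or drop the message when this returns false.
    public synchronized boolean tryAcquire() {
        long now = System.nanoTime();
        tokens = Math.min(capacity, tokens + (now - lastRefill) * refillPerNano);
        lastRefill = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Usage sketch: `if (bucket.tryAcquire()) producer.send(record);`. Libraries such as Guava's RateLimiter provide a production‑grade equivalent.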
Consumer‑Side Optimization Techniques
Streamline business logic – Remove unnecessary calculations, cache static data, and avoid repeated deserialization.
Asynchronous processing – Offload I/O‑bound work to a separate thread pool or an in‑memory queue, and take care to commit offsets only after the offloaded work has completed. Example:
ExecutorService executor = Executors.newFixedThreadPool(8);
consumer.poll(Duration.ofMillis(100)).forEach(record ->
        executor.submit(() -> process(record)));
Batch processing – Group multiple records before invoking downstream services. Example:
List<ConsumerRecord<String, String>> batch = new ArrayList<>();
for (ConsumerRecord<String, String> r : records) {
    batch.add(r);
    if (batch.size() >= 500) {
        bulkInsert(batch);
        batch.clear();
    }
}
if (!batch.isEmpty()) {
    bulkInsert(batch); // flush the final partial batch
}
Scale consumer instances – Increase the number of consumers in the same consumer group so that each instance handles one or more partitions, raising overall parallelism. Note that useful parallelism is capped by the topic’s partition count.
Adjust consumer configuration – Tune max.poll.records, fetch.min.bytes, and fetch.max.wait.ms to balance latency and throughput.
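Putting the configuration advice together, here is a sketch of a consumer setup biased toward throughput. All values are illustrative starting points to tune against measured lag and latency, and bootstrap.servers and group.id are assumptions:

```java
import java.util.Properties;

public class ConsumerTuning {
    // Illustrative throughput-oriented settings; tune against real metrics.
    static Properties tunedConsumerProps() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "backlog-consumers");       // hypothetical group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("max.poll.records", "500");        // more records per poll()
        props.put("fetch.min.bytes", "1048576");     // let the broker batch ~1 MB
        props.put("fetch.max.wait.ms", "500");       // but reply within 500 ms
        props.put("max.poll.interval.ms", "300000"); // tolerate slow batches
        return props;
    }
}
```

Larger fetch.min.bytes trades a little latency for fewer, fuller fetches; if max.poll.records is raised, max.poll.interval.ms may also need raising so slow batches do not trigger rebalances.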
Closed‑Loop Solution
Combine upstream throttling with downstream scaling: control the ingress rate so it stays within the cluster’s sustainable throughput, and simultaneously improve consumer processing capacity. Monitoring key metrics such as consumer lag, BytesInPerSec, and consumer processing latency helps detect imbalance early and trigger the appropriate mitigation.
