Why MultiTopicsConsumerImpl Slowed Down Pulsar and How We Boosted Its Throughput 4×

A Pulsar community expert investigated why MultiTopicsConsumerImpl delivered only a fraction of the expected throughput, identified lock contention and EventLoop overhead as the main culprits, applied lock‑removal and thread‑pool optimizations, and achieved nearly four‑fold performance gains.

Tencent Cloud Middleware

Background

The Pulsar community asked for help diagnosing a performance issue where the MultiTopicsConsumerImpl, which should aggregate multiple ConsumerImpl instances for a multi‑partition topic, performed worse than a single ConsumerImpl.

Problem Statement

Despite the expectation that parallel consumption across partitions would increase throughput, the observed throughput of MultiTopicsConsumerImpl was only about one-seventh that of a single ConsumerImpl.
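In the Pulsar Java client, subscribing to a partitioned topic creates one ConsumerImpl per partition, and MultiTopicsConsumerImpl fans their messages into a single shared receive queue. The following JDK-only sketch models that fan-in shape (the class, names, and sizes are ours for illustration, not Pulsar's source) and shows why a single shared handoff point is a natural contention hotspot:

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Illustrative model: each "partition consumer" thread pushes into one
// shared bounded queue that the application thread drains, mimicking the
// fan-in done by a multi-topic consumer.
public class FanInModel {
    // Drains partitions * perPartition messages through one shared queue.
    static int run(int partitions, int perPartition) throws InterruptedException {
        BlockingQueue<String> shared = new ArrayBlockingQueue<>(1_000);
        for (int p = 0; p < partitions; p++) {
            final int id = p;
            Thread producer = new Thread(() -> {
                try {
                    for (int i = 0; i < perPartition; i++) {
                        shared.put("p" + id + "-" + i); // every partition contends here
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            producer.setDaemon(true);
            producer.start();
        }
        int received = 0;
        while (received < partitions * perPartition) {
            shared.take(); // single drain point for all partitions
            received++;
        }
        return received;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("received=" + run(4, 10_000));
    }
}
```

With four producer threads funneling into one queue, throughput is bounded by how cheaply that shared handoff can be synchronized, which is exactly where the profiling below points.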

Test Setup

A three‑node Pulsar cluster was deployed on 8‑core, 16 GB VMs, with topics created with four partitions. The built‑in pulsar-perf tool was used to benchmark both implementations for a two‑minute consumption period.

bin/pulsar-perf consume -u 'http://x.x.x.x:8080' -s my-sub-6 -sp Earliest -q 100000 persistent://public/default/p-topic

Initial Findings

Performance numbers showed:

MultiTopicsConsumerImpl: 11,715,556 records, 68,813.420 msg/s, 537.605 Mbit/s
ConsumerImpl: 78,403,434 records, 462,640.204 msg/s, 3,614.377 Mbit/s

Flame graphs showed that business threads accounted for 40.65% of CPU time, of which about 14% was spent in messageReceived and 8.22% in re-entrant lock operations; in total, lock contention consumed roughly 20% of CPU time.

Optimization Steps

1. Replace custom locking with the thread-safe BlockingQueue to eliminate redundant locks.
2. Reduce lock-acquisition frequency by pre-checking conditions before attempting to lock.
3. Refactor the logic to remove unnecessary locks entirely.
4. Where possible, substitute re-entrant locks with read-write locks for better concurrency.
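Step 2 above can be sketched as follows: a cheap volatile read guards the hot path, so the lock is acquired only when there is actually state to serialize. This is an illustrative JDK-only pattern with made-up names, not the actual Pulsar patch:

```java
import java.util.concurrent.locks.ReentrantLock;

// Illustrative pre-check pattern: onMessage() is the hot path, called for
// every message; the lock is only taken on the rare "paused" slow path.
public class PreCheckCounter {
    final ReentrantLock lock = new ReentrantLock();
    volatile boolean paused = false; // cheap flag read on the fast path
    long lockAcquisitions = 0;

    void onMessage() {
        if (!paused) {
            return;          // fast path: one volatile read, no lock at all
        }
        lock.lock();         // slow path: only while the consumer is paused
        try {
            lockAcquisitions++;
            // re-check 'paused' and buffer the message under the lock here
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        PreCheckCounter c = new PreCheckCounter();
        for (int i = 0; i < 1_000; i++) c.onMessage(); // no lock traffic
        c.paused = true;
        c.onMessage();                                 // one acquisition
        System.out.println("acquisitions=" + c.lockAcquisitions);
    }
}
```

The same idea generalizes to steps 1 and 4: a BlockingQueue already synchronizes internally, so wrapping it in an extra lock only doubles the contention, and a read-write lock lets concurrent readers proceed where a re-entrant lock would serialize them.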

Lock‑Removal Results

// before optimization
Aggregated throughput stats --- 11715556 records --- 68813.420 msg/s --- 537.605 Mbit/s
// after optimization
Aggregated throughput stats --- 25062077 records --- 161656.814 msg/s --- 1262.944 Mbit/s

EventLoop Optimization

Further profiling showed that Netty's EventLoopGroup consumed 12.63% of CPU time, largely in frequent system calls (Native.eventFdWrite) used to wake the event loop. Replacing the Netty EventLoop with a standard ThreadPoolExecutor backed by a BlockingQueue reduced this overhead.

// before EventLoop optimization
Aggregated throughput stats --- 11715556 records --- 68813.420 msg/s --- 537.605 Mbit/s
// after EventLoop optimization
Aggregated throughput stats --- 18392800 records --- 133314.602 msg/s --- 1041.520 Mbit/s
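A minimal sketch of that substitution, assuming the goal is simply a single worker thread fed through an in-process BlockingQueue (the class and method names are ours, not from the Pulsar patch): submitting to a ThreadPoolExecutor wakes its worker via park/unpark rather than an eventfd write, while keeping the single-threaded ordering guarantee an EventLoop provides.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative single-threaded executor standing in for a Netty EventLoop
// for internal (non-I/O) tasks.
public class SingleThreadHandoff {
    // Runs 'tasks' small jobs on one worker fed by an in-process queue.
    static int runTasks(int tasks) throws InterruptedException {
        ThreadPoolExecutor executor = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>()); // handoff needs no eventfd syscall
        AtomicInteger completed = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(tasks);
        for (int i = 0; i < tasks; i++) {
            executor.execute(() -> {          // tasks still run in submission order
                completed.incrementAndGet();
                done.countDown();
            });
        }
        done.await();
        executor.shutdown();
        return completed.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("completed=" + runTasks(100_000));
    }
}
```

The trade-off is that a plain executor cannot multiplex network I/O the way an EventLoop does, so this substitution only makes sense for internal bookkeeping tasks, not for the channel pipeline itself.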

Final Performance

Combining lock removal and EventLoop replacement yielded a near‑four‑fold increase for MultiTopicsConsumerImpl:

// final results
MultiTopicsConsumerImpl before: 11,715,556 records, 68,813.420 msg/s, 537.605 Mbit/s
MultiTopicsConsumerImpl after: 40,140,549 records, 275,927.749 msg/s, 2,155.686 Mbit/s
ConsumerImpl (baseline): 78,403,434 records, 462,640.204 msg/s, 3,614.377 Mbit/s

Conclusion

The primary bottlenecks were lock contention and excessive EventLoop wakeups; eliminating unnecessary locks and switching to a more efficient thread pool dramatically improved throughput. Although the optimized MultiTopicsConsumerImpl still reaches only about 60% of the single ConsumerImpl's throughput (275,927 vs. 462,640 msg/s), further architectural tweaks could close the gap.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Java, Performance Tuning, Messaging, Apache Pulsar, EventLoop, Lock Optimization
Written by

Tencent Cloud Middleware

Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.
