
Root Cause Analysis and Optimization of High Load on Alibaba Cloud RocketMQ Consumer Service

The article investigates why a RocketMQ consumer service running on a 4‑core ECS experiences sustained high load despite low CPU, I/O and memory usage, identifies excessive thread creation and frequent trace‑module context switches as the main causes, and proposes configuration and SDK upgrades to resolve the issue.

DevOps Operations Practice

Background: a service that consumes MQ messages through the Alibaba Cloud RocketMQ SDK 1.2.6 runs on a 4‑core, 8 GB ECS instance. Once the number of consumers grew beyond 200, the instance showed sustained high load with recurring spikes, even though CPU, I/O and memory usage stayed low.

Load analysis shows load_15m and load_5m staying between 3 and 5, while load_1m frequently exceeds the number of cores (4), indicating intermittent congestion rather than a constant overload.
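The comparison between load average and core count can be sketched in a few lines of Java. This is only an illustration: the class and method names (`LoadCheck`, `loadPerCore`, `isSaturated`) are made up for this example, and the 1.0 threshold is the usual rule of thumb rather than anything taken from the service itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper: compares a load average with the number of cores.
final class LoadCheck {
    /** Load per core; values above 1.0 mean more queued tasks than cores. */
    static double loadPerCore(double loadAverage, int cores) {
        if (cores <= 0) {
            throw new IllegalArgumentException("cores must be positive");
        }
        return loadAverage / cores;
    }

    static boolean isSaturated(double loadAverage, int cores) {
        return loadPerCore(loadAverage, cores) > 1.0;
    }

    public static void main(String[] args) {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        double load1m = os.getSystemLoadAverage(); // 1-minute average, negative if unavailable
        int cores = os.getAvailableProcessors();
        if (load1m < 0) {
            System.out.println("load average is not available on this platform");
            return;
        }
        System.out.printf("load_1m=%.2f cores=%d saturated=%b%n",
                load1m, cores, isSaturated(load1m, cores));
    }
}
```

On the 4‑core instance described here, a load_1m above 4 would be flagged as saturated, while a load_5m of 3 would not.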

Investigation using vmstat and pidstat reveals frequent context switches and interrupts; CPU usage remains low but the system spends much time handling thread scheduling.
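For readers who want to look at context switches from inside the JVM rather than with pidstat, the sketch below reads the counters that Linux exposes in `/proc/<pid>/status`. It assumes a Linux host; the class name `CtxSwitchReader` and the helper `parseField` are illustrative and not part of any tool mentioned above.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

// Hypothetical helper: extracts context-switch counters from /proc status text.
final class CtxSwitchReader {
    /** Returns the numeric value of "field: value" in the given lines, or -1 if absent. */
    static long parseField(List<String> statusLines, String field) {
        String prefix = field + ":";
        for (String line : statusLines) {
            if (line.startsWith(prefix)) {
                return Long.parseLong(line.substring(prefix.length()).trim());
            }
        }
        return -1;
    }

    public static void main(String[] args) throws IOException {
        // Linux only: counters for the current process since it started.
        List<String> lines = Files.readAllLines(Paths.get("/proc/self/status"));
        System.out.println("voluntary:    " + parseField(lines, "voluntary_ctxt_switches"));
        System.out.println("nonvoluntary: " + parseField(lines, "nonvoluntary_ctxt_switches"));
    }
}
```

Sampling these counters twice and dividing the difference by the interval gives a per-second rate comparable to what pidstat reports.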

ECS configuration: 4 cores, 8 GB
Number of physical CPUs = 4
Cores per physical CPU = 1
Single-core, multi-processor

Further inspection shows thousands of Java threads (≈9700) and many threads performing >100 context switches per second, especially those belonging to the RocketMQ consumer.
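A quick way to see which pools account for the thread count is to group live threads by name prefix. The sketch below does this from inside the JVM. `ThreadCensus` and `poolName` are names invented for this example, and the rule of stripping a trailing number is only a heuristic for typical pool naming schemes.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: counts live threads per name prefix.
final class ThreadCensus {
    /** Strips a trailing number and its separator, so "worker-1" and "worker-2" share a key. */
    static String poolName(String threadName) {
        return threadName.replaceAll("[-_#\\s]*\\d+$", "");
    }

    /** Groups all live threads in this JVM by their pool-style name prefix. */
    static Map<String, Integer> countByPool() {
        Map<String, Integer> counts = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            counts.merge(poolName(t.getName()), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        countByPool().forEach((name, n) -> System.out.println(n + "\t" + name));
    }
}
```

Running the same grouping over a `jstack` dump gives an equivalent picture without touching the application code.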

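Tip: high system load does not mean CPU resources are insufficient. High load only means that too many tasks have accumulated in the run queue. Those tasks may be CPU-bound, or they may be waiting on I/O or other factors.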

Root cause identification:

Excessive consumer threads: each consumer creates a thread pool with a default core size of 20 and a maximum of 64. With 200+ consumers, that is at least 4,000 core threads and 12,800 or more at the maximum size, most of them idle.
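The arithmetic, and the fact that core threads stay alive once created, can be checked with a plain `ThreadPoolExecutor`. This is a sketch under assumptions: the unbounded `LinkedBlockingQueue`, the 60-second keep-alive and the names `ConsumerThreadEstimate`, `totalThreads` and `poolSizeAfter` are chosen for illustration and are not taken from the SDK.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: estimates thread totals and shows idle core threads lingering.
final class ConsumerThreadEstimate {
    static long totalThreads(int consumers, int threadsPerConsumer) {
        return (long) consumers * threadsPerConsumer;
    }

    /** Runs `tasks` trivial tasks on a fresh pool and returns the pool size once they finish. */
    static int poolSizeAfter(int core, int max, int tasks) {
        // Assumption: unbounded work queue; the SDK's actual queue type is not shown in the article.
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                core, max, 60L, TimeUnit.SECONDS, new LinkedBlockingQueue<Runnable>());
        try {
            CountDownLatch done = new CountDownLatch(tasks);
            for (int i = 0; i < tasks; i++) {
                pool.execute(done::countDown);
            }
            done.await();
            return pool.getPoolSize(); // core threads stay alive even though all work is done
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for tasks", e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("core threads, 200 consumers: " + totalThreads(200, 20));
        System.out.println("max threads, 200 consumers:  " + totalThreads(200, 64));
        System.out.println("pool size after 20 tasks:    " + poolSizeAfter(20, 64, 20));
    }
}
```

After 20 short tasks, the pool still holds 20 threads with nothing to do, which is the per-consumer cost that multiplies across 200+ consumers.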

Trace reporting module (AsyncArrayDispatcher) uses a bounded ArrayBlockingQueue; its poll(5 ms) call causes threads to repeatedly block and unblock, generating many context switches.

traceContextQueue.poll(5,TimeUnit.MILLISECONDS);

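Code analysis shows that the trace queue is an ArrayBlockingQueue guarded by a non‑fair ReentrantLock. A timed poll blocks the thread through unsafe.park, which returns under one of four conditions: another thread calls unpark, the thread is interrupted, the timeout elapses, or the call returns spuriously. With a 5 ms timeout on a mostly empty queue, the timeout case dominates, so each dispatcher thread is woken and parked again up to 200 times per second.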
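The effect of a short timed poll on an empty queue is easy to reproduce in isolation. The sketch below is not the SDK's dispatcher code. It only mimics the `poll(5, TimeUnit.MILLISECONDS)` pattern on an empty `ArrayBlockingQueue` and counts how often the call returns empty-handed. The names `PollWakeups` and `countEmptyPolls` and the queue capacity are illustrative.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

// Hypothetical demo: counts timed-out polls on an empty bounded queue.
final class PollWakeups {
    /** Polls an empty queue for roughly windowMillis and counts polls that timed out. */
    static int countEmptyPolls(long pollTimeoutMillis, long windowMillis) {
        ArrayBlockingQueue<String> queue = new ArrayBlockingQueue<>(1024);
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(windowMillis);
        int wakeups = 0;
        try {
            while (System.nanoTime() < deadline) {
                // Each call parks the thread until the timeout elapses, then returns null.
                if (queue.poll(pollTimeoutMillis, TimeUnit.MILLISECONDS) == null) {
                    wakeups++;
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return wakeups;
    }

    public static void main(String[] args) {
        System.out.println("empty polls in 1 s with a 5 ms timeout: " + countEmptyPolls(5, 1000));
    }
}
```

Each timed-out poll is one park and one wake-up for a thread that had nothing to process. One such thread is cheap, but one per consumer across 200+ consumers adds up to a steady stream of scheduling work.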

Optimization proposals:

Configure each consumer’s consumeThreadMin/consumeThreadMax to appropriate values to reduce total thread count.
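One way to pick those values is to start from a total thread budget for the host and divide it across consumers. The sketch below is only a sizing aid: `ConsumerThreadSettings` and `forConsumers` are hypothetical, the even split is a simplification, and the property keys merely mirror the parameter names used in this article. Check the SDK documentation for the actual keys or setters.

```java
import java.util.Properties;

// Hypothetical sizing helper: splits a host-wide thread budget evenly across consumers.
final class ConsumerThreadSettings {
    static Properties forConsumers(int consumers, int totalThreadBudget) {
        if (consumers <= 0 || totalThreadBudget < consumers) {
            throw new IllegalArgumentException("need at least one thread per consumer");
        }
        int perConsumer = totalThreadBudget / consumers;
        Properties p = new Properties();
        // Keys mirror the names in the text; they are placeholders, not confirmed SDK keys.
        p.setProperty("consumeThreadMin", Integer.toString(perConsumer));
        p.setProperty("consumeThreadMax", Integer.toString(perConsumer));
        return p;
    }

    public static void main(String[] args) {
        // Example: cap 200 consumers at 800 consumer threads in total.
        System.out.println(forConsumers(200, 800));
    }
}
```

With a budget of 800 threads for 200 consumers, each consumer gets 4 threads instead of the default 20. Busy topics may need more than an even share, so the split is a starting point rather than a rule.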

Upgrade to RocketMQ SDK 1.8.5, which adds a switch to use a single trace‑dispatch thread and a single bounded queue for all consumers.
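The idea of one dispatch thread and one bounded queue shared by all consumers can be sketched as follows. This is not the SDK 1.8.5 implementation, whose internals are not shown in this article. `SharedTraceDispatcher` and its methods are hypothetical, and using a blocking `take()` instead of a 5 ms timed poll is a design choice made for this example.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical sketch: a single daemon thread drains one bounded queue for every consumer.
final class SharedTraceDispatcher {
    private final BlockingQueue<String> queue;
    private final AtomicInteger dispatched = new AtomicInteger();
    private final Thread worker;

    private SharedTraceDispatcher(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
        this.worker = new Thread(this::drainLoop, "shared-trace-dispatcher");
        this.worker.setDaemon(true);
    }

    static SharedTraceDispatcher start(int capacity) {
        SharedTraceDispatcher d = new SharedTraceDispatcher(capacity);
        d.worker.start();
        return d;
    }

    /** Called by any consumer; returns false (trace dropped) when the queue is full. */
    boolean submit(String traceContext) {
        return queue.offer(traceContext);
    }

    private void drainLoop() {
        try {
            while (true) {
                queue.take();                 // parks until there is work, no periodic wake-ups
                dispatched.incrementAndGet(); // stand-in for sending the trace data
            }
        } catch (InterruptedException e) {
            // stop() was called; exit the loop
        }
    }

    int dispatchedCount() {
        return dispatched.get();
    }

    /** Waits until at least `expected` items were dispatched, or the timeout elapses. */
    boolean awaitDispatched(int expected, long timeoutMillis) {
        long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(timeoutMillis);
        while (dispatched.get() < expected) {
            if (System.nanoTime() >= deadline) {
                return false;
            }
            LockSupport.parkNanos(1_000_000L);
        }
        return true;
    }

    void stop() {
        worker.interrupt();
    }
}
```

With this shape, the number of dispatcher threads no longer grows with the number of consumers, and an idle dispatcher stays parked instead of waking every few milliseconds.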

Applying these changes should lower context‑switch overhead, reduce load, and improve overall system stability.

Written by

DevOps Operations Practice
