Meituan Second Interview: Solving High RocketMQ Consumption Latency in Production

During a Meituan second-round interview, the candidate explains how to diagnose and resolve high RocketMQ consumption latency in production, outlining a three‑step approach—stop the bleeding, recover service, and cure the root cause—through consumer scaling, message forwarding, logic optimization, concurrency tuning, queue expansion, and monitoring.


Interview Assessment Points

Problem Diagnosis Ability: Quickly locate the root cause of message backlog: slow consumption, overly aggressive production, or a broker issue.

Emergency Handling Experience: Demonstrates real-world production experience in applying the correct emergency measures.

Architecture Optimization Thinking: Shows the ability to design preventive measures such as reasonable consumer concurrency, appropriate messaging models, and proper monitoring/alerting.

Core Answer

First stop the bleeding, then recover, and finally cure.

Two layers: emergency handling and fundamental remediation.

Emergency handling – Scale consumers: Increase the number of consumer instances or threads.

Emergency handling – Message forwarding: Forward backlog messages to a new Topic and consume them with an independent consumer group.

Emergency handling – Temporary discard: Drop part of the historical messages for non-core business.

Fundamental remediation – Optimize consumption logic: Reduce per-message processing time.

Fundamental remediation – Reasonable concurrency design: Adjust consumeThreadMin / consumeThreadMax.

Fundamental remediation – Monitoring & alerting: Establish backlog monitoring and early-warning thresholds.

Deep Analysis

1. Common Causes of Message Backlog

Consumer‑side capacity shortage is the most common cause, accounting for over 80% of cases. Specific factors include:

Slow consumption logic: Synchronous DB calls, long-running RPCs, or complex business logic.

Consumer exceptions: Bugs causing repeated retries, frequent restarts, or Full GC pauses.

Insufficient concurrency: Too few consumer instances, or thread pools sized too small.

Order-message bottleneck: With ordered messages, a single thread per MessageQueue caps throughput (see the sketch below).
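
To make the bottleneck concrete, here is a minimal sketch of ordered consumption, assuming the same DefaultMQPushConsumer as in the snippets below; process is a hypothetical handler. RocketMQ locks each MessageQueue during ordered consumption, so a slow handler stalls its entire queue and overall parallelism can never exceed the queue count.

// Ordered consumption: RocketMQ serializes each MessageQueue, so parallelism ≤ queue count
consumer.registerMessageListener(new MessageListenerOrderly() {
    @Override
    public ConsumeOrderlyStatus consumeMessage(List<MessageExt> msgs, ConsumeOrderlyContext context) {
        for (MessageExt msg : msgs) {
            process(msg); // hypothetical handler; at 500ms each, this queue tops out at ~2 msg/s
        }
        return ConsumeOrderlyStatus.SUCCESS;
    }
});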

2. Emergency Handling Solutions

Solution 1 – Scale Consumers (Fastest Effect)

// Increase consumer instances (deploy more nodes)
// Note: the number of consumer instances must not exceed the number of Topic Queues
// Example: a Topic with 16 Queues can have at most 16 consumer instances
DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("consumer_group");
consumer.setConsumeThreadMin(20); // default 20
consumer.setConsumeThreadMax(100); // default 20

Principle: Boost parallel processing capacity on the consumer side.

Attention: Consumer instances cannot exceed the number of MessageQueues; excess instances stay idle.

Applicable Scenario: Consumption logic is not inherently slow, but concurrency is insufficient.

Solution 2 – Forward Messages to a New Topic (Massive Backlog)

// Step 1: Write a temporary consumer that quickly drains the backlog and forwards it to a new Topic
DefaultMQProducer producer = new DefaultMQProducer("temp_forward_producer_group");
producer.start();

DefaultMQPushConsumer tempConsumer = new DefaultMQPushConsumer("temp_group");
tempConsumer.subscribe("original_topic", "*");
tempConsumer.registerMessageListener(new MessageListenerConcurrently() {
    @Override
    public ConsumeConcurrentlyStatus consumeMessage(List<MessageExt> msgs, ConsumeConcurrentlyContext context) {
        try {
            for (MessageExt msg : msgs) {
                // Lightweight forwarding without any business processing
                Message newMsg = new Message("new_topic", msg.getBody());
                producer.send(newMsg);
            }
        } catch (Exception e) {
            // Send failed: redeliver the batch later instead of losing it
            return ConsumeConcurrentlyStatus.RECONSUME_LATER;
        }
        return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
    }
});
tempConsumer.start();

Why it works: The original Topic's backlog is drained quickly, restoring the normal production-consumption chain, while the new Topic can be consumed at its own pace.

Key Point: The forwarding logic must stay lightweight: pure relaying, with no business processing.

Applicable Scenario: The backlog reaches millions of messages and the consumption logic is genuinely slow.

Solution 3 – Temporarily Discard Non‑Core Messages

// For non-core business, skip historical messages and start from the latest offset.
// Note: ConsumeFromWhere only takes effect when the consumer group has no committed
// offset yet; for an existing group, reset the offset instead (see the command below)
consumer.setConsumeFromWhere(ConsumeFromWhere.CONSUME_FROM_LAST_OFFSET);

// Alternatively, shorten message retention on the broker side (e.g., fileReservedTime = 48 hours)

Applicable Scenario: Log-type or monitoring-type workloads that are not sensitive to historical data.

Attention: Core business must never use this method, as it leads to data loss.
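
For a consumer group that already has committed offsets, the standard way to skip the backlog is an offset reset. The command below uses mqadmin's resetOffsetByTime subcommand; the NameServer address, group, and topic names are illustrative.

// Reset an existing group's offset to "now", skipping all older messages
// Command: mqadmin resetOffsetByTime -n localhost:9876 -g consumer_group -t log_topic -s now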

3. Fundamental Remediation Solutions

3.1 Optimize Consumption Logic

// Negative example: synchronous calls to multiple external services for every message
// (parseUserId/parseOrderId/parseSkuId are hypothetical helpers that extract IDs from the message body)
MessageListenerConcurrently listener = (msgs, context) -> {
    for (MessageExt msg : msgs) {
        userService.getUser(parseUserId(msg));       // RPC, ~200ms
        orderService.getOrder(parseOrderId(msg));    // RPC, ~300ms
        inventoryService.deduct(parseSkuId(msg));    // RPC, ~150ms
    }
    // Each message takes ~650ms in total, so a backlog builds quickly
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
};

// Positive example: batch operations + async processing
MessageListenerConcurrently listener = (msgs, context) -> {
    // Batch query to reduce RPC count
    List<Long> userIds = msgs.stream().map(m -> parseUserId(m)).collect(Collectors.toList());
    Map<Long, User> userMap = userService.batchGetUsers(userIds); // one batch query

    // Local cache for hot data
    // Async handling of non‑core logic
    CompletableFuture.runAsync(() -> {
        notifyService.sendNotification(msgs); // non‑core logic async
    }, asyncExecutor);
    return ConsumeConcurrentlyStatus.CONSUME_SUCCESS;
};

3.2 Reasonable Concurrency Settings

consumeThreadMin: default 20; adjust according to business load.

consumeThreadMax: default 20; 50–200 is a common range.

pullBatchSize: default 32; 32–100 is a common range.

consumeMessageBatchMaxSize: default 1; 10–50 is a common range (see the sketch below).
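
All four parameters are plain setters on DefaultMQPushConsumer. A minimal sketch, with mid-range values chosen for illustration rather than as universal recommendations:

DefaultMQPushConsumer consumer = new DefaultMQPushConsumer("consumer_group");
consumer.setConsumeThreadMin(50);            // floor of the consumption thread pool
consumer.setConsumeThreadMax(100);           // ceiling of the consumption thread pool
consumer.setPullBatchSize(64);               // messages fetched per pull request
consumer.setConsumeMessageBatchMaxSize(10);  // messages handed to the listener per callback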

3.3 Avoid Order‑Message Bottleneck

Increasing the number of Queues for a Topic raises parallelism. For example, expanding from 4 Queues (4-way concurrency under ordered consumption) to 16 Queues enables 16-way concurrency. The queue count can be changed with the mqadmin tool, as shown below.
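
The command below uses mqadmin's updateTopic subcommand, where -r and -w set the read and write queue counts; the NameServer address, cluster, and topic names are illustrative.

// Expand a Topic to 16 read queues and 16 write queues
// Command: mqadmin updateTopic -n localhost:9876 -c DefaultCluster -t order_topic -r 16 -w 16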

3.4 Establish Monitoring & Alerting

// Monitor backlog with MQAdmin tool
// Command: mqadmin consumerProgress -g consumer_group

// Core monitoring metrics:
// 1. Diff (backlog): Consumer Offset vs. Broker Offset
// 2. Consumption TPS
// 3. Consumption latency (time from production to consumption)

// Suggested alert thresholds (adjust per SLA):
// - Diff > 10,000 → P2 alert
// - Diff > 100,000 → P1 alert
// - Latency > 5 minutes → P0 alert
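
If alerting is wired up in code rather than shell, the rocketmq-tools module exposes the same statistics. A minimal sketch, assuming DefaultMQAdminExt is on the classpath; the NameServer address is illustrative:

// Read the same Diff that `mqadmin consumerProgress` prints (rocketmq-tools module)
public long currentBacklog(String consumerGroup) throws Exception {
    DefaultMQAdminExt admin = new DefaultMQAdminExt();
    admin.setNamesrvAddr("localhost:9876");
    admin.start();
    try {
        ConsumeStats stats = admin.examineConsumeStats(consumerGroup);
        return stats.computeTotalDiff(); // broker offset minus consumer offset, summed over all queues
    } finally {
        admin.shutdown();
    }
}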

High‑Frequency Follow‑Up Questions

Relationship between consumer instances and MessageQueue count: When instances ≤ queues, each instance gets at least one Queue; excess instances remain idle.

Serious consequences of backlog: Disk fill-up causing the broker to reject writes, message expiration leading to data loss, and delayed consumption breaking business timeliness.

How to view RocketMQ backlog: Use mqadmin consumerProgress to see each consumer group's Diff, or monitor Consumer Lag on the Dashboard.

Does broadcast consumption cause backlog? In broadcast mode every consumer instance processes all messages, so backlog depends solely on a single consumer's capacity; scaling out consumers does not share the load (broadcast mode is set as shown below).
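
For reference, broadcast mode is a single switch on the consumer; a minimal sketch:

// Every consumer instance receives every message; backlog cannot be shared by scaling out
consumer.setMessageModel(MessageModel.BROADCASTING);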

Summary

The core of handling message backlog is "first stop the bleeding, then recover, and finally cure". Emergency measures (scaling consumers, forwarding messages, discarding non-core data) quickly restore service. Long-term remediation (optimizing consumption logic, tuning concurrency, increasing the Queue count, and monitoring with alerts) eliminates the root cause. Over 80% of backlogs stem from the consumer side, so developers should pay special attention to consumption performance in daily development.
