How Didi Cut ClickHouse CPU Usage by 90%: Optimizing BgMoveProcPool Threads

This article details how Didi identified excessive CPU consumption by ClickHouse's BgMoveProcPool threads, traced the root cause to unnecessary part‑move checks, introduced a simple early‑exit guard in selectPartsForMove, and achieved a dramatic reduction in CPU load while contributing the fix upstream.

dbaplus Community

Nov 14, 2023

1. Discovering the Issue

ClickHouse is an open‑source, high‑performance columnar database used by Didi since 2020 for ride‑hailing and log search, with over 300 nodes handling petabytes of data daily. Monitoring showed unusually high CPU usage on several background threads.

Using top and top -Hp <pid>, the most CPU-intensive threads were identified:

BackgrProcPool – handles ReplicatedMergeTree merges and mutations.

HTTPHandler – processes HTTP requests, query parsing, and plan generation.

BgMoveProcPool – moves data between disks when disk usage exceeds a threshold (default 90%). Six of these threads were each using ≈30% CPU.

ZookeeperSend / ZookeeperRecv – communicate with ZooKeeper for replica synchronization.

Although disk usage was only around 50% on all 12 disks, the BgMoveProcPool threads still consumed CPU, prompting further investigation.

CPU usage by threads before optimization

2. Confirming the Issue

Stack traces captured with pstack <pid> showed BgMoveProcPool threads stuck in MergeTreePartsMover::selectPartsForMove:

#0  0x00000000100111a4 in DB::MergeTreePartsMover::selectPartsForMove(...)
#1  0x000000000ff6ef5a in DB::MergeTreeData::selectPartsForMove()
#2  0x000000000ff86096 in DB::MergeTreeData::selectPartsAndMove()
#3  ... (additional stack frames omitted for brevity)

Repeated captures confirmed the method was the bottleneck. Querying system.part_log for recent MovePart events returned no rows, indicating the threads were performing futile checks.

SELECT * FROM system.part_log WHERE event_time > now() - toIntervalDay(1) AND event_type = 'MovePart'

The source code of selectPartsForMove was examined. The method performs three major steps:

bool MergeTreePartsMover::selectPartsForMove(
    MergeTreeMovingParts & parts_to_move,
    const AllowedMovingPredicate & can_move,
    const std::lock_guard<std::mutex> & /* moving_parts_lock */)
{
    std::unordered_map<DiskPtr, LargestPartsWithRequiredSize> need_to_move;
    // 1. Scan all disks; add disks whose usage exceeds the move factor (default 0.9) to need_to_move.
    // 2. Scan all parts; if a part's MoveTTL has expired, add it to parts_to_move, otherwise add candidate parts for disks in need_to_move.
    // 3. For candidate parts, reserve space and move them.
    ...
}

Step 2, which iterates over every part to evaluate move eligibility, dominates the runtime. In Didi’s cluster there were over 300 000 parts, with the largest table containing more than 60 000 parts, explaining the high CPU consumption.
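To make the cost of step 2 concrete, here is a minimal sketch of steps 1 and 2 in plain C++. It is not ClickHouse's implementation: Disk, Part, SelectionResult, and selectPartsUnguarded are hypothetical names, and the 0.9 threshold follows this article's description of the default. The point it illustrates is that every part is visited even when no disk is over the threshold and no MoveTTL has expired.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified stand-ins for ClickHouse's disk and part objects.
struct Disk {
    std::uint64_t total_bytes;
    std::uint64_t used_bytes;
};

struct Part {
    std::size_t disk_index;   // index into the disks vector
    bool move_ttl_expired;    // true if a MoveTTL rule says the part must move
};

struct SelectionResult {
    std::vector<std::size_t> parts_to_move;  // indices of selected parts
    std::size_t parts_visited = 0;           // how many parts step 2 examined
};

// Models steps 1 and 2 only; reserving space (step 3) is left out.
SelectionResult selectPartsUnguarded(const std::vector<Disk> & disks,
                                     const std::vector<Part> & parts,
                                     double move_factor = 0.9)
{
    // Step 1: flag disks whose usage exceeds the move factor.
    std::vector<bool> need_to_move(disks.size(), false);
    for (std::size_t i = 0; i < disks.size(); ++i)
        need_to_move[i] = static_cast<double>(disks[i].used_bytes)
                          > move_factor * static_cast<double>(disks[i].total_bytes);

    // Step 2: visit every part, even if no disk was flagged in step 1.
    SelectionResult result;
    for (std::size_t i = 0; i < parts.size(); ++i)
    {
        ++result.parts_visited;
        if (parts[i].move_ttl_expired || need_to_move[parts[i].disk_index])
            result.parts_to_move.push_back(i);
    }
    return result;
}
```

With 12 disks at 50% usage and 60 000 parts without an expired MoveTTL, this sketch visits all 60 000 parts and selects none, which mirrors the wasted work seen in the stack traces.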

3. Solving the Problem

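Because the cluster's disks were far from the 90% threshold and no MoveTTL was configured, there was nothing to move and the per-part scan was wasted work. A guard was added after the disk scan (step 1), which is what populates need_to_move, and before the part scan (step 2). It returns early when no disk needs to shed data and the table has no MoveTTL: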

if (need_to_move.empty() && !metadata_snapshot->hasAnyMoveTTL())
    return false;

This early‑exit prevents the expensive part‑enumeration when there is nothing to move.
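The effect of the guard can be sketched with a small model that counts how many parts one background pass would scan. TableState and partsScannedPerPass are hypothetical names introduced for illustration, has_any_move_ttl stands in for metadata_snapshot->hasAnyMoveTTL(), and the 0.9 threshold again follows this article; none of this is ClickHouse's actual code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, simplified inputs; not ClickHouse's real types.
struct TableState {
    std::vector<double> disk_usage;   // used fraction per disk, 0.0 to 1.0
    std::size_t part_count;           // number of parts in the table
    bool has_any_move_ttl;            // stand-in for metadata_snapshot->hasAnyMoveTTL()
};

// Returns how many parts would be scanned in one background-move pass.
std::size_t partsScannedPerPass(const TableState & table, bool with_guard,
                                double move_factor = 0.9)
{
    // Disk scan: is any disk above the threshold?
    bool need_to_move_empty = true;
    for (double usage : table.disk_usage)
        if (usage > move_factor)
            need_to_move_empty = false;

    // Early exit: no disk is over the threshold and no MoveTTL exists.
    if (with_guard && need_to_move_empty && !table.has_any_move_ttl)
        return 0;

    // Otherwise the full part scan still runs.
    return table.part_count;
}
```

In this model, a table with 12 disks at 50% usage, 60 000 parts, and no MoveTTL scans 60 000 parts per pass without the guard and none with it. If any disk crosses the threshold or a MoveTTL exists, the guard does not fire and the scan proceeds as before, so move behavior is unchanged in the cases where moves can actually happen.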

4. Actual Effect

After deploying the change to the public cluster, the eight BgMoveProcPool threads dropped from about 30% total CPU to under 4%. Overall node CPU usage fell from roughly 20% to 10%, and peak spikes were significantly reduced.

The fix was contributed back to the ClickHouse community and merged into the master branch.

5. Future Thoughts

Performance problems often surface only under large data volumes and high concurrency. Developers should treat every line of code with caution, ensuring robustness as workloads grow. ClickHouse will continue to evolve its log‑search capabilities to deliver stable, low‑cost, high‑throughput PB‑scale retrieval.
