
Performance Optimization of ClickHouse: Identifying and Fixing High CPU Usage in BgMoveProcPool Threads

By adding a guard that skips the costly part‑scan in MergeTreePartsMover::selectPartsForMove when disk usage is below the threshold and no MoveTTL is set, Didi reduced BgMoveProcPool thread CPU consumption from about 30 % to under 4 %, halving overall node CPU load and improving ClickHouse’s PB‑scale performance.

Didi Tech

ClickHouse is an open‑source, high‑performance columnar distributed database designed for real‑time analytics. Didi adopted ClickHouse in 2020 to serve core ride‑hailing and log‑search services, operating a cluster of over 300 nodes with daily petabyte‑scale writes and tens of millions of queries. This article describes a performance‑optimization case that reduced excessive CPU consumption.

Problem Discovery

Online nodes showed high CPU load. Inspecting threads with the top command and top -Hp <pid> identified the most CPU-intensive threads as BackgrProcPool, HTTPHandler, several BgMoveProcPool threads, and ZooKeeper communication threads. The BgMoveProcPool threads, responsible for moving data parts of ReplicatedMergeTree tables, consistently occupied over 30% CPU despite no apparent disk pressure.

The df -h command showed all disks at roughly 50% usage, contradicting the expectation that part moves are triggered only when disk usage exceeds 90%.

Problem Confirmation

Stack traces captured with pstack <pid> revealed that the BgMoveProcPool threads were stuck inside MergeTreePartsMover::selectPartsForMove. Repeated SQL queries on system.part_log for recent MovePart events returned no rows, indicating that the move logic was being executed but never finding eligible parts.

```
#0  0x00000000100111a4 in DB::MergeTreePartsMover::selectPartsForMove(...)
#1  0x000000000ff6ef5a in DB::MergeTreeData::selectPartsForMove()
#2  0x000000000ff86096 in DB::MergeTreeData::selectPartsAndMove()
#3  0x000000000fe5d102 in DB::StorageReplicatedMergeTree::startBackgroundMovesIfNeeded::<lambda()#1>::operator()()
#4  0x000000000ff269df in DB::BackgroundProcessingPool::workLoopFunc()
#5  0x000000000ff272cf in ThreadFromGlobalPool...
```

Inspecting the source of selectPartsForMove showed three main steps: (1) collect disks whose usage exceeds a threshold, (2) iterate over all parts to decide which need moving or are candidates, and (3) reserve space for candidate parts. The second step was the bottleneck, scanning over 300,000 parts (the largest table had >60,000 parts).

```
bool MergeTreePartsMover::selectPartsForMove(
    MergeTreeMovingParts & parts_to_move,
    const AllowedMovingPredicate & can_move,
    const std::lock_guard<std::mutex> & /* moving_parts_lock */)
{
    std::unordered_map<DiskPtr, LargestPartsWithRequiredSize> need_to_move;
    // 1. Find disks over the move factor threshold
    // 2. Iterate all parts, check TTL and move eligibility
    // 3. Reserve space for selected parts
}
```

Solution

Since the cluster never reached the 90% disk‑usage threshold and no MoveTTL was configured, the move logic could be short‑circuited. Adding a guard at the beginning of selectPartsForMove skips the expensive part‑scan when both conditions are false:

```
if (need_to_move.empty() && !metadata_snapshot->hasAnyMoveTTL())
    return false;
```

Actual Effect

After deploying the change to the production cluster, the BgMoveProcPool threads disappeared from the top-CPU list; their combined CPU usage dropped from ~30% to below 4%.

Overall node CPU utilization fell from around 20% to 10%, and peak spikes were significantly reduced.

The optimization was contributed back to the ClickHouse community and merged into the master branch.

Future Considerations

Code that performs well on small data sets can become a bottleneck at scale. Developers should continuously evaluate the impact of each line of code under realistic workloads to build robust, high-throughput systems. Didi will keep advancing its ClickHouse-based log-search capabilities to support stable, low-cost, PB-scale analytics.

Written by Didi Tech, the official Didi technology account.