
Analysis of Druid's Balance Strategy and Cost Computation

The article examines Druid’s cost‑based balance strategy introduced in version 0.12.0, explaining how the ‘cost’ algorithm randomly selects a segment, computes a pairwise exponential cost for each Historical node using a closed‑form integral, and moves the segment to the node with minimal cost to improve query parallelism.

Youzan Coder

Druid queries require both real‑time and historical segments. The distribution of historical segments across Historical nodes directly impacts query response time. This article analyzes Druid's Balance strategy, focusing on the cost‑based algorithm introduced in version 0.12.0.

Druid ships three balance strategies: cachingCost, diskNormalized, and cost. This article concentrates on the cost algorithm, which selects a random segment, computes the cost of moving it to each Historical node, and chooses the node with the minimal cost.

Key configuration parameters (example):

druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
druid.coordinator.balancer.strategy=cost
maxSegmentsToMove=5

The core class DruidCoordinatorBalancer implements the balance loop. A simplified excerpt:

@Override
public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params) {
    // iterate tiers and invoke balanceTier(...)
    return params.buildFromExisting().withCoordinatorStats(stats).build();
}

The balanceTier method selects a segment via the ReservoirSegmentSampler , then finds the best Historical node using CostBalancerStrategy . The cost is computed by summing pairwise costs between the candidate segment and all segments already on the node, plus adjustments for segments to be loaded or dropped.

private void balanceTier(DruidCoordinatorRuntimeParams params, String tier,
    SortedSet<ServerHolder> servers, CoordinatorStats stats) {
    // pick a segment, compute its cost on each server, move if beneficial
}
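The selection step can be sketched as follows. This is an illustrative simplification, not Druid's actual API: the names jointCost, serverCost, and pickServer are stand-ins for what CostBalancerStrategy and ServerHolder do internally, and servers are modeled as lists of segment start times in hours.

```java
import java.util.List;

// Illustrative sketch of cost-based placement: sum pairwise costs of the
// candidate segment against every segment on each server, pick the minimum.
public class CostSketch {
    static final double LAMBDA = Math.log(2) / 24.0; // 24-hour half-life

    // Pairwise cost of two 1-hour segments whose starts lie t hours apart
    // (disjoint case), following the closed form derived later in the article.
    static double jointCost(double startA, double startB) {
        double t = Math.abs(startA - startB);
        double c = Math.pow(2, 1 / 24.0) + Math.pow(2, -1 / 24.0) - 2;
        return c * Math.pow(2, -t / 24.0) / (LAMBDA * LAMBDA);
    }

    // Total cost of placing the candidate on one server: the sum of
    // pairwise costs against all segments already there.
    static double serverCost(double candidateStart, List<Double> segmentStarts) {
        double total = 0;
        for (double s : segmentStarts) {
            total += jointCost(candidateStart, s);
        }
        return total;
    }

    // Index of the server with minimal total cost for the candidate.
    static int pickServer(double candidateStart, List<List<Double>> servers) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int i = 0; i < servers.size(); i++) {
            double c = serverCost(candidateStart, servers.get(i));
            if (c < bestCost) {
                bestCost = c;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Server 0 holds segments adjacent to the candidate; server 1 holds
        // a distant one, so the candidate lands on server 1.
        List<List<Double>> servers =
                List.of(List.of(0.0, 1.0), List.of(100.0));
        System.out.println(pickServer(0.0, servers));
    }
}
```

Because the pairwise cost decays with temporal distance, the minimum-cost server is the one whose segments are furthest in time from the candidate, which is exactly what spreads adjacent segments apart.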

The cost function is defined as

Cost(X, Y) = ∫∫ e^{−λ|x−y|} dx dy

where λ = ln 2 / 24, so the contribution of a segment pair halves for every 24 hours of separation. The implementation reduces the double integral to a closed‑form expression:

return (ey1 - ey0) - (exy1 - exy0);
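To see where this expression comes from, take two disjoint intervals X = [x₀, x₁] and Y = [y₀, y₁] with x₁ ≤ y₀, in λ‑scaled coordinates with x₀ shifted to 0 (a sketch of the reduction; the variable names are read off from the returned expression):

∫₀^{x₁} ∫_{y₀}^{y₁} e^{−(y−x)} dy dx
  = (e^{x₁} − 1)(e^{−y₀} − e^{−y₁})
  = (e^{−y₁} − e^{−y₀}) − (e^{x₁−y₁} − e^{x₁−y₀})

which matches the code with ey0 = e^{−y₀}, ey1 = e^{−y₁}, exy0 = e^{x₁−y₀}, and exy1 = e^{x₁−y₁}.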

Further analysis shows that when two 1‑hour segments lie t hours apart, the cost simplifies (up to a constant factor of 1/λ², which does not affect the ranking of servers) to

Cost = (2^{1/24} + 2^{-1/24} - 2) · 2^{-t/24}

Thus, larger time gaps yield smaller costs, encouraging Druid to spread adjacent segments across different Historical nodes to improve parallel query processing.
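A quick numeric check of this decay, as a standalone sketch (the 1/λ² normalization is dropped here, since only relative costs matter when ranking servers):

```java
public class CostDecay {
    // Cost between two 1-hour segments whose starts are t hours apart,
    // per the closed form above, constant factors omitted.
    static double cost(double t) {
        return (Math.pow(2, 1 / 24.0) + Math.pow(2, -1 / 24.0) - 2)
                * Math.pow(2, -t / 24.0);
    }

    public static void main(String[] args) {
        // The cost halves for every additional 24 hours of separation,
        // so nearby segments are the most expensive to co-locate.
        System.out.printf("t=1:   %.3e%n", cost(1));
        System.out.printf("t=25:  %.3e%n", cost(25));
        System.out.printf("ratio: %.3f%n", cost(25) / cost(1)); // ~0.5
    }
}
```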

The article concludes that Druid's balance mechanism optimizes segment placement for faster queries and briefly mentions the Youzan big‑data team.
