
Analysis of Druid's Balance Strategy and Cost Computation

The article examines Druid’s cost‑based balance strategy introduced in version 0.12.0, explaining how the ‘cost’ algorithm randomly selects a segment, computes a pairwise exponential cost for each Historical node using a closed‑form integral, and moves the segment to the node with minimal cost to improve query parallelism.

Youzan Coder

Druid queries require both real‑time and historical segments. The distribution of historical segments across Historical nodes directly impacts query response time. This article analyzes Druid's Balance strategy, focusing on the cost‑based algorithm introduced in version 0.12.0.

Druid ships three balance strategies: cachingCost, diskNormalized, and cost. This article concentrates on the cost algorithm, which selects a random segment, computes the cost of moving it to each Historical node, and chooses the node with the minimal cost.

Key configuration parameters (example):

druid.coordinator.startDelay=PT30S
druid.coordinator.period=PT30S
druid.coordinator.balancer.strategy=cost
maxSegmentsToMove=5

The core class DruidCoordinatorBalancer implements the balance loop. A simplified excerpt:

@Override
public DruidCoordinatorRuntimeParams run(DruidCoordinatorRuntimeParams params) {
    // iterate tiers and invoke balanceTier(...)
    return params.buildFromExisting().withCoordinatorStats(stats).build();
}

The balanceTier method selects a segment via the ReservoirSegmentSampler , then finds the best Historical node using CostBalancerStrategy . The cost is computed by summing pairwise costs between the candidate segment and all segments already on the node, plus adjustments for segments to be loaded or dropped.

private void balanceTier(DruidCoordinatorRuntimeParams params, String tier,
    SortedSet<ServerHolder> servers, CoordinatorStats stats) {
    // pick a segment, compute its cost on each server, move if beneficial
}
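The selection step can be sketched as follows. This is an illustrative simplification, not Druid's actual API: the names jointCost, serverCost, and pickServer are stand-ins for what CostBalancerStrategy and ServerHolder do internally, and servers are modeled as lists of segment start times in hours.

```java
import java.util.List;

// Illustrative sketch of cost-based placement: sum pairwise costs of the
// candidate segment against every segment on each server, pick the minimum.
public class CostSketch {
    static final double LAMBDA = Math.log(2) / 24.0; // 24-hour half-life

    // Pairwise cost of two 1-hour segments whose starts lie t hours apart
    // (disjoint case), following the closed form derived later in the article.
    static double jointCost(double startA, double startB) {
        double t = Math.abs(startA - startB);
        double c = Math.pow(2, 1 / 24.0) + Math.pow(2, -1 / 24.0) - 2;
        return c * Math.pow(2, -t / 24.0) / (LAMBDA * LAMBDA);
    }

    // Total cost of placing the candidate on one server: the sum of
    // pairwise costs against all segments already there.
    static double serverCost(double candidateStart, List<Double> segmentStarts) {
        double total = 0;
        for (double s : segmentStarts) {
            total += jointCost(candidateStart, s);
        }
        return total;
    }

    // Index of the server with minimal total cost for the candidate.
    static int pickServer(double candidateStart, List<List<Double>> servers) {
        int best = 0;
        double bestCost = Double.MAX_VALUE;
        for (int i = 0; i < servers.size(); i++) {
            double c = serverCost(candidateStart, servers.get(i));
            if (c < bestCost) {
                bestCost = c;
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Server 0 holds segments adjacent to the candidate; server 1 holds
        // a distant one, so the candidate lands on server 1.
        List<List<Double>> servers =
                List.of(List.of(0.0, 1.0), List.of(100.0));
        System.out.println(pickServer(0.0, servers));
    }
}
```

Because the pairwise cost decays with temporal distance, the minimum-cost server is the one whose segments are furthest in time from the candidate, which is exactly what spreads adjacent segments apart.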

The cost function is defined as

Cost(X, Y) = ∫∫ e^{−λ|x−y|} dx dy

where λ = ln 2 / 24, so the contribution of a segment pair halves for every 24 hours of separation. The implementation reduces the double integral to a closed‑form expression:

return (ey1 - ey0) - (exy1 - exy0);
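To see where this expression comes from, take two disjoint intervals X = [x₀, x₁] and Y = [y₀, y₁] with x₁ ≤ y₀, in λ‑scaled coordinates with x₀ shifted to 0 (a sketch of the reduction; the variable names are read off from the returned expression):

∫₀^{x₁} ∫_{y₀}^{y₁} e^{−(y−x)} dy dx
  = (e^{x₁} − 1)(e^{−y₀} − e^{−y₁})
  = (e^{−y₁} − e^{−y₀}) − (e^{x₁−y₁} − e^{x₁−y₀})

which matches the code with ey0 = e^{−y₀}, ey1 = e^{−y₁}, exy0 = e^{x₁−y₀}, and exy1 = e^{x₁−y₁}.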

Further analysis shows that when two 1‑hour segments lie t hours apart, the cost simplifies (up to a constant factor of 1/λ², which does not affect the ranking of servers) to

Cost = (2^{1/24} + 2^{-1/24} - 2) · 2^{-t/24}

Thus, larger time gaps yield smaller costs, encouraging Druid to spread adjacent segments across different Historical nodes to improve parallel query processing.
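A quick numeric check of this decay, as a standalone sketch (the 1/λ² normalization is dropped here, since only relative costs matter when ranking servers):

```java
public class CostDecay {
    // Cost between two 1-hour segments whose starts are t hours apart,
    // per the closed form above, constant factors omitted.
    static double cost(double t) {
        return (Math.pow(2, 1 / 24.0) + Math.pow(2, -1 / 24.0) - 2)
                * Math.pow(2, -t / 24.0);
    }

    public static void main(String[] args) {
        // The cost halves for every additional 24 hours of separation,
        // so nearby segments are the most expensive to co-locate.
        System.out.printf("t=1:   %.3e%n", cost(1));
        System.out.printf("t=25:  %.3e%n", cost(25));
        System.out.printf("ratio: %.3f%n", cost(25) / cost(1)); // ~0.5
    }
}
```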

The article concludes that Druid's balance mechanism optimizes segment placement for faster queries and briefly mentions the Youzan big‑data team.
