Evolution of Load Balancing Strategies in JD Advertising Online Model System

This article examines the progression of load‑balancing techniques used in JD's advertising online model system, analyzing current challenges, outlining requirements, reviewing static and dynamic strategies, and presenting a multi‑objective, hierarchical approach that improves service availability, resource utilization, and overall system stability.

JD Tech Talk

Load balancing is a core concern in distributed service architectures and is essential for improving resource utilization and service stability in online clusters. This article traces the evolution of the load‑balancing strategies in JD's advertising online model system, with a focus on optimal compute scheduling across heterogeneous hardware clusters.

Background

Complex business systems depend heavily on distributed service clusters.

Containerized deployment of heterogeneous nodes leads to performance imbalance.

Hardware component failures are unavoidable, so the system must be designed to tolerate faults.

Traffic spikes during promotions demand a balance between stability and resource cost.

Problems

Load imbalance results in low overall resource utilization.

Overload on a single node can trigger expansion of the whole cluster.

Node hardware failures affect overall service availability.

Unpredictable traffic changes cause stability issues.

Requirements

Design a load‑balancing (LB) strategy that improves resource utilization and service stability and copes with the complex, variable traffic of large promotions.

Theoretical Foundations

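Load balancing can be static or dynamic. A static strategy fixes the assignment of tasks to machines in advance; a dynamic strategy adjusts it using load measured at runtime. In both cases the goal is to map tasks to machines so that total execution time is minimized.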
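To make the distinction concrete, here is a minimal sketch that compares a static round‑robin assignment with a dynamic assignment that looks at each machine's current backlog. The function names, task costs, and machine speeds are illustrative assumptions, not taken from the JD system.

```python
def static_round_robin(task_costs, speeds):
    """Static: task i always goes to machine i % n, regardless of load.

    Returns the makespan (time at which the slowest machine finishes).
    """
    finish = [0.0] * len(speeds)
    for i, cost in enumerate(task_costs):
        m = i % len(speeds)
        finish[m] += cost / speeds[m]
    return max(finish)


def dynamic_earliest_finish(task_costs, speeds):
    """Dynamic: each task goes to the machine that would finish it earliest,
    based on the backlog observed when the task arrives."""
    finish = [0.0] * len(speeds)
    for cost in task_costs:
        m = min(range(len(speeds)), key=lambda j: finish[j] + cost / speeds[j])
        finish[m] += cost / speeds[m]
    return max(finish)


# Hypothetical workload: six equal tasks, one machine twice as fast as the other.
tasks = [4.0] * 6
speeds = [1.0, 2.0]
print(static_round_robin(tasks, speeds))       # 12.0
print(dynamic_earliest_finish(tasks, speeds))  # 8.0
```

With heterogeneous machines, the static assignment leaves the faster machine idle while the slower one is still working; the dynamic assignment reaches the lower bound of 8.0 for this toy workload (24 units of work divided by a combined speed of 3).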

Load‑Balancing Strategy Summary

Distributed Strategies: Neighbor exchange methods such as diffusion, dimension exchange (DEM), and gradient method (GM).

Centralized Strategies: A designated processor collects global load information and makes balancing decisions.

Hybrid/Hierarchical Strategies: Use hierarchical trees to perform multi‑level balancing across groups of processors.
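Of the three families above, the diffusion method is the easiest to illustrate. In the textbook formulation, each node repeatedly moves a fraction `alpha` of the load difference to or from each of its neighbors. The sketch below assumes a symmetric neighbor graph (a four‑node ring) and a hypothetical `alpha` of 0.25; it is a generic illustration, not the scheme used in the JD system.

```python
def diffusion_step(load, neighbors, alpha=0.25):
    """One synchronous diffusion round.

    load[i] is the current load of node i; neighbors[i] lists the nodes it
    exchanges load with. With a symmetric neighbor graph the total load is
    conserved, because every transfer i -> j is mirrored by j -> i.
    """
    return [
        x + alpha * sum(load[j] - x for j in neighbors[i])
        for i, x in enumerate(load)
    ]


# Hypothetical four-node ring with all load initially on node 0.
ring = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [2, 0]}
load = [8.0, 0.0, 0.0, 0.0]
for _ in range(3):
    load = diffusion_step(load, ring)
    print(load)
```

No node needs a global view: each one only compares itself with its neighbors, and the loads still converge toward the cluster average (2.0 per node here). For stability, `alpha` should stay below the reciprocal of the largest neighbor count.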

Algorithm Levels

System‑level LB: DNS load balancing, Nginx reverse‑proxy load balancing, LVS/F5 combined with Nginx.

Application‑level LB: Ribbon (client‑side) and Dubbo (service‑side) with strategies such as random, round‑robin, least‑connections, and locality‑aware.
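The basic application‑level strategies named above can be sketched in a few lines each. The classes below are plain Python written for illustration; they are not the Ribbon or Dubbo APIs, and all names are hypothetical.

```python
import itertools
import random


def pick_random(nodes, rng=random):
    """Random: every node is equally likely, no state is kept."""
    return rng.choice(nodes)


class RoundRobin:
    """Round-robin: hand out nodes in a fixed rotating order."""

    def __init__(self, nodes):
        self._cycle = itertools.cycle(list(nodes))

    def pick(self):
        return next(self._cycle)


class LeastConnections:
    """Least-connections: pick the node with the fewest in-flight requests."""

    def __init__(self, nodes):
        self.active = {node: 0 for node in nodes}

    def acquire(self):
        # Ties are broken by insertion order of the node list.
        node = min(self.active, key=self.active.get)
        self.active[node] += 1
        return node

    def release(self, node):
        self.active[node] -= 1


rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']

lc = LeastConnections(["a", "b", "c"])
first_three = [lc.acquire() for _ in range(3)]  # one request on each node
lc.release("b")                                 # the request on b completes
print(lc.acquire())                             # b
```

Random and round‑robin ignore node state entirely, which is why they struggle on heterogeneous hardware; least‑connections is the simplest strategy that reacts to what each node is doing at the moment.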

Evolution Steps

Step 1 – Business‑Specific Adaptation: Consistent‑hash based on user PIN to maintain cache hit rate.

Step 2 – Availability Target: Introduce real‑time node availability metrics; nodes whose availability falls below the cluster average receive a smaller share of traffic, and degradation protection can be triggered at the level of the whole cluster.

Step 3 – Heterogeneous Hardware Utilization: Add CPU/GPU utilization as secondary objectives, using a two‑level feedback loop so that resource usage across nodes gradually converges.

Step 4 – Unified LB Framework: Modularize LB logic to unify internal and external services, eliminating isolated compute islands.
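The first three steps can be sketched together under some loudly stated assumptions. The article does not give the actual formulas, so everything below is hypothetical: a consistent‑hash ring keyed by user PIN in which each node's number of virtual nodes follows its weight (Step 1), and a weight update that treats availability as the primary objective (Step 2) and utilization as the secondary one (Step 3). The class and function names, the step size, the dead band, and the weight bounds are all illustrative choices.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Stable 64-bit hash (Python's built-in hash() is salted per process)."""
    return int.from_bytes(hashlib.md5(key.encode("utf-8")).digest()[:8], "big")


class WeightedRing:
    """Consistent-hash ring; a node's count of virtual nodes follows its weight."""

    def __init__(self, weights, vnodes_per_unit=100):
        points = []
        for node, weight in weights.items():
            for i in range(max(1, round(weight * vnodes_per_unit))):
                points.append((_hash(f"{node}#{i}"), node))
        points.sort()
        self._hashes = [h for h, _ in points]
        self._nodes = [n for _, n in points]

    def route(self, pin: str) -> str:
        # First virtual node clockwise from the PIN's hash, wrapping around.
        idx = bisect.bisect_right(self._hashes, _hash(pin)) % len(self._hashes)
        return self._nodes[idx]


def adjust_weights(weights, availability, utilization,
                   step=0.1, band=0.01, floor=0.1, cap=2.0):
    """One round of a hypothetical two-level feedback update.

    Level 1 (primary): a node below the cluster-average availability loses weight.
    Level 2 (secondary): otherwise, nudge the weight so that utilization moves
    toward the cluster average. `band` is a dead band that avoids oscillation.
    """
    avg_avail = sum(availability.values()) / len(availability)
    avg_util = sum(utilization.values()) / len(utilization)
    adjusted = {}
    for node, weight in weights.items():
        if availability[node] < avg_avail - band:
            weight *= 1 - step
        elif utilization[node] > avg_util + band:
            weight *= 1 - step / 2
        elif utilization[node] < avg_util - band:
            weight *= 1 + step / 2
        adjusted[node] = min(cap, max(floor, weight))
    return adjusted


# Hypothetical metrics for three nodes.
weights = {"a": 1.0, "b": 1.0, "c": 1.0}
weights = adjust_weights(
    weights,
    availability={"a": 0.90, "b": 0.999, "c": 0.999},
    utilization={"a": 0.5, "b": 0.8, "c": 0.4},
)
ring = WeightedRing(weights)
print(weights, ring.route("user-pin-123"))
```

In this sketch the ring is rebuilt after each weight update. Because virtual nodes are named deterministically, only PINs that land on added or removed virtual nodes change owner, which preserves most of the cache affinity that Step 1 is meant to protect. Small steps per round are what let the weights converge gradually instead of oscillating.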

Effect Demonstration

During the 2022 618 promotion (JD's June 18 shopping festival), the model‑estimation service cluster achieved an improvement of more than 10% in machine resource utilization. Deployments in later promotions yielded gains of 15%–20% and halved the variance of CPU load.

References

Wang G, Zhang L, Xu W. What Can We Learn from Four Years of Data Center Hardware Failures. IEEE DSN 2017.

Yang JX et al. Survey of Dynamic Load Balancing Strategies for Parallel and Distributed Computing. J. Electronics 2010.

Mirrokni V, Thorup M, Zadimoghaddam M. Consistent Hashing with Bounded Loads. 2016.

https://developer.aliyun.com/article/1325514.
