
Elastic Nearline Computing Architecture for Leveraging Idle Resources in Baidu's PaaS Platform

Baidu’s elastic nearline computing architecture inserts an asynchronous, resource‑adaptive layer between online and offline processing, dynamically harvesting idle CPU, GPU and Kunlun XPU capacity to pre‑compute complex recommendation and search policies, enabling peak‑shifting, valley‑filling, higher timeliness and significant business growth at low cost.

Baidu Geek Talk

In production PaaS platforms, resource redundancy is maintained to handle traffic growth, load spikes, hardware and software upgrades, and large-scale partial failures. Baidu's information-flow and search services achieve over five-nines availability, with each regional PaaS cluster provisioned with a degree of resource redundancy. To control costs, business iterations must demonstrate a reasonable input-output ratio before full rollout. Exploiting the characteristics of recommendation and search systems, Baidu designed and implemented an elastic nearline computing architecture that sits between online and offline computation.

Compared to online computing, it escapes strict response-time limits, giving business logic far more room for computational complexity. Compared to offline batch computing, it offers higher timeliness and stability for policy computation. The core idea is asynchronous computation: complex policy computation is decoupled from the online path. Online systems cache intermediate results, or independent nearline computation flows are built that are triggered by signals to precompute intermediate results. Online services then consume the precomputed results with low-complexity processing, bypassing online latency limits and enabling high-complexity, large-volume business computation. When the system has redundant capacity, historical trigger signals are used to estimate future visits, and nearline computation is initiated proactively to optimize resource utilization and improve business outcomes.
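The online/nearline split above can be sketched in a few lines. All names here (`precompute_policy`, `SCORE_CACHE`, the trigger function) are illustrative, not Baidu APIs; the point is that the heavy computation runs asynchronously on a trigger signal, while the online path does only a cache lookup with a low-complexity fallback:

```python
SCORE_CACHE = {}  # precomputed intermediate results, keyed by (user, item)

def precompute_policy(user, item):
    """Nearline: expensive policy computation done ahead of the request."""
    return len(user) * 10 + len(item)  # stand-in for a heavy model

def on_trigger(user, item):
    """Driven by a trigger signal (e.g. a new candidate); runs asynchronously."""
    SCORE_CACHE[(user, item)] = precompute_policy(user, item)

def serve_online(user, item):
    """Online: cheap cache lookup; degrades gracefully on a cold miss."""
    score = SCORE_CACHE.get((user, item))
    if score is None:
        score = 1  # low-complexity default instead of the heavy path
    return score

on_trigger("u1", "videoA")
print(serve_online("u1", "videoA"))  # precomputed: 26
print(serve_online("u2", "videoB"))  # cache miss -> default: 1
```

In production the trigger side would run in a separate nearline service and the cache would be a shared store, but the contract is the same: the online request never pays for the heavy computation.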

The system's core mechanisms include dynamic computing power and dynamic parameter adjustment based on PaaS cluster load to obtain excess computing power; designing load-related business parameters that match available computing power to smoothly and fully utilize resources; and resource usage estimation and planning, including peak‑shifting scheduling, load allocation according to resource conditions, and proactive nearline computation during non‑peak periods based on remaining computing power. The architecture supports heterogeneous resources such as CPU, GPU, and Kunlun XPU chips, and considers cross‑geographic scheduling of computing clusters nationwide.
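A minimal sketch of the first two mechanisms, under assumed numbers: idle power is the headroom below a safety watermark, and a load-related business parameter (here, candidate set size) is scaled to match that quota. The watermark, per-core throughput, and cap are hypothetical; the production controller is feedback-driven rather than a fixed formula:

```python
def nearline_quota(total_cores, online_used, watermark=0.8):
    """Idle power available to nearline = headroom below the safety watermark."""
    return max(0.0, total_cores * watermark - online_used)

def candidate_set_size(quota, items_per_core=20, base=100, cap=5000):
    """Load-related business parameter: scale candidate volume to the quota."""
    return min(cap, base + int(quota * items_per_core))

q = nearline_quota(1000, 600)  # 200 spare cores under an 80% watermark
print(candidate_set_size(q))   # 100 + 200 * 20 = 4100
```

Because the parameter shrinks smoothly as online load rises, nearline work backs off before it can threaten online availability.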

The system comprises several subsystems: a trigger control system that manages nearline computation triggering for peak‑shifting and valley‑filling; a dynamic computing power and dynamic parameter subsystem (the “brain”) that allocates nearline computing power according to cluster resources and computes control parameters to match load with power; a historical data center that stores nearline computation records for reuse; a business nearline computation and control subsystem handling I/O caching, packetization, and failure feedback; and an online business access subsystem designed for low‑effort integration.

Computing power comes from an expansion resource pool made up of unallocated cluster resources and fragment resources too small to satisfy any business quota. The pool is modeled as total cluster resources minus allocated resources plus reclaimable resources, where the reclaimable amount is determined dynamically from each service's stable allocation and actual usage. XPU expansion resources are excavated and utilized through a model-scheduling system built on PaaS, featuring model containers, an extended K8s Operator for instance partitioning, a resource pre-estimation subsystem, and name-service based model-to-instance mapping, which together ensure that XPU sharing does not interfere with online business latency.
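The expansion-pool model reduces to a short calculation. The stability margin below is an assumed illustrative value; in production the reclaimable amount would come from observed allocation-versus-usage statistics per service:

```python
def reclaimable(allocated, used, stability_margin=1.25):
    """Cores a service holds but demonstrably does not need: allocation minus
    observed usage padded by a stability margin (margin value is illustrative)."""
    return max(0.0, allocated - used * stability_margin)

def expansion_pool(total, allocated, used):
    """Expansion pool = (total - allocated) + reclaimable, as modeled in the text."""
    return (total - allocated) + reclaimable(allocated, used)

# 100 unallocated cores + (900 allocated - 600 used * 1.25) reclaimable = 250
print(expansion_pool(total=1000, allocated=900, used=600))  # 250.0
```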

To further improve resource utilization, the system employs load and power intelligent scheduling: it balances load across machines in response to local resource hotspots caused by mixed‑layout environments, performs dynamic load balancing across upstream/downstream modules, anticipates load peaks to pre‑allocate resources and shift scheduling, and learns resource coefficients for heterogeneous CPU/XPU scheduling.
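The learned resource coefficients for heterogeneous scheduling can be pictured as cost multipliers that translate one task's load into each pool's native units, so that CPU and XPU utilization become comparable. This is a hypothetical sketch, not Baidu's scheduler; the pool sizes and coefficients are invented:

```python
def pick_pool(task_cost, pools, coeff):
    """Route a task to the pool with the lowest projected utilization.
    pools: {name: (used, capacity)}; coeff: {name: learned cost multiplier}."""
    best, best_util = None, None
    for name, (used, cap) in pools.items():
        util = (used + task_cost * coeff[name]) / cap  # projected utilization
        if best_util is None or util < best_util:
            best, best_util = name, util
    return best

pools = {"cpu": (700.0, 1000.0), "xpu": (30.0, 100.0)}
coeff = {"cpu": 1.0, "xpu": 0.1}  # XPU runs this workload ~10x cheaper
print(pick_pool(10.0, pools, coeff))  # "xpu": 0.31 projected vs 0.71 on CPU
```

Learning `coeff` per workload is what lets one scheduler balance load across chips with very different throughput characteristics.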

Historical computation results are leveraged: when resources are abundant, high‑preference historical items are recomputed to expand candidate sets and boost business effectiveness; when resources are scarce, irrelevant historical items are filtered to increase efficiency. Additionally, visit prediction based on historical access times improves result timeliness, and indexed historical trigger signals enable prioritized replay for scenarios such as push notifications.
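The two resource-dependent reuse modes amount to a selection policy over history records. Field names and the preference threshold here are illustrative assumptions:

```python
def select_history(history, abundant, pref_threshold=0.7):
    """Abundant resources: replay high-preference history items to enlarge the
    candidate set. Scarce resources: keep only still-relevant items."""
    if abundant:
        return [h for h in history if h["pref"] >= pref_threshold]
    return [h for h in history if h["relevant"]]

history = [
    {"id": 1, "pref": 0.9, "relevant": True},
    {"id": 2, "pref": 0.4, "relevant": False},
    {"id": 3, "pref": 0.8, "relevant": False},
]
print([h["id"] for h in select_history(history, abundant=True)])   # [1, 3]
print([h["id"] for h in select_history(history, abundant=False)])  # [1]
```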

Typical applications include a feed online‑nearline mixed computation architecture where recommendation algorithm outputs are cached and jointly scored in the nearline system to generate a user‑specific candidate set, increasing recall‑stage scoring scale by an order of magnitude; and search result sorting where Transformer‑based relevance models use mixed CPU/XPU resources, with an aggressive peak‑shifting/valley‑filling mechanism that caches trigger signals in a message queue during shortages and processes them during low‑load periods.
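The aggressive peak-shifting mechanism in the search example can be sketched with a plain in-process queue standing in for the message queue (names and the capacity flag are illustrative):

```python
from collections import deque

pending = deque()   # trigger signals deferred during a resource shortage
processed = []

def on_signal(sig, has_capacity):
    """Compute immediately when capacity exists; otherwise park the signal."""
    if has_capacity:
        processed.append(sig)
    else:
        pending.append(sig)  # cached in the queue, as in the text

def drain_off_peak(budget):
    """Replay deferred signals during a low-load window, up to a power budget."""
    while pending and budget > 0:
        processed.append(pending.popleft())
        budget -= 1

on_signal("q1", has_capacity=False)  # peak period: deferred
on_signal("q2", has_capacity=True)   # capacity available: computed now
drain_off_peak(budget=10)            # valley period: backlog replayed
print(processed)  # ['q2', 'q1']
```

Ordering matters only loosely here: deferred signals lose some freshness but still land well before the next offline batch would.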

Since its 2018 deployment, the elastic nearline computing system has supported heterogeneous computing power (CPU, GPU, Kunlun chips) in Baidu’s feed recommendation and search relevance businesses, powering over ten feed scenarios with double‑digit growth in total watch time and distribution, and three search scenarios that significantly improve video search, Q&A search relevance and user experience. As more complex models and multi‑target model stacking are added, the system will continue to deliver low‑cost support for business growth.

Tags: cloud computing, search relevance, feed recommendation, heterogeneous computing, elastic nearline computing, historical result reuse, PaaS resource scheduling, peak shaving