How Baidu Feed Scales Millions of Users with Serverless: A Multi‑Dimensional Elasticity Blueprint
This article details Baidu Feed's serverless transformation, describing how multi‑dimensional service profiling (elasticity, traffic, capacity) and three elastic strategies—predictive, load‑feedback, and timed—enable automatic scaling that reduces resource waste while maintaining 24/7 stability for billions of users.
Background
In Baidu’s cloud‑native environment, most products (search, recommendation, advertising) are built from a large number of micro‑services. These services are computation‑heavy, run 24/7 and are usually provisioned with a fixed capacity. Fixed provisioning leads to waste during traffic troughs and cannot react quickly enough to traffic spikes because instance startup – especially the download and extraction of large dictionary files – dominates scaling latency.
Problem Statement
Baidu Feed, a recommendation system serving billions of users, exhibits strong diurnal traffic peaks and valleys. Using a static capacity for the whole day wastes resources in low‑traffic periods and risks overload during spikes because the time to start a new instance (downloading and extracting dictionaries) is on the order of minutes.
Service Profiling
Each backend service is described by three orthogonal profiles stored in the cloud‑native control plane (PaaS for scaling actions, ALM for data management):
Elasticity profile: classifies a service as high, medium or low elasticity based on statefulness, deployment time, instance quota (CPU, memory, disk) and external dependencies.
Traffic profile: uses historical CPU usage (as a proxy for request volume) to predict future traffic. A day is divided into configurable time‑slices (e.g., 1 h). For each slice the median CPU usage over the past N days is computed and smoothed with a median‑based filter.
Capacity profile: defines the required CPU buffer. The peak CPU utilization observed in a slice is recorded; the buffer is the difference between 100 % and the peak (e.g., a 60 % peak implies a 40 % buffer).
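The traffic and capacity profiles can be illustrated with a minimal Python sketch. It is not Baidu's implementation: the function names, the window size of 3, and the choice to treat the day as circular when smoothing (so the last slice neighbours the first) are assumptions made for illustration, and the sample numbers are invented.

```python
from statistics import median


def slice_medians(cpu_by_day):
    """Per-slice median CPU usage across days.

    cpu_by_day: list of days, each a list with one CPU value per time slice.
    """
    return [median(values) for values in zip(*cpu_by_day)]


def median_filter(series, window=3):
    """Smooth a daily series with a sliding median.

    Assumption: the day wraps around, so the last slice neighbours the first.
    """
    n = len(series)
    half = window // 2
    return [
        median(series[(i + k) % n] for k in range(-half, half + 1))
        for i in range(n)
    ]


def cpu_buffer(peak_utilization_pct):
    """Buffer as described in the capacity profile: 100 % minus the peak."""
    return 100.0 - peak_utilization_pct


# Hypothetical history: 3 days, 4 slices per day, CPU usage in percent.
history = [
    [20, 35, 60, 30],
    [22, 90, 58, 28],  # one-off spike in the second slice on day 2
    [18, 33, 62, 32],
]
per_slice = slice_medians(history)
print(per_slice)
print(cpu_buffer(60))
```

Taking the median per slice means a single abnormal day (the 90 above) does not distort the profile, and the median filter then removes isolated spikes between neighbouring slices.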
Elasticity Strategies
Three complementary strategies are applied, with the priority order timed > predictive > load‑feedback:
Predictive elasticity: uses the traffic profile to forecast the next slice and pre‑scale the service before the load arrives.
Load‑feedback elasticity: continuously monitors real‑time CPU usage (and optional custom metrics such as latency) and expands capacity when a threshold is crossed. Shrinking is delegated to the other two strategies.
Timed elasticity: schedules scaling actions around known high‑peak periods (e.g., morning and evening peaks) and keeps capacity constant during stable intervals.
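One way to read the priority order is sketched below in Python. The article does not say how conflicts between strategies are resolved, so this assumes the highest-priority strategy with a proposal wins; `Proposal`, `PRIORITY` and `resolve` are hypothetical names, not part of any Baidu API.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    strategy: str   # "timed", "predictive" or "load_feedback"
    instances: int  # instance count the strategy asks for


# Priority order stated in the text: timed > predictive > load-feedback.
PRIORITY = ["timed", "predictive", "load_feedback"]


def resolve(current: int, proposals: list[Proposal]) -> int:
    """Return the target instance count from the highest-priority proposal.

    Load-feedback proposals are only honoured when they expand capacity,
    because shrinking is delegated to the other two strategies.
    """
    by_name = {p.strategy: p for p in proposals}
    for name in PRIORITY:
        proposal = by_name.get(name)
        if proposal is None:
            continue
        if name == "load_feedback" and proposal.instances <= current:
            continue
        return proposal.instances
    return current  # no applicable proposal: keep the current capacity
```

For example, `resolve(10, [Proposal("load_feedback", 8)])` keeps the count at 10, because under this reading load feedback is never allowed to shrink a service.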
Implementation Details
The scaling workflow consists of the following steps:
PaaS container initialization: PaaS selects a machine that satisfies the instance quota (CPU, memory, disk) and creates a container.
Binary and dictionary preparation: Service binaries and large dictionary files are downloaded from remote storage. Dictionaries larger than a threshold are placed on a shared cloud disk (or cloud‑disk volume) so that they can be mounted instead of copied, reducing startup time.
Instance launch and registration: The container runs the service start‑up script, registers the instance with service discovery, and begins reporting metrics.
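The mount-versus-download decision in the dictionary preparation step can be sketched as follows. The article gives no threshold value and no API, so `Dictionary`, `plan_dictionary_delivery` and the threshold parameter are hypothetical and only illustrate the rule.

```python
from dataclasses import dataclass


@dataclass
class Dictionary:
    name: str
    size_bytes: int


def plan_dictionary_delivery(dicts, mount_threshold_bytes):
    """Split dictionaries into those mounted from a shared disk and those copied.

    Dictionaries strictly larger than the threshold are mounted; the rest
    are downloaded into the container.
    """
    plan = {"mount": [], "download": []}
    for d in dicts:
        key = "mount" if d.size_bytes > mount_threshold_bytes else "download"
        plan[key].append(d.name)
    return plan
```

Mounting avoids the download and extraction time that the article identifies as the dominant part of scaling latency, at the cost of depending on the shared disk at start-up.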
Scaling decisions are computed as follows:
# Pseudocode for scaling decision
prev = max_cpu(prev_slice, N_days)
cur = max_cpu(cur_slice, N_days)
next = max_cpu(next_slice, N_days)
# Determine traffic case
if prev < cur < next:
    target_traffic = next # case‑1: rising trend
elif prev > cur < next:
    target_traffic = next # case‑2: valley‑to‑peak
elif prev < cur > next:
    target_traffic = cur # case‑3: peak, keep current
else:
    target_traffic = cur # case‑4: falling trend, shrink later
# Apply growth factor based on historical max growth rate
growth = max_growth_rate(cur_slice, N_days)
target_traffic = max(target_traffic, cur * growth)
# Convert traffic to capacity using the service’s capacity profile
required_cpu = target_traffic * (1 + buffer_ratio)
instance_count = ceil(required_cpu / instance_quota_cpu)
# Issue scaling request to PaaS
paas.scale(service_id, instance_count)
Stability Safeguards
To keep large‑scale dynamic scaling reliable, Baidu employs four mechanisms:
Elasticity inspection: periodically forces instance migrations to verify that the start‑up pipeline (especially dictionary loading) works under load.
Capacity inspection: monitors resource usage against the capacity profile and raises alerts when limits are approached.
Status inspection: checks that the service’s capacity state matches the expected state for the current time slice (peak vs. off‑peak).
One‑click intervention: provides an emergency rollback that can instantly revert a service to its previous instance count.
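Two of these safeguards, status inspection and one-click rollback, are easy to sketch in Python. This is an illustration under assumptions rather than Baidu's tooling: the 10 % tolerance, `status_matches` and `ScalingHistory` are hypothetical.

```python
def status_matches(actual_instances, expected_instances, tolerance=0.1):
    """True if the actual count is within a relative tolerance of the expected one."""
    if expected_instances <= 0:
        return actual_instances == 0
    deviation = abs(actual_instances - expected_instances)
    return deviation <= tolerance * expected_instances


class ScalingHistory:
    """Remembers instance counts so the latest scaling action can be reverted."""

    def __init__(self, initial_count):
        self._counts = [initial_count]

    @property
    def current(self):
        return self._counts[-1]

    def record(self, new_count):
        """Store the count produced by a scaling action."""
        self._counts.append(new_count)

    def rollback(self):
        """Revert to the previous count; a no-op if there is no earlier state."""
        if len(self._counts) > 1:
            self._counts.pop()
        return self.current
```

A status inspector would call `status_matches` once per time slice with the count expected for that slice, and an operator-facing rollback would pass the value returned by `rollback()` to the PaaS scaling call.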
Results
The serverless elasticity framework has been deployed to more than 100 000 service instances in the Baidu Feed product line. Matching capacity to traffic in real time reduces operational cost while maintaining 24/7 availability.
Future Work
Planned enhancements include:
Capacity guarantees for hot events (e.g., viral content spikes) by reserving burst buffers.
Applying machine‑learning models to improve traffic‑profile prediction accuracy.
Extending the approach to an even larger set of services across additional Baidu product lines.
[Figure: Key architectural diagram]
[Figure: Resource consumption comparison]
Baidu Tech Salon
Baidu Tech Salon, organized by Baidu's Technology Management Department, is a monthly offline event that shares cutting‑edge tech trends from Baidu and the industry, providing a free platform for mid‑to‑senior engineers to exchange ideas.