
Baidu Search Service Compute Management: Architecture, Practices, and Cloud‑Native Techniques

Baidu’s Search Architecture team evolved compute management from static physical‑machine deployment to cloud‑native mixed deployment. The resulting governance system combines elastic containers, Service Mesh, and committee‑driven operations, and leverages tidal scaling, container tiering, performance‑curve‑based VPA/HPA, and fine‑grained traffic scheduling to deliver cost‑effective, flexible, and performance‑driven resource allocation across both temporal and task‑type dimensions.

Baidu Geek Talk

This article presents the Baidu Search Architecture team's engineering practices and experience in service compute management, focusing on the temporal ("time") and spatial ("space") dimensions of service and container compute resources.

1. Overview of Search Service Compute Management

1.1 Development Stages

The core goal of compute management is the optimal matching of compute demand and resources. The evolution can be divided into three stages:

Physical‑machine deployment stage (pre‑2014): Services were deployed directly on physical machines with relatively uniform CPU, memory, and SSD capacities. Offline frameworks such as BVC performed mixed deployment during low‑traffic periods.

Semi‑automatic mixed deployment stage (2014–2018): A self‑developed PaaS provided semi‑automatic mixed deployment. Policies were manually designed, execution was command‑driven, and capacity was manually maintained. Capacity specialists were consulted for each project, and full‑link pressure testing was used for large projects.

Cloud‑native mixed deployment stage (post‑2018): Services are hosted on Baidu’s cloud platform, enabling automatic scaling and mixed deployment. Projects no longer need capacity specialists; they submit expansion requests directly to the cloud platform, and traffic is matched to faster containers for cost‑effective performance.

1.2 Governance System

The governance system consists of two parts: technical architecture and operation system.

Technical architecture: Leverages elastic container instances, intelligent monitoring, Service Mesh, and other cloud‑native products to achieve quota‑based intelligent management.

Operation system: A committee‑driven process optimizes cost and speed, and manages project budget approvals to achieve global cost control.

2. Business Characteristics of Compute Management

2.1 Full‑time‑period Optimization

Traffic exhibits tidal patterns. During low‑traffic periods, resources are under‑utilized, prompting cloud‑native strategies such as:

Tidal scaling: Reduce instance count at night and expand before daytime peaks, lowering night‑time bills.

Window migration: Predict cache expiration and run compute tasks in cheap night‑time containers, delivering fresher results during the next day’s peak.
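The tidal scaling idea can be sketched as a simple replica planner: forecast traffic per hour, divide by per‑instance capacity with some headroom, and emit per‑hour replica targets. The capacity figure, safety factor, and forecast below are hypothetical illustrations, not Baidu's actual numbers.

```python
import math

# Hypothetical per-instance serving capacity and headroom; real values
# would come from capacity monitoring and load testing.
INSTANCE_CAPACITY_QPS = 500
SAFETY_FACTOR = 0.8  # keep 20% headroom against forecast error

def target_replicas(predicted_qps: float) -> int:
    """Replica count needed to serve the predicted traffic with headroom."""
    return max(1, math.ceil(predicted_qps / (INSTANCE_CAPACITY_QPS * SAFETY_FACTOR)))

def tidal_plan(hourly_forecast: dict) -> dict:
    """Per-hour replica targets: shrink at night, expand before the day peak."""
    return {hour: target_replicas(qps) for hour, qps in sorted(hourly_forecast.items())}

# Night-time traffic is a fraction of the daytime peak, so night-time bills drop.
forecast = {2: 3000, 8: 12000, 14: 20000, 22: 9000}
print(tidal_plan(forecast))  # {2: 8, 8: 30, 14: 50, 22: 23}
```

In practice the plan would be executed by the PaaS/autoscaler rather than computed ad hoc, but the shape of the decision is the same: fewer replicas in the trough, more before the peak.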

2.2 Same‑time‑period Compute Space Exploration

Within the same time slot, tasks can be classified into near‑line, asynchronous, and synchronous, each with distinct resource strategies:

Near‑line tasks: Results are cached; cheaper pre‑emptible containers can be used.

Asynchronous tasks: Cache updates can be delayed; stable but slower containers are acceptable.

Synchronous tasks: Immediate response required; high‑quota, fast containers are allocated for tail‑heavy shards.
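The task-type classification above maps naturally to a pool-selection rule. The sketch below is a minimal illustration of that mapping; the pool names are hypothetical labels, not Baidu's actual container classes.

```python
from enum import Enum

class TaskType(Enum):
    NEAR_LINE = "near-line"   # results cached; tolerant of preemption
    ASYNC = "asynchronous"    # cache updates may lag; stable-but-slow is fine
    SYNC = "synchronous"      # user-facing; needs an immediate response

# Hypothetical container pool names; the mapping mirrors the strategies above.
POOL_BY_TYPE = {
    TaskType.NEAR_LINE: "preemptible",     # cheapest containers
    TaskType.ASYNC: "stable-slow",         # stable, lower quota
    TaskType.SYNC: "fast-high-quota",      # fast containers for tail-heavy shards
}

def pick_pool(task: TaskType) -> str:
    """Resolve the container pool a task of this type should run in."""
    return POOL_BY_TYPE[task]

print(pick_pool(TaskType.SYNC))  # fast-high-quota
```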

3. Key Technologies

3.1 Application Performance Management

Performance curves: Large‑scale sample aggregation links traffic, latency, and resource usage to derive statistically meaningful performance curves.

VPA/HPA: Use performance curves to predict the ideal quota for each task type and trigger scaling events via PaaS.

Traffic throttling: After optimization, use Service Mesh to cut traffic and shrink resources more efficiently than traditional batch scaling.
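The VPA decision described above can be sketched as a lookup on the performance curve: among the sampled quota points, pick the smallest one whose observed latency still meets the SLO. The sample curve and SLO values below are hypothetical.

```python
# Sketch: given (cpu_quota, p99_latency_ms) points aggregated from large-scale
# monitoring samples, pick the smallest quota that satisfies the latency SLO.

def min_quota_for_slo(samples, slo_ms):
    """Smallest sampled CPU quota whose observed p99 latency is within the SLO."""
    feasible = [quota for quota, p99 in samples if p99 <= slo_ms]
    if not feasible:
        raise ValueError("no sampled quota meets the SLO; flag for manual review")
    return min(feasible)

# Performance curve: more CPU lowers latency, with diminishing returns.
curve = [(2.0, 180.0), (4.0, 95.0), (8.0, 60.0), (16.0, 55.0)]
print(min_quota_for_slo(curve, slo_ms=100.0))  # 4.0
```

A production system would interpolate between sampled points and smooth the curve statistically; the principle is the same: the curve converts a latency target into a quota recommendation that the PaaS can act on.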

3.2 Container Tiering

Static tiering: Assign faster containers to latency‑critical downstream services.

Dynamic tiering: Adjust container tier based on host hardware state; idle hosts run faster containers.
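Dynamic tiering can be sketched as a policy that maps current host utilization to the tier its containers should serve. The utilization thresholds below are illustrative assumptions, not Baidu's actual values.

```python
# Sketch of dynamic tiering: a container's tier follows its host's hardware
# state, so containers on idle hosts run in the fast tier.

def host_tier(cpu_util: float, mem_util: float) -> str:
    """Map host utilization (0.0-1.0) to the container tier it should serve."""
    load = max(cpu_util, mem_util)  # the most contended resource dominates
    if load < 0.3:
        return "fast"    # idle host: promote its containers to the fast tier
    if load < 0.7:
        return "normal"
    return "slow"        # busy host: demote its containers to the slow tier

print(host_tier(0.1, 0.2))  # fast
print(host_tier(0.9, 0.4))  # slow
```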

3.3 Compute Distribution

Fine‑grained traffic scheduling: Separate high‑priority and low‑priority traffic, routing high‑priority requests to fast containers.

Intelligent cache pre‑warming: Predict cache invalidation, compute updates during low‑traffic windows, and serve fresher data during peaks.
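The cache pre-warming step above reduces to a selection problem: find the entries predicted to expire during the coming peak and recompute them in the cheap night-time window. The hour windows and cache entries below are hypothetical.

```python
# Sketch of intelligent cache pre-warming: entries predicted to expire during
# the next peak are recomputed in the low-traffic window that precedes it,
# so the peak serves fresher data.

PEAK_HOURS = range(9, 22)  # hypothetical daytime peak window (hour of day)

def prewarm_keys(entries):
    """Keys whose predicted expiry hour falls inside the coming peak window.

    entries: mapping of cache key -> predicted expiry hour (0-23).
    """
    return sorted(key for key, expiry_hour in entries.items()
                  if expiry_hour in PEAK_HOURS)

# q2 expires before the peak and needs no pre-warm; q1 and q3 expire mid-peak.
print(prewarm_keys({"q1": 10, "q2": 3, "q3": 15}))  # ['q1', 'q3']
```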

4. Conclusion

This article reviewed the evolution of search service compute management, outlined the combined technical and operational governance system, and highlighted recent cloud‑native features that address both the temporal (tidal) and spatial (task‑type) dimensions. By leveraging cloud‑native technologies, Baidu achieves more flexible, cost‑effective, and performance‑driven compute allocation.

Tags: Cloud Native · Performance Optimization · Service Scaling · Container Orchestration · Baidu Search · Compute Management
Written by Baidu Geek Talk