Baidu’s QCon 2021 Highlights: Elastic Scaling, Search Architecture, AI Chips

This article compiles Baidu engineers' QCon 2021 talks, covering micro‑service evolution, large‑scale container elastic scaling, search system elasticity, AI‑chip deployment at massive scale, and cost‑focused monitoring, each with abstracts, outlines and key takeaways for practitioners.

Baidu Geek Talk

Baidu Large‑Scale Container Orchestration Elastic Scaling

Abstract: Driven by micro‑services and cloud‑native trends, Baidu explores how to optimize resource efficiency and overall cost for massive container fleets, presenting a data‑driven elastic scaling framework, automated policies, and event‑driven mechanisms that support million‑scale adjustments across products such as Search, Feed, and Baidu App.

Background & Trends

Micro‑services and cloud‑native momentum

Service governance and resource cost challenges

Elastic Mechanism Technology Selection

Industry research: open‑source vs. self‑built trade‑offs

Implementation path and cross‑team collaboration

Baidu Elastic Scaling System Design

Framework design: goals, layers, principles

System components: data collection, automated policies, event‑driven engine

Advanced strategies: traffic scheduling, premium container placement, time‑sharing reuse
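
The outline does not describe the actual policies, so the following is only a minimal sketch of what an automated, data‑driven scaling policy could look like. It assumes a proportional rule of the same form documented for the Kubernetes Horizontal Pod Autoscaler (desired = ceil(current × observed / target)); the names `ScalingPolicy` and `desired_replicas` and all parameter values are hypothetical and do not come from the talk.

```python
import math
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy parameters; none of these names come from the talk."""
    target_utilization: float  # desired average utilization, e.g. 0.5
    min_replicas: int
    max_replicas: int
    tolerance: float = 0.1     # ignore deviations within 10% of the target


def desired_replicas(current: int, observed_utilization: float,
                     policy: ScalingPolicy) -> int:
    """Return the replica count that would bring utilization back to target."""
    ratio = observed_utilization / policy.target_utilization
    # Skip small deviations so the fleet does not flap between sizes.
    if abs(ratio - 1.0) <= policy.tolerance:
        return current
    wanted = math.ceil(current * ratio)
    # Clamp to the configured bounds.
    return max(policy.min_replicas, min(policy.max_replicas, wanted))


policy = ScalingPolicy(target_utilization=0.5, min_replicas=2, max_replicas=100)
print(desired_replicas(10, 0.75, policy))  # 15: scale out
print(desired_replicas(10, 0.25, policy))  # 5: scale in
print(desired_replicas(10, 0.5, policy))   # 10: on target, unchanged
print(desired_replicas(80, 1.0, policy))   # 100: capped at max_replicas
```

In an event‑driven design, a function like this would typically run whenever the collection layer reports a new utilization sample, with the tolerance band and the replica bounds limiting how aggressively the fleet reacts.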

Business Elastic Scaling Practices

Multi‑level elasticity for diverse scenarios

Extreme elasticity (Serverless) for high‑speed demands

Impact on machine cost, resource efficiency, service stability, business metrics

Lessons learned from real‑world deployments

Summary & Outlook

Broader business scenario adoption

Future service governance roadmap

From Storage to Compute: Extreme Elasticity in Baidu Search Middleware

Abstract: Baidu Search middleware handles billions of daily queries across diverse scenarios. The existing micro‑service architecture reached limits in efficiency and cost, prompting a system‑wide elasticity approach that decouples data distribution, compute orchestration, and service topology, achieving up to 30% machine‑cost savings and halving human effort.

Search Middleware Overview

Architecture with ~20 micro‑service modules covering content computation to online retrieval

Current automation and scaling capabilities

Challenges of Complex Heterogeneous Workloads

Business delivery flow constraints

Adaptation to evolving demand

Storage Elasticity Mechanisms

Data grouping, allocation, migration strategies

Intelligent data governance
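
As a rough illustration of the grouping, allocation, and migration items above (not the mechanism presented in the talk, which the outline does not detail), the sketch below places data groups on storage nodes with a simple greedy heuristic and derives the migrations needed when the node set changes. The function names, group names, and sizes are all hypothetical.

```python
import heapq
from typing import Dict, List, Tuple


def allocate(groups: Dict[str, int], nodes: List[str]) -> Dict[str, str]:
    """Greedy placement: largest group first, always onto the least-loaded node."""
    heap: List[Tuple[int, str]] = [(0, node) for node in nodes]
    heapq.heapify(heap)
    placement: Dict[str, str] = {}
    # Sort by size descending, then by name so the result is deterministic.
    for name, size in sorted(groups.items(), key=lambda kv: (-kv[1], kv[0])):
        load, node = heapq.heappop(heap)
        placement[name] = node
        heapq.heappush(heap, (load + size, node))
    return placement


def migration_plan(old: Dict[str, str],
                   new: Dict[str, str]) -> List[Tuple[str, str, str]]:
    """List (group, source, destination) for every group whose node changed."""
    return sorted((g, old[g], new[g]) for g in new if g in old and old[g] != new[g])


# Hypothetical data groups with sizes in arbitrary units.
groups = {"news": 40, "video": 30, "images": 20, "maps": 10}
before = allocate(groups, ["n1", "n2"])
after = allocate(groups, ["n1", "n2", "n3"])  # one node added
print(migration_plan(before, after))
# [('images', 'n2', 'n3'), ('maps', 'n1', 'n3')]
```

Recomputing the placement from scratch keeps the sketch short but does not minimize data movement; a production system would more likely constrain the new placement to stay close to the old one.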

Content Compute Elasticity

Adaptive data freshness guarantees

Smart function orchestration (FaaS)

Compute demands in search scenarios

Super‑Automation Delivery via Elastic Capability

Unified demand‑to‑operation workflow

Hyper‑automated delivery

Future Outlook

Standardized demand understanding

Low‑code platform for search

Large‑Scale Search Model Architecture Optimization

Abstract: Deploying massive deep‑learning models for Baidu Search on heterogeneous accelerators (GPU, Kunlun chips) incurs high operational costs. This talk details the architecture of large‑scale online models and several optimization practices, including lossless architectural refinements and offline compression techniques, aiming to balance performance gains with cost control.

Business and architecture evolution of large‑scale search models

"No‑Flaw" – lossless architectural optimization

"Tian‑Gong" – offline compression optimization

Future directions: architecture‑driven model improvements

Cloud‑AI Chip Massive Deployment at Baidu

Abstract: With AI chips becoming pivotal for inference workloads, Baidu showcases the Kunlun chip’s technical characteristics and large‑scale deployment in data‑center inference scenarios, highlighting end‑to‑end performance tuning, efficient mixed‑workload handling, and practical lessons from production.

AI chip background and industry trends

Kunlun architecture and key features

Large‑scale deployment experiences

Future work and roadmap

Cost‑Optimized Large‑Scale Microservice Monitoring

Abstract: Micro‑services increase system complexity, demanding observability solutions that can scale to billions of requests without prohibitive cost. Baidu shares the design of the Fengjing monitoring platform deployed across advertising and content services, emphasizing low‑cost data collection, cheap compute/storage, and minimal operational overhead, while still delivering comprehensive insights.

Monitoring Demands in High‑Volume Multi‑Business Scenarios

Differences between business‑centric and traditional monitoring

Complexities of inter‑linked subsystems

Cost‑Driven Technical Considerations

Balancing cost versus monitoring capability

Unconventional ("Black‑Tech") Techniques for Extreme Cost Optimization

Non‑intrusive probe technology

Low‑cost data analysis and topology computation

Woven‑in circuit‑breaking and rate‑limiting methods
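
The outline names circuit breaking and rate limiting without describing how they are implemented. As a generic sketch only, the code below shows a token‑bucket rate limiter and a simplified count‑based circuit breaker; how such guards are woven into the monitoring probes is not stated in the source. The class names, thresholds, and the injectable `clock` parameter are assumptions made for illustration.

```python
import time
from typing import Callable, Optional


class TokenBucket:
    """Allow at most `rate` events per second, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class CircuitBreaker:
    """Open after `failure_threshold` consecutive failures; retry after `cooldown` seconds."""

    def __init__(self, failure_threshold: int, cooldown: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None

    def allow(self) -> bool:
        if self.opened_at is None:
            return True  # closed: calls pass through
        # Open: reject until the cooldown has elapsed, then let trial calls through.
        return self.clock() - self.opened_at >= self.cooldown

    def record(self, success: bool) -> None:
        if success:
            self.failures = 0
            self.opened_at = None  # close the breaker again
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open, or re-open after a failed trial
```

A caller would check `allow()` on both guards before doing the protected work (for example, emitting a monitoring sample) and report the outcome back through `record()`. Passing a fake `clock` makes the time‑dependent behavior easy to test.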

Holistic Monitoring Governance Outlook

Integrated monitoring governance vision

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
