Baidu Intelligent Cloud Tech Hub
Author

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

137 articles · 0 likes · 314 views · 0 comments
Recent Articles

Latest from Baidu Intelligent Cloud Tech Hub

100 recent articles max
Nov 7, 2025 · Artificial Intelligence

From Big Data to 30,000‑GPU Clusters: The Evolution of China’s AI Infrastructure

In an in-depth interview, Baidu AI Computing chief scientist Wang Yanpeng and host Koji trace China's internet infrastructure from the early big‑data era through cloud computing to today's AI boom, highlighting the role of compute power, GPU acceleration, data scaling, and Baidu's Baige platform in the AI arms race.

AI Infrastructure · Baidu Baige · GPU computing
0 likes · 26 min read
Nov 4, 2025 · Artificial Intelligence

How Baidu’s Baige Accelerates Multimodal Video Training with Context Parallelism

Baidu Baige’s enhanced veRL framework raises the supported video frame rate and resolution limits, cuts training time, reduces memory usage, and improves model accuracy by using context parallelism and optimized attention on Ampere GPUs in multimodal mixed‑training scenarios.

AI acceleration · Context Parallelism · Video Processing
0 likes · 6 min read
Oct 29, 2025 · Operations

How to Prevent Avalanche Failures in Large‑Scale Microservice Systems

This article explains how Baidu's SRE team identified the root causes of avalanche failures in massive microservice architectures, modeled system limits with Little’s Law, and implemented engineering practices such as retry budgets, queue throttling, and global TTL controls to achieve self‑healing and eliminate avalanche incidents.

SRE · avalanche failure · microservices
0 likes · 9 min read
Sep 22, 2025 · Cloud Computing

How Mantle Breaks the Hierarchical Namespace Bottleneck in Cloud Object Storage

The Mantle system, presented in a SOSP'25 paper by Baidu's storage team and collaborators, delivers a distributed hierarchical namespace for cloud object storage that overcomes traditional scalability and performance limits, enabling massive data lake workloads with dramatically reduced latency and vastly increased throughput.

SOSP · cloud storage · distributed systems
0 likes · 8 min read
Sep 9, 2025 · Artificial Intelligence

How Baidu Built a 32,000‑Card AI Super‑Compute Cluster and Boosted Efficiency by 50%

This article details Baidu Intelligent Cloud's journey in designing, constructing, and operating a 32,000‑card hybrid AI compute cluster, covering challenges in power, cooling, networking, multi‑cluster scheduling, and security, and explains how innovative hardware, software, and operational strategies achieved over 50% MFU improvement and industry‑first performance records.

AI Infrastructure · GPU clusters · Hybrid Cloud
0 likes · 15 min read
Sep 4, 2025 · Artificial Intelligence

Unlocking MoE Model Power: Baidu’s Baige 5.0 AI Platform’s FP8 and Distributed Innovations

Baidu’s Baige 5.0 AI Computing Platform introduces FP8 mixed‑precision training, MoE‑aware distributed strategies, adaptive parallelism, and a three‑tier KV‑Cache, delivering over 30% training speedup and 50% inference throughput gains while keeping token latency under half a second for large‑scale models.

AI · FP8 · MoE
0 likes · 16 min read
Jul 25, 2025 · Operations

How Baidu’s Lingxi Agent Uses LLMs to Automate Network Fault Diagnosis

This article details Baidu's evolution from manual network fault analysis to a multi‑agent AI platform, describing how the Lingxi intelligent agent leverages large language models, MCP tools, and design patterns to automate latency queries, generate analysis reports, and integrate with existing monitoring services.

AI agents · MCP protocol · network operations
0 likes · 19 min read
May 23, 2025 · Artificial Intelligence

How Baidu’s Kunlun Supernode Redefines AI Compute Density and Performance

This article explains how Baidu’s Kunlun supernode, built on high‑density liquid‑cooled cabinets and a modular 1U 4‑card design, breaks traditional 8‑card limits, boosts compute density four‑fold, improves power and cooling efficiency, and provides a scalable foundation for large‑model AI training and inference.

AI Infrastructure · GPU cluster · High Performance Computing
0 likes · 13 min read