Baidu Intelligent Cloud Tech Hub
Author

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

Recent Articles

Baidu Intelligent Cloud Tech Hub
May 31, 2024 · Artificial Intelligence

How Multi‑Chip Heterogeneous Clusters Power Next‑Gen Large Model Training

Using a martial-arts analogy, the article explains why training massive AI models now requires thousands of GPUs or mixed-chip clusters, outlines three key steps (interconnect, distributed parallel strategies, and accelerator optimization), and shows how Baidu’s Baige platform achieves near-full efficiency across GPU, Kunlun, and Ascend chips.

AI training · GPU interconnect · accelerator optimization
0 likes · 11 min read
Baidu Intelligent Cloud Tech Hub
May 27, 2024 · Databases

Baidu’s Enterprise Vector Database: Architecture, Performance, and RAG Secrets

An exclusive interview with Baidu’s senior database architects reveals the motivations behind building a dedicated enterprise vector database, details its novel column‑store engine, C++‑based retrieval stack, performance gains over open‑source solutions, multi‑modal support, RAG integration, and future research directions.

AI · RAG · Storage Engine
0 likes · 28 min read
Baidu Intelligent Cloud Tech Hub
May 15, 2024 · Artificial Intelligence

How Baidu’s AIAK‑LLM Supercharges Large‑Model Training and Inference

The article explains the scaling challenges of ever-larger LLMs, introduces the Model FLOPs Utilization (MFU) performance metric, surveys industry parallelism and memory-saving techniques, and details Baidu’s AIAK-LLM suite, including its resource, component, and acceleration layers, as well as concrete training and inference optimizations that raise MFU by 30-60% and cut deployment costs.

AI infrastructure · MFU · Training Acceleration
0 likes · 25 min read
Baidu Intelligent Cloud Tech Hub
Apr 24, 2024 · Artificial Intelligence

How to Build and Accelerate Multi‑Chip AI Clusters for Large‑Model Training

With AI training demands outgrowing single-chip GPU clusters, this article explains how to construct and speed up heterogeneous AI clusters that combine GPUs, Kunlun, and Ascend chips. It addresses interconnect, distributed parallel strategies, and specialized acceleration suites to achieve high MFU and efficient large-model training.

AI clustering · Distributed Training · GPU acceleration
0 likes · 15 min read
Baidu Intelligent Cloud Tech Hub
Apr 16, 2024 · Operations

Tackling Multi-CPU Performance Challenges with Baidu’s One-Click Btune

At QCon 2024, Baidu Intelligent Cloud presented the complexities of optimizing diverse CPU architectures in data centers and introduced Btune, a one‑click solution that automates bottleneck detection, analysis, and performance tuning across Intel, AMD, and ARM platforms, enabling engineers to boost service efficiency.

Btune · CPU performance · Multi-Architecture
0 likes · 18 min read
Baidu Intelligent Cloud Tech Hub
Mar 1, 2024 · Artificial Intelligence

How Baidu’s BCCL Boosts Distributed AI Training with Real‑Time Observability and Fault Diagnosis

Baidu’s Collective Communication Library (BCCL) enhances large‑model distributed training by improving real‑time bandwidth monitoring, fault diagnosis, network stability, and performance, leveraging RDMA networks and GPU‑specific optimizations to increase effective training time to 98% and bandwidth utilization to 95%.

AI infrastructure · Distributed Training · Fault Diagnosis
0 likes · 11 min read
Baidu Intelligent Cloud Tech Hub
Jan 31, 2024 · Artificial Intelligence

How Baidu Built an 80% Accurate AI-Powered Database Ops Knowledge Base

This article details how the database operations team at Baidu Intelligent Cloud designed an AI-driven knowledge-base Q&A system end to end, covering background, architecture, technical choices, module implementation, key challenges such as vector-search recall and token limits, and real-world deployment scenarios.

AI · prompt engineering · vector database
0 likes · 18 min read