Author

Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

130 articles


Latest from Baidu Intelligent Cloud Tech Hub

Showing up to 100 of the most recent articles

Baidu Intelligent Cloud Tech Hub

May 31, 2024 · Artificial Intelligence

How Multi‑Chip Heterogeneous Clusters Power Next‑Gen Large Model Training

Using a martial‑arts analogy, the article explains why training massive AI models now requires clusters of thousands of GPUs or mixed chips, outlines three key steps (interconnect, distributed parallel strategies, and accelerator‑level optimization), and shows how Baidu’s Baige platform achieves near‑full efficiency across GPU, Kunlun, and Ascend chips.

AI training · GPU interconnect · accelerator optimization

0 likes · 11 min read


Baidu Intelligent Cloud Tech Hub

May 27, 2024 · Databases

Baidu’s Enterprise Vector Database: Architecture, Performance, and RAG Secrets

In this exclusive interview, Baidu’s senior database architects explain why they built a dedicated enterprise vector database and describe its novel column‑store engine, C++‑based retrieval stack, performance gains over open‑source solutions, multi‑modal support, RAG integration, and future research directions.

AI · RAG · Storage Engine

0 likes · 28 min read


Baidu Intelligent Cloud Tech Hub

May 15, 2024 · Artificial Intelligence

How Baidu’s AIAK‑LLM Supercharges Large‑Model Training and Inference

The article explains the scaling challenges of ever‑larger LLMs, introduces the MFU performance metric, and surveys industry parallelism and memory‑saving techniques. It then details Baidu’s AIAK‑LLM suite, with its resource, component, and acceleration layers, along with concrete training and inference optimizations that raise MFU by 30‑60% and cut deployment costs.

AI infrastructure · MFU · Training Acceleration

0 likes · 25 min read


Baidu Intelligent Cloud Tech Hub

May 8, 2024 · Artificial Intelligence

How AI Powers the Next‑Gen Sugar BI Platform for Smarter Decision‑Making

This article details the evolution of Baidu's Sugar BI platform, highlighting its AI‑driven analytics, extensive data source support, zero‑code visual design, smart chart recommendation, and the conversational Sugar Bot that transforms natural‑language queries into actionable visual insights.

AI · Analytics · BI

0 likes · 18 min read


Baidu Intelligent Cloud Tech Hub

Apr 24, 2024 · Artificial Intelligence

How to Build and Accelerate Multi‑Chip AI Clusters for Large‑Model Training

With AI training demands outgrowing GPU clusters built on a single chip type, this article explains how to construct and speed up heterogeneous AI clusters that combine GPUs, Kunlun, and Ascend chips. It covers interconnect, distributed parallel strategies, and specialized acceleration suites as the means to achieve high MFU and efficient large‑model training.

AI clustering · Distributed Training · GPU acceleration

0 likes · 15 min read


Baidu Intelligent Cloud Tech Hub

Apr 16, 2024 · Operations

Tackling Multi-CPU Performance Challenges with Baidu’s One-Click Btune

At QCon 2024, Baidu Intelligent Cloud presented the complexities of optimizing diverse CPU architectures in data centers and introduced Btune, a one‑click solution that automates bottleneck detection, analysis, and performance tuning across Intel, AMD, and ARM platforms, enabling engineers to boost service efficiency.

Btune · CPU performance · Multi-Architecture

0 likes · 18 min read


Baidu Intelligent Cloud Tech Hub

Mar 1, 2024 · Artificial Intelligence

How Baidu’s BCCL Boosts Distributed AI Training with Real‑Time Observability and Fault Diagnosis

Baidu’s Collective Communication Library (BCCL) enhances large‑model distributed training by improving real‑time bandwidth monitoring, fault diagnosis, network stability, and performance, leveraging RDMA networks and GPU‑specific optimizations to increase effective training time to 98% and bandwidth utilization to 95%.

AI infrastructure · Distributed Training · Fault Diagnosis

0 likes · 11 min read


Baidu Intelligent Cloud Tech Hub

Jan 31, 2024 · Artificial Intelligence

How Baidu Built an 80% Accurate AI-Powered Database Ops Knowledge Base

This article details how the database operations team at Baidu Intelligent Cloud designed an AI‑driven knowledge‑base Q&A system end to end, covering background, architecture, technical choices, module implementation, key challenges such as vector‑search recall and token limits, and real‑world deployment scenarios.

AI · prompt engineering · vector database

0 likes · 18 min read


Baidu Intelligent Cloud Tech Hub

Jan 24, 2024 · Operations

Boost Cloud App Performance by 36% with Baidu’s Btune Diagnostic Tool

This article explains how Baidu Cloud’s Btune performance‑diagnostic tool helps identify CPU, memory and NUMA bottlenecks, provides automatic optimization suggestions, and demonstrates a real‑world test that improves a memory‑intensive program’s runtime by up to 36.8% after applying the recommended changes.

Btune · NUMA · cloud computing

0 likes · 10 min read


Baidu Intelligent Cloud Tech Hub

Jan 16, 2024 · Databases

How Baidu’s BTS Powers High‑Performance, Multi‑Model NoSQL at Scale

This article details Baidu Cloud's BTS semi‑structured storage system, covering its three‑generation evolution, three‑layer architecture, performance optimizations, high‑availability mechanisms, and real‑world use cases such as autonomous driving and large‑scale system monitoring.

BTS · NoSQL · database

0 likes · 14 min read
