Baidu Intelligent Cloud Tech Hub

We share the cloud tech topics you care about. Feel free to leave a message and tell us what you'd like to learn.

130 articles · 0 likes · 99 views · 0 comments
Recent Articles

Apr 24, 2026 · Artificial Intelligence

LoongForge: Open‑Source Multimodal Training Framework Runs on GPU and Kunlun XPU with 45% Speedup

LoongForge is an open‑source, Megatron‑based multimodal training framework that unifies LLM, VLM, VLA, and diffusion models, runs seamlessly on NVIDIA GPUs and Baidu Kunlun XPUs, and delivers 15%–45% end‑to‑end training acceleration while scaling linearly across thousands of accelerator cards.

GPU · Kunlun XPU · LoongForge
0 likes · 23 min read
Apr 8, 2026 · Artificial Intelligence

Unlocking 8‑Hour Autonomous Coding: GLM‑5.1’s Leap with Kunlun XPU

The open‑source GLM‑5.1 model, adapted to Baidu Baige's Kunlun XPU via the vLLM‑Kunlun Plugin, delivers record‑breaking SWE‑bench scores, eight‑hour autonomous coding, long‑context handling up to 64K tokens, and scalable deployment across tens of thousands of chips, showcasing end‑to‑end AI acceleration.

GLM-5.1 · Kunlun XPU · Quantization
0 likes · 8 min read
Apr 7, 2026 · Artificial Intelligence

How Baidu’s 7th‑Gen AI Confidential VM Achieves Full‑Stack Secure Compute

Baidu Intelligent Cloud’s seventh‑generation AI confidential virtual machine combines Intel TDX, NVIDIA GPUs, and BlueField DPUs to deliver end‑to‑end encrypted data paths, elastic multi‑GPU scaling, and near‑native performance, proving that high‑sensitivity AI workloads can run securely in the cloud without sacrificing speed.

AI · Confidential Computing · cloud
0 likes · 17 min read
Mar 23, 2026 · Artificial Intelligence

How vLLM‑Kunlun Unlocks Peak LLM Performance on Kunlun XPU

This article details the technical challenges of adapting the open‑source vLLM inference framework to Baidu's Kunlun XPU, outlines four major performance bottlenecks, and presents a multi‑dimensional optimization roadmap—including custom plugins, operator fusion, INT8 quantization, and CUDA‑Graph techniques—that together boost throughput by up to 8% and narrow the gap with leading GPU hardware.

CUDA Graph · INT8 quantization · Kunlun XPU
0 likes · 13 min read
Mar 18, 2026 · Artificial Intelligence

How vLLM‑Kunlun Brings CUDA‑Like Inference to Kunlun XPU: Architecture, Adaptation, and Performance Wins

This article details the vLLM‑Kunlun open‑source project, which adapts the high‑performance vLLM inference engine to Baidu's Kunlun XPU. It covers the platform overview, the model‑porting workflow, the plugin architecture, concrete case studies with MIMO‑Flash‑V2 and Qwen 3.5, and the performance‑tuning techniques that enable seamless, GPU‑level inference on domestic hardware.

AI · Inference · Kunlun
0 likes · 12 min read
Mar 6, 2026 · Artificial Intelligence

How Baidu’s End‑to‑End Quantization Stack Supercharges Large‑Model Inference on Kunlun XPU

Baidu Baige built a full‑stack quantization pipeline that integrates model‑level, framework‑level, and hardware‑level optimizations on the Kunlun XPU platform, compressing FP16/BF16 large models to 25–50% of their original size while boosting inference speed by 30–50% and dramatically reducing memory consumption for enterprise deployments.

AI inference · Hardware Acceleration · INT4
0 likes · 16 min read
Feb 12, 2026 · Artificial Intelligence

Deploying GLM-5 on Baidu Kunlun P800 XPU with vLLM‑Kunlun Plugin

This article explains how Baidu's new GLM-5 large model is adapted to the Kunlun P800 XPU, detailing the asynchronous reinforcement learning framework Slime and optimization techniques such as INT8 quantization and tensor parallelism, and provides step‑by‑step deployment commands for the open‑source vLLM‑Kunlun plugin.

AI acceleration · GLM-5 · INT8 quantization
0 likes · 6 min read
Feb 6, 2026 · Artificial Intelligence

Accelerating GLM‑4.x Inference on Kunlun XPU with SGLang & vLLM

Baidu’s Baige team adapted the GLM‑4.x series of language models to the Kunlun XPU platform using SGLang and the vLLM‑Kunlun plugin, employing agile adaptation, precision alignment with torch_xray, and extensive performance tuning to achieve GPU‑level accuracy and superior inference speed.

AI · Hardware Acceleration · Model Inference
0 likes · 6 min read
Jan 27, 2026 · Artificial Intelligence

Deploying Qwen3 on Kunlun P800: Full‑Parameter DPO Training and Inference Guide

This guide walks through setting up a Kunlun P800 XPU host, preparing Docker containers, deploying Qwen3‑8B/‑32B/‑VL models with vLLM‑Kunlun, benchmarking performance, and running full‑parameter DPO training using LLaMA‑Factory, providing scripts, configuration files, and troubleshooting tips for AI engineers.

DPO · Inference · Kunlun P800
0 likes · 32 min read
Jan 20, 2026 · Artificial Intelligence

How LoongFlow Enables Expert‑Level AI Agents to Outperform Human Mathematicians

LoongFlow is an open‑source AI agent framework that combines a Plan‑Execute‑Summarize (PES) paradigm with a Hybrid Evolutionary Memory system, allowing agents to perform directed, iterative problem solving and achieve state‑of‑the‑art results on mathematical challenges, Kaggle‑style benchmarks, and real‑world tasks with dramatically higher efficiency.

Evolutionary Algorithms · LoongFlow · benchmarking
0 likes · 15 min read