Tagged articles

XPU

7 articles

Feb 6, 2026 · Artificial Intelligence

Accelerating GLM‑4.x Inference on Kunlun XPU with SGLang & vLLM

Baidu’s Baige team adapted the GLM‑4.x series of language models to the Kunlun XPU platform using SGLang and the vLLM‑Kunlun plugin. The work combined agile adaptation, precision alignment with torch_xray, and extensive performance tuning to achieve GPU‑level accuracy and superior inference speed.

AI · XPU · hardware acceleration

6 min read

Baidu Intelligent Cloud Tech Hub

Jan 6, 2026 · Operations

How vLLM‑Kunlun Plugin Enabled Two‑Day Adaptation of MiMo Flash V2 on Kunlun P800 XPU

In just two days, engineers from Baidu Baige and Kunlun extended the vLLM‑Kunlun Plugin to handle asymmetric KV dimensions and integrate SWA+Sink attention, achieving lossless, high‑performance inference of the MiMo Flash V2 model on the Kunlun P800 XPU.

Hybrid Attention · Kunlun P800 · MiMo Flash V2

8 min read

Baidu Intelligent Cloud Tech Hub

Dec 10, 2025 · Artificial Intelligence

Accelerate LLM Deployment on Baidu Kunlun XPU with the Open‑Source vLLM‑Kunlun Plugin

The vLLM‑Kunlun Plugin, built on the vLLM hardware‑plugin RFC, lets developers deploy any major large language model on Baidu's Kunlun XPU instantly without modifying vLLM core code, dramatically shortening migration time, providing high‑performance fusion operators, and offering open‑source tools for precision verification and profiling.

Kunlun · LLM · XPU

8 min read

Architects' Tech Alliance

Nov 9, 2025 · Artificial Intelligence

How SUE Ethernet Redefines AI Cluster Interconnects for Scale‑Up Performance

This article examines Broadcom's Scale Up Ethernet (SUE) framework, detailing how it addresses AI/HPC rack‑scale interconnect challenges by delivering ultra‑high bandwidth, microsecond‑level latency, memory‑semantic operations, and seamless compatibility with existing Ethernet infrastructure for large XPU clusters.

AI interconnect · HPC · High Bandwidth

12 min read

Architects' Tech Alliance

Oct 24, 2025 · Artificial Intelligence

How xPU Scale‑Up Networks Are Redefining AI Training Efficiency

As AI models grow to massive scale, the demand for ultra‑high‑performance, low‑latency networking in xPU clusters intensifies, prompting a shift from dense to MoE architectures and driving the evolution of scale‑up networks. Alibaba Cloud’s UPN design tackles the associated bandwidth, cost, and reliability challenges.

AI · MoE · Network

13 min read

Architects' Tech Alliance

May 31, 2025 · Artificial Intelligence

GPU Cluster Scaling: Understanding Scale‑Up and Scale‑Out for AI Pods

This article explains the concepts of AI Pods and GPU clusters, compares vertical (scale‑up) and horizontal (scale‑out) expansion, describes XPU types, discusses internal and inter‑pod communication, and evaluates the benefits and drawbacks of each scaling approach along with relevant networking technologies.

AI Pods · GPU · InfiniBand

10 min read

Baidu Tech Salon

Jul 4, 2022 · Artificial Intelligence

Kunlun Chip XPU Architecture, Software Stack, and Programming Model Overview

Kunlun Chip’s XPU‑R architecture combines high‑performance SDNN and Cluster compute units, GDDR6 memory with 512 GB/s of bandwidth, and a PCIe 4.0 interconnect. It is supported by an LLVM‑based software stack, a CUDA‑like programming model, and seamless PaddlePaddle integration, enabling efficient AI training and inference with significant cost and performance gains.

AI chip · PaddlePaddle · Programming Model

16 min read
