Tagged articles
7 articles
Baidu Intelligent Cloud Tech Hub
Feb 6, 2026 · Artificial Intelligence

Accelerating GLM‑4.x Inference on Kunlun XPU with SGLang & vLLM

Baidu’s Baige team adapted the GLM‑4.x series of language models to the Kunlun XPU platform using SGLang and the vLLM‑Kunlun plugin. The work combined agile adaptation, precision alignment with torch_xray, and extensive performance tuning to achieve GPU‑level accuracy and superior inference speed.

AI · Hardware acceleration · XPU
6 min read
Baidu Intelligent Cloud Tech Hub
Dec 10, 2025 · Artificial Intelligence

Accelerate LLM Deployment on Baidu Kunlun XPU with the Open‑Source vLLM‑Kunlun Plugin

The vLLM‑Kunlun Plugin, built on the vLLM hardware‑plugin RFC, lets developers deploy any major large language model on Baidu's Kunlun XPU without modifying vLLM core code. It shortens migration time, provides high‑performance fused operators, and offers open‑source tools for precision verification and profiling.

Inference · Kunlun · LLM
8 min read
Architects' Tech Alliance
Nov 9, 2025 · Artificial Intelligence

How SUE Ethernet Redefines AI Cluster Interconnects for Scale‑Up Performance

This article examines Broadcom's Scale Up Ethernet (SUE) framework, detailing how it addresses AI/HPC rack‑scale interconnect challenges by delivering ultra‑high bandwidth, microsecond‑level latency, memory‑semantic operations, and seamless compatibility with existing Ethernet infrastructure for large XPU clusters.

AI interconnect · HPC · High Bandwidth
12 min read
Architects' Tech Alliance
Oct 24, 2025 · Artificial Intelligence

How xPU Scale‑Up Networks Are Redefining AI Training Efficiency

As AI models grow to massive scales and shift from dense to MoE architectures, the demand for ultra‑high‑performance, low‑latency networking in xPU clusters intensifies. This drives the evolution of scale‑up networks, where Alibaba Cloud’s UPN design tackles bandwidth, cost, and reliability challenges.

AI · MoE · Scale‑Up
13 min read
Architects' Tech Alliance
May 31, 2025 · Artificial Intelligence

GPU Cluster Scaling: Understanding Scale‑Up and Scale‑Out for AI Pods

This article explains the concepts of AI Pods and GPU clusters, compares vertical (scale‑up) and horizontal (scale‑out) expansion, describes XPU types, discusses internal and inter‑pod communication, and evaluates the benefits and drawbacks of each scaling approach along with relevant networking technologies.

AI Pods · GPU · InfiniBand
10 min read
Baidu Tech Salon
Jul 4, 2022 · Artificial Intelligence

Kunlun Chip XPU Architecture, Software Stack, and Programming Model Overview

Kunlun Chip’s XPU‑R architecture combines high‑performance SDNN and Cluster compute units, GDDR6 memory with 512 GB/s of bandwidth, and a PCIe 4.0 interconnect. It is supported by an LLVM‑based software stack, a CUDA‑like programming model, and seamless PaddlePaddle integration, enabling efficient AI training and inference with significant cost and performance gains.

AI Chip · PaddlePaddle · Programming Model
16 min read