Author: Baidu Geek Talk

Latest from Baidu Geek Talk

Jan 7, 2026 · Artificial Intelligence

How Baidu’s vLLM‑Kunlun Plugin Powered MiMo Flash V2 on Kunlun XPU in 2 Days

Within two days, Baidu’s Baige and Kunlun Chip teams adapted the 309‑billion‑parameter MiMo Flash V2 model, which features a hybrid SWA+Sink and Full Attention mechanism, to run efficiently on the Kunlun P800 XPU using the vLLM‑Kunlun Plugin, achieving lossless performance comparable to GPU inference.

AI inference · Kunlun XPU · MiMo Flash V2

0 likes · 7 min read

Dec 24, 2025 · Artificial Intelligence

Context Parallelism Slashes TTFT by 80% for 128K-Token LLMs

The article explains how Baidu’s Baige team integrated a Context Parallelism (CP) strategy into DeepSeek V3.2, detailing the DSA architecture, the limitations of traditional tensor and sequence parallelism, and how CP distributes computation and memory across GPUs to cut time‑to‑first‑token (TTFT) latency by up to 80% for ultra‑long 128K‑token contexts.

Context Parallelism · DeepSeek · LLM

0 likes · 9 min read

Dec 17, 2025 · Artificial Intelligence

Accelerate LLM Deployment on Baidu Kunlun XPU with the Open‑Source vLLM‑Kunlun Plugin

The vLLM‑Kunlun Plugin, jointly released by Baidu Baige and Kunlun Chip, provides a high‑performance, zero‑intrusion solution for deploying open‑source large language models on domestic Kunlun XPU hardware, includes fused operators, precision‑validation and profiling tools, and supports over twenty mainstream and multimodal models.

Kunlun XPU · Model Deployment · Performance Optimization

0 likes · 7 min read

Dec 10, 2025 · Artificial Intelligence

How Offloading Latent Cache Boosts DeepSeek‑V3.2‑Exp Decoding Throughput

This report analyzes the memory bottleneck of DeepSeek‑V3.2‑Exp’s sparse‑attention decoder, proposes the Expanded Sparse Server (ESS) to offload the latent cache to CPU memory, and demonstrates through high‑fidelity simulation that the approach dramatically improves decode throughput while keeping latency within acceptable limits.

Cache offload · GPU memory · LLM inference

0 likes · 20 min read

Nov 10, 2025 · Cloud Native

How Polar‑TCP Breaks Kernel Network Bottlenecks for Cloud‑Native High‑Performance Services

This article explains how traditional kernel network stacks struggle with high‑concurrency, low‑latency cloud data‑center workloads and introduces Baidu Intelligent Cloud’s Polar solution (Polar‑TCP and Polar‑RDMA), which combines user‑space DPDK drivers, a lightweight TCP stack, and an industrial RPC framework to achieve near‑RDMA performance while preserving compatibility with existing TCP ecosystems.

DPDK · Network Stack · Performance Optimization

0 likes · 23 min read

Nov 5, 2025 · Artificial Intelligence

How AI Agents Are Revolutionizing E‑Commerce and Global Brand Expansion

In a round‑table hosted by Baidu Intelligent Cloud, industry leaders dissect how AI agents are transforming Chinese retail and overseas brand expansion. They address challenges such as rising traffic costs, low repurchase rates, and localization hurdles, and present concrete use cases in content generation, intelligent customer service, and automated marketing that promise to make AI agents an essential, not optional, component of modern commerce.

AI · Digital Transformation · customer service

0 likes · 17 min read

Oct 29, 2025 · Artificial Intelligence

How Baidu Transformed E‑commerce Risk Control with Multi‑Modal AI Agents

This article details Baidu's e‑commerce risk‑control overhaul, explaining how traditional rule‑based and manual reviews struggled with multimodal violations, ambiguous semantics, and poor merchant experience, and how a new AI‑driven pipeline combining large multimodal models, rule engines, and knowledge‑base queries achieved full automation, real‑time feedback, and high explainability.

AI · e-commerce · risk control

0 likes · 13 min read

Oct 15, 2025 · Artificial Intelligence

Can LLMs Automate Data Ingestion and Cut Integration Time from Months to Days?

This article presents an LLM‑driven data ingestion solution for an intelligent data platform that automates schema recognition, mapping, quality‑rule extraction, and package building, reducing integration cycles from three months to three days while eliminating manual effort and enhancing scalability and control.

AI · Automation · Code Generation

0 likes · 21 min read

Oct 13, 2025 · Big Data

How Baidu Scaled Its Data Warehouse to Handle Billions of PVs and Petabytes

This article details Baidu APP's massive data‑warehouse overhaul, describing the two‑step strategy that stabilized log cleaning, modernized the ETL framework, introduced wide‑table architectures, and implemented tiered storage to dramatically improve processing speed, reliability, and cost efficiency for petabyte‑scale workloads.

Big Data · ETL · Performance Optimization

0 likes · 25 min read

Sep 24, 2025 · Big Data

How Feed Real‑Time Data Warehouse Was Re‑Engineered for Speed and Cost Savings

This article explains how Baidu’s Feed real‑time data warehouse was rebuilt using a pure streaming architecture, detailing the limitations of the previous stream‑batch design, the technical solutions (core/non‑core data separation, metric calculation in streaming, and Parquet storage with Apache Arrow), the resulting cost reductions and latency improvements, and the future roadmap.

Apache Arrow · Parquet · Stream Processing

0 likes · 17 min read