How Baidu Cloud Slashes Inference Costs: DeepSeek Model Optimizations Unveiled

Baidu Cloud's Qianfan platform launched DeepSeek‑R1 and DeepSeek‑V3 with ultra‑low inference pricing, leveraging advanced engine performance tweaks, a split Prefill/Decode architecture, and comprehensive security measures that together boost throughput, cut costs, and ensure enterprise‑grade reliability.


Background and Launch

On February 3, Baidu Intelligent Cloud's Qianfan large-model platform introduced DeepSeek-R1 and DeepSeek-V3, attracting more than 15,000 customers on the first day. Inference is priced at only 30-50% of DeepSeek's official rates, and a limited-time free tier is available.

Inference Engine Performance Optimization

Building on Baidu's extensive experience in large-model inference, the team optimized the MLA (multi-head latent attention) structure of the DeepSeek models to extract maximum performance. By overlapping compute, communication, and memory operators and adopting an efficient Prefill/Decode split architecture, the system meets SLA targets for TTFT (time to first token) and TPOT (time per output token) while substantially increasing throughput and reducing inference cost.
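
To make those SLA metrics concrete, the short Python sketch below measures TTFT and TPOT on the client side from any token stream. The stream_with_metrics helper and the simulated fake_stream generator are illustrative stand-ins, not part of Qianfan's API.

import time

def stream_with_metrics(generate_stream):
    # Consume a token stream and report TTFT and TPOT.
    # `generate_stream` is any iterator yielding decoded tokens;
    # its name and shape are assumptions for this example only.
    start = time.perf_counter()
    token_times = []
    tokens = []
    for tok in generate_stream:
        token_times.append(time.perf_counter())
        tokens.append(tok)
    if not token_times:
        return None
    ttft = token_times[0] - start  # time to first token
    if len(token_times) > 1:
        # average inter-token latency over the decode phase
        tpot = (token_times[-1] - token_times[0]) / (len(token_times) - 1)
    else:
        tpot = 0.0
    return {"ttft_s": round(ttft, 4), "tpot_s": round(tpot, 4), "tokens": len(tokens)}

def fake_stream():
    # Stand-in generator that simulates tokens arriving from a server.
    for t in ["Hello", ",", " world", "!"]:
        time.sleep(0.02)
        yield t

print(stream_with_metrics(fake_stream()))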

Engineering Architecture Innovations

The platform adopts a push‑pull model for request handling, which outperforms traditional pull‑only designs in success rate, latency, and throughput. A novel request‑failure continuation mechanism improves fault tolerance and SLA compliance. KV‑Cache reuse and a global‑cache‑aware traffic scheduling strategy eliminate redundant token calculations, further lowering latency and boosting throughput.
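
As an illustration of how global-cache-aware scheduling can work in principle, the following Python sketch routes requests that share a prompt prefix to the same decode worker so its KV cache can be reused, and sends unseen prefixes to the least-loaded worker. The class, the worker names, and the prefix length are assumptions for the example, not Qianfan internals.

import hashlib

class CacheAwareRouter:
    # Hypothetical sketch of cache-aware traffic scheduling:
    # requests with a shared prefix go to the worker that likely
    # already holds the corresponding KV cache.
    def __init__(self, workers, prefix_len=256):
        self.workers = list(workers)
        self.load = {w: 0 for w in self.workers}
        self.prefix_owner = {}          # prefix hash -> worker
        self.prefix_len = prefix_len

    def route(self, prompt: str) -> str:
        key = hashlib.sha1(prompt[: self.prefix_len].encode()).hexdigest()
        worker = self.prefix_owner.get(key)
        if worker is None:
            # No cached prefix known: pick the least-loaded worker.
            worker = min(self.workers, key=lambda w: self.load[w])
            self.prefix_owner[key] = worker
        self.load[worker] += 1
        return worker

router = CacheAwareRouter(["decode-0", "decode-1", "decode-2"])
shared_prefix = "You are a helpful assistant. " * 8
print(router.route(shared_prefix + "Summarize this report."))
print(router.route(shared_prefix + "Translate this paragraph."))  # same worker, cache reuse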

Stability and Security Guarantees

Leveraging Baidu’s proprietary content‑security operators, Qianfan provides enterprise‑grade high‑availability and data‑life‑cycle protection. Specialized security optimizations ensure that DeepSeek‑R1 and DeepSeek‑V3 remain safe for enterprise usage, with end‑to‑end safeguards across the model’s lifecycle.

Platform Capabilities

Qianfan ModelBuilder offers an end‑to‑end AI service suite, including data preprocessing, model fine‑tuning, evaluation, and quantization. It supports major inference frameworks such as vLLM, LMDeploy, TensorRT‑LLM, and SGLang, and allows custom model import and deployment for flexible development.
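
For context, a minimal self-deployment sketch using vLLM, one of the supported frameworks, is shown below; the model path is a placeholder, and Qianfan-specific import and deployment steps are omitted.

from vllm import LLM, SamplingParams

# Placeholder path: point this at local weights or a Hugging Face model ID.
llm = LLM(model="/path/to/your-model")
params = SamplingParams(temperature=0.7, max_tokens=128)

outputs = llm.generate(["Explain KV-cache reuse in one sentence."], params)
for out in outputs:
    print(out.outputs[0].text)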

Future Outlook

Baidu recently brought online its Kunlun P800 cluster of 10,000 cards, the first domestically built AI cluster at that scale, with an expansion to 30,000 cards planned. Ongoing technical documentation releases aim to share best practices and accelerate innovation for developers and enterprises alike.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: performance optimization, large language models, security, AI inference, model serving, Baidu Cloud
Written by Baidu Geek Talk