Tagged articles
2 articles
Baidu Intelligent Cloud Tech Hub
Jan 12, 2026 · Artificial Intelligence

How to Reduce Large‑Model Inference Cold‑Start to Seconds with vLLM Optimizations

This article details how Baidu Cloud's hybrid‑cloud team used the vLLM framework to cut the cold‑start time of massive models such as Qwen3‑235B‑A22B from minutes to a few seconds by combining five techniques: accelerated weight loading, deferred CUDA Graph capture, cross‑instance state reuse, fork‑based process startup, and guard‑instance pre‑warming.

CUDA Graph · cold-start optimization · large-model inference
16 min read
Tencent Cloud Developer
Jul 29, 2021 · Cloud Native

Serverless Global Perspectives: Industry Leaders Discuss Current Status and Future Trends

At ServerlessDays China 2021, leaders from AWS, Alibaba Cloud, ByteDance and Tencent Cloud examined serverless concepts, showcased cost‑saving implementations, discussed cold‑start optimizations, identified adoption hurdles such as tooling and lock‑in, and forecasted a future where serverless becomes a ubiquitous, hyper‑unified compute model integrating cloud, edge and storage.

Cloud Computing Trends · Cloud Native · FaaS
35 min read