How SGLang Encoded Engineering Experience into Agents and Achieved Up to 2.75× Kernel Speedups
The SGLang team turned its know‑how in benchmarking, profiling, CUDA kernel tuning, and production‑issue triage into reusable agent skills. Three merged KDA‑Pilot PRs delivered up to 2.75× kernel speedups, a 71.4% throughput gain for Qwen3‑Next, and a reduction in TTFT (time to first token) from 456 ms to 168 ms. The team also outlines a repeatable workflow and practical rules for large‑scale performance engineering.
