Old Zhang's AI Learning
Apr 23, 2026 · Artificial Intelligence
DeepSeek Quietly Open-Sources TileKernels to Push GPU Performance to Its Limits
DeepSeek has released TileKernels, a GPU kernel library written in the TileLang DSL. It targets H100, H200, and B200 GPUs and claims to come close to the hardware limits on both compute throughput and memory bandwidth. For deep-learning engineers, it offers MoE routing kernels, FP8/FP4 quantization kernels, and dual-language PyTorch reference implementations.
FP8 quantization · GPU Optimization · LLM training
