Tagged articles
2 articles
Machine Heart
Jun 15, 2026 · Artificial Intelligence

Domestic GPU Trains AI to Write Its Own Kernels—Moore Threads Tops KernelBench

MusaCoder‑27B‑RL, the first open‑source large model fully trained on a domestic GPU stack, achieved an 88.6% pass rate on the Stanford‑Princeton KernelBench benchmark and outperformed leading foreign models by delivering at least 1.1× speedup over baseline kernels.

AI code generation · Chinese GPU · GPU kernel generation
11 min read
Machine Learning Algorithms & Natural Language Processing
Mar 3, 2026 · Artificial Intelligence

How CUDA Agent Lets Anyone Write High‑Performance CUDA Kernels, Challenging Nvidia’s AI Moat

CUDA Agent, a large‑scale reinforcement‑learning system from ByteDance and Tsinghua, automatically generates and optimizes CUDA kernels that outperform torch.compile by up to 2× on simple kernels and run around 40% faster than kernels from proprietary models on the hardest benchmarks. The article also details its data‑synthesis pipeline, training workflow, and current limitations.

CUDA · GPU optimization · KernelBench
10 min read