Tagged articles
4 articles
Page 1 of 1
Machine Learning Algorithms & Natural Language Processing
Machine Learning Algorithms & Natural Language Processing
Mar 3, 2026 · Artificial Intelligence

How CUDA Agent Lets Anyone Write High‑Performance CUDA Kernels, Challenging Nvidia’s AI Moat

CUDA Agent, a large‑scale reinforcement‑learning system from ByteDance and Tsinghua, can automatically generate and optimize CUDA kernels that outperform torch.compile by up to 2× on simple kernels and achieve around 40% higher speed than proprietary models on the hardest benchmarks, while detailing its data‑synthesis pipeline, training workflow, and current limitations.

CUDAGPU OptimizationKernelBench
0 likes · 10 min read
How CUDA Agent Lets Anyone Write High‑Performance CUDA Kernels, Challenging Nvidia’s AI Moat
Bilibili Tech
Bilibili Tech
Jan 28, 2026 · Artificial Intelligence

Boosting Video Generation Inference: Full Graph Compilation with torch.compile

This article examines the challenges of optimizing video generation model inference, moving from operator-level tweaks to full-graph compilation using torch.compile, and details systematic strategies to eliminate Graph Breaks, handle dynamic shapes, KV-Cache indexing, and Python-side caches, achieving a 47.6% speedup on a 14B model without accuracy loss.

Inference AccelerationVideo Generationai
0 likes · 14 min read
Boosting Video Generation Inference: Full Graph Compilation with torch.compile
AI Algorithm Path
AI Algorithm Path
Mar 16, 2025 · Artificial Intelligence

Speed Up Your PyTorch Model Training: Practical Tips and Tricks

This article walks through concrete techniques to accelerate PyTorch training, covering mixed‑precision with torch.cuda.amp, profiling with torch.profiler, DataLoader tuning, torch.compile, distributed strategies like DataParallel and DDP, gradient accumulation, and advanced libraries such as Lightning, Apex, and DeepSpeed, plus model‑level optimizations and monitoring tips.

DataLoaderDistributed TrainingProfiling
0 likes · 12 min read
Speed Up Your PyTorch Model Training: Practical Tips and Tricks
Python Programming Learning Circle
Python Programming Learning Circle
Mar 22, 2023 · Artificial Intelligence

Overview of PyTorch 2.0 Features and New APIs

The article provides a detailed overview of PyTorch 2.0, highlighting its stable and beta features such as torch.compile, accelerated transformers, MPS backend, new quantization support, and prototype parallelism tools, while emphasizing performance improvements for dynamic shapes, distributed training, and CPU/GPU inference.

Accelerated TransformersDeep LearningMPS
0 likes · 6 min read
Overview of PyTorch 2.0 Features and New APIs