Self-Forcing: Turning Global Video Diffusion into Causal Streaming for Long-Form Generation

This article examines the Wan2.1 video diffusion model, identifies its scalability bottlenecks for long and real‑time video generation, and introduces the Self‑Forcing causal framework together with sequence‑parallel and RoPE optimizations that achieve sub‑second latency and up to 1.5× speed‑up on modern GPUs.

GPU optimizationcausal inferencelarge video generation

0 likes · 14 min read

Self-Forcing: Turning Global Video Diffusion into Causal Streaming for Long-Form Generation

Bilibili Tech

Jan 28, 2026 · Artificial Intelligence

Boosting Video Generation Inference: Full Graph Compilation with torch.compile

This article examines the challenges of optimizing video generation model inference, moving from operator-level tweaks to full-graph compilation using torch.compile, and details systematic strategies to eliminate Graph Breaks, handle dynamic shapes, KV-Cache indexing, and Python-side caches, achieving a 47.6% speedup on a 14B model without accuracy loss.

AIgraph optimizationinference acceleration

0 likes · 14 min read

Boosting Video Generation Inference: Full Graph Compilation with torch.compile