Tagged articles
5 articles
Machine Learning Algorithms & Natural Language Processing
Mar 22, 2026 · Artificial Intelligence

NS-Diff: Adding a Physics Engine to Diffusion Models for Fluid and Rigid‑Body Dynamics

The CVPR 2026 paper introduces NS‑Diff, a physics‑guided video diffusion framework that combines a noise‑robust dynamics detector, a physical‑condition latent injection module, and reinforcement‑learning optimization to reduce jerk error by 43 % and fluid divergence by 33 %, achieving superior physical realism and visual quality across multiple benchmarks.

CVPR 2026 · NS‑Diff · Navier-Stokes
13 min read
Bilibili Tech
Feb 13, 2026 · Artificial Intelligence

Self-Forcing: Turning Global Video Diffusion into Causal Streaming for Long-Form Generation

This article examines the Wan2.1 video diffusion model, identifies its scalability bottlenecks for long and real‑time video generation, and introduces the Self‑Forcing causal framework together with sequence‑parallel and RoPE optimizations that achieve sub‑second latency and up to 1.5× speed‑up on modern GPUs.

GPU Optimization · causal inference · large video generation
14 min read
Amap Tech
May 8, 2025 · Artificial Intelligence

FantasyTalking: Realistic Talking Portrait Generation via Coherent Motion Synthesis

FantasyTalking generates high-fidelity, coherent talking portraits from a single static image. It uses a two-stage audio-visual alignment (global segment-level motion, then frame-level lip refinement), face-centric cross-attention for identity preservation, and a motion-intensity module that lets users control expression and body movement. It reports superior realism, synchronization, and performance over prior methods.

Deep Learning · audio-visual alignment · identity preservation
10 min read
NewBeeNLP
Mar 7, 2024 · Artificial Intelligence

How Sora is Redefining Large Vision Models: A Deep Dive into Technology, Limits, and Opportunities

This comprehensive review examines Sora, the first model capable of generating minute‑long, high‑quality videos from text, covering its historical background, core diffusion‑Transformer architecture, data preprocessing strategies, prompt engineering techniques, diverse applications, and the ethical and technical limitations that shape its future.

Multimodal AI · Prompt engineering · Sora
28 min read
Kuaishou Tech
Oct 31, 2023 · Artificial Intelligence

Kuaishou’s Nine Accepted Papers at ACM MM 2023: Summaries and Links

This article presents concise English summaries of nine Kuaishou research papers accepted at ACM MM 2023, covering topics such as no‑reference video quality assessment, adaptive video quality models, blind image super‑resolution, audio‑visual‑language transfer learning, motion‑aware video diffusion, large‑scale e‑commerce retrieval, and interactive segmentation.

AI · audio-visual language · e-commerce retrieval
18 min read