
FlightVGM: FPGA-Accelerated Inference for Video Generation Models Wins Best Paper at FPGA 2025

The FlightVGM paper, awarded Best Paper at FPGA 2025, details a novel FPGA-based inference accelerator for video generation models that leverages time‑space activation sparsity, mixed‑precision DSP58 extensions, and adaptive scheduling to achieve 1.30× performance and 4.49× energy‑efficiency gains over an NVIDIA RTX 3090 GPU while preserving model accuracy.

DataFunTalk

At the FPGA 2025 conference, the Best Paper award was presented to FlightVGM, a video‑generation model inference accelerator jointly developed by researchers from Shanghai Jiao Tong University, Tsinghua University, and the startup 无问芯穹 (Infinigence AI). This marks the first time the award has gone to a work led entirely by a mainland Chinese team.

FlightVGM implements efficient inference for Video Generation Models (VGMs) on FPGA, following the team’s earlier FlightLLM work on large language models. Running on an AMD V80 FPGA, FlightVGM delivers a 1.30× speedup and a 4.49× improvement in energy efficiency over an NVIDIA RTX 3090 GPU, despite the GPU’s peak compute capability being more than 21× higher.

The paper’s core contributions are threefold:

1. Time‑Space Activation Sparsity: Cosine similarity is used to detect redundancy between tokens across frames (temporal) and within frames (spatial); highly similar activations are skipped, dramatically reducing the compute load.

2. Float‑Int Mixed‑Precision DSP58 Extension: Critical modules (e.g., attention) retain FP16 precision while less‑critical parts are quantized to INT8, realized through a novel DSP‑Expansion (DSP‑E) architecture that dynamically reconfigures DSP58 resources.

3. Dynamic‑Static Adaptive Scheduling: The scheduler reorders operations based on actual workload to mitigate load imbalance caused by sparsity.
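The similarity-based skipping in contribution (1) can be illustrated with a small sketch. The function below compares token embeddings at matching positions in consecutive frames and flags those whose cosine similarity exceeds a threshold as redundant; the function name, threshold, and data are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def find_redundant_tokens(prev_tokens, curr_tokens, threshold=0.95):
    """Flag tokens in the current frame whose cosine similarity to the
    corresponding token in the previous frame exceeds `threshold`.
    Flagged tokens can reuse cached results instead of being recomputed.
    (Illustrative sketch; not the paper's actual detection logic.)"""
    # Normalize each token embedding to unit length.
    prev_n = prev_tokens / np.linalg.norm(prev_tokens, axis=1, keepdims=True)
    curr_n = curr_tokens / np.linalg.norm(curr_tokens, axis=1, keepdims=True)
    # Cosine similarity between matching token positions across frames.
    sim = np.sum(prev_n * curr_n, axis=1)
    return sim >= threshold  # True → skip recomputation, reuse cached output

# Deterministic example: 4 tokens with 8-dim one-hot embeddings.
frame_a = np.eye(4, 8)
frame_b = frame_a.copy()
frame_b[2] = 0.0
frame_b[2, 7] = 1.0  # token 2 becomes orthogonal to its predecessor
mask = find_redundant_tokens(frame_a, frame_b)
# mask → [True, True, False, True]: only token 2 must be recomputed
```

The same comparison applied within a single frame would capture the spatial half of the time‑space sparsity.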
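Because sparsity makes per-operation costs irregular, a scheduler that assigns work by measured cost can keep compute units balanced. The sketch below uses a classic greedy longest-processing-time heuristic as a stand-in for the idea; the paper's actual dynamic‑static scheduler is more sophisticated, and all names here are assumptions.

```python
import heapq

def schedule_lpt(op_costs, num_units):
    """Greedy longest-processing-time scheduling: assign each op, in order
    of descending cost, to the currently least-loaded compute unit.
    (Illustrative load-balancing sketch, not FlightVGM's scheduler.)"""
    loads = [(0.0, u) for u in range(num_units)]  # (current load, unit id)
    heapq.heapify(loads)
    assignment = {}
    for op, cost in sorted(op_costs.items(), key=lambda kv: -kv[1]):
        load, unit = heapq.heappop(loads)   # least-loaded unit
        assignment[op] = unit
        heapq.heappush(loads, (load + cost, unit))
    makespan = max(load for load, _ in loads)
    return assignment, makespan

# Irregular per-op costs, as sparsity would produce.
ops = {"op0": 9.0, "op1": 7.0, "op2": 4.0, "op3": 4.0, "op4": 3.0}
assign, makespan = schedule_lpt(ops, 2)
# Loads end up 13.0 and 14.0 → makespan 14.0, close to the ideal 27/2
```

Naively assigning ops in arrival order could leave one unit idle while another drains a backlog; cost-aware reordering is what mitigates that imbalance.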

Experimental results show that FlightVGM incurs negligible accuracy loss (average drop of 0.008, or 0.042 with full INT8 quantization) while delivering superior performance and energy efficiency compared with GPU baselines. The sparsity‑aware and mixed‑precision design enables the FPGA implementation to outperform the GPU despite the latter’s higher raw compute capability.
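The quantization side of the mixed-precision design can be sketched numerically: sensitive tensors stay in FP16 while others drop to INT8, whose reconstruction error is bounded by half a quantization step. The symmetric per-tensor scheme below is a generic illustration under assumed names, not the paper's DSP‑E datapath.

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization (illustrative sketch)."""
    scale = np.max(np.abs(x)) / 127.0
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    return q.astype(np.float32) * scale

# A less precision-sensitive tensor (e.g. a weight matrix) goes to INT8;
# attention-like tensors would instead be kept in FP16.
w = np.linspace(-1.0, 1.0, 16, dtype=np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)
max_err = np.max(np.abs(w - w_hat))  # bounded by scale / 2
```

Keeping only the error-sensitive modules in FP16 is what lets the overall accuracy drop stay near the reported 0.008, versus 0.042 when everything is forced to INT8.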

Looking forward, the authors propose exploring AI Engine (AIE) plus High‑Bandwidth Memory (HBM) architectures to further boost efficiency for video‑generation workloads, positioning FPGA as a key platform for future large‑model inference.

Tags: AI · video generation · model inference · hardware acceleration · FPGA · mixed precision
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
