How Temporal Residual Modeling Boosts Video Super‑Resolution Performance
This article introduces a novel video super‑resolution framework that unifies low‑ and high‑resolution temporal modeling using adjacent‑frame residual maps, achieving state‑of‑the‑art results on multiple benchmarks while maintaining high speed and flexibility.
Background
Super‑resolution is a classic computer‑vision technique that maps low‑resolution images to high‑resolution ones. With deep learning, convolutional networks have achieved remarkable results for image super‑resolution, prompting research into the more challenging video super‑resolution task, which requires effective temporal modeling to exploit complementary information across frames.
Problems with Existing Methods
Current temporal‑modeling approaches fall into two categories: (1) flow‑based, deformable‑convolution or 3D‑convolution methods that explicitly or implicitly model frame‑to‑frame dynamics, and (2) recurrent hidden‑state accumulation methods that aggregate features over time. Bidirectional recurrent networks improve information balance but suffer from high computational cost and difficulty integrating into causal (real‑time) systems. Moreover, existing frameworks lack a unified strategy for handling both low‑resolution (LR) and high‑resolution (HR) temporal information.
Proposed ETDM Framework
We propose ETDM, a video super‑resolution framework that uses temporal residual maps between adjacent frames to unify LR and HR temporal modeling. In the LR space, the residual map distinguishes low‑change (LV) and high‑change (HV) regions, allowing the network to treat them differently. In the HR space, the residual map acts as a bridge that propagates predictions across arbitrary past and future frames.
ETDM adopts a unidirectional recurrent convolutional network. At each time step the network receives two inputs: (i) a short LR sequence (previous, current, and next frames) and (ii) the HR predictions carried over from the previous step. Three residual heads (Spatial-Residual, Past-Residual, and Future-Residual) jointly predict the current super-resolved frame and the temporal residual maps for the past and future directions.
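The data flow of one recurrent step can be sketched as follows. This is a schematic in numpy, not the paper's implementation: the function name `etdm_step`, the `heads` callable (standing in for the whole trained network), and the nearest-neighbour upsampler (standing in for a learned sub-pixel upsampler) are all illustrative assumptions.

```python
import numpy as np

def upsample_nearest(x, scale=4):
    """Naive nearest-neighbour upsampling; a stand-in for the learned upsampler."""
    return x.repeat(scale, axis=0).repeat(scale, axis=1)

def etdm_step(lr_prev, lr_cur, lr_next, hr_prev_pred, heads, scale=4):
    """One recurrent step (schematic). `heads` is any callable that returns the
    three residual maps (spatial, past, future); the network itself is abstracted."""
    spatial_res, past_res, future_res = heads(lr_prev, lr_cur, lr_next, hr_prev_pred)
    sr_cur = upsample_nearest(lr_cur, scale) + spatial_res   # Spatial-Residual head
    hr_toward_past = sr_cur + past_res      # Past-Residual head: refine past estimates
    hr_toward_future = sr_cur + future_res  # Future-Residual head: seed the next step
    return sr_cur, hr_toward_past, hr_toward_future
```

Note how the super-resolved frame is always expressed as a base upsample plus predicted residuals, and how the past/future outputs are again residual offsets from the current result; this is what lets the same mechanism serve both refinement and propagation.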
Temporal Residual Modeling
LV regions correspond to small motions, while HV regions capture larger motions; the HV branch uses larger receptive fields to capture broader motion cues. The residual maps serve as bridges that transfer information forward and backward, enabling a temporal bidirectional optimization mechanism that refines the current frame with complementary data from other time steps.
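A minimal sketch of the LV/HV partition: take the absolute residual between adjacent LR frames and threshold it by magnitude. The function name `temporal_residual_masks` and the threshold value are hypothetical; the paper's actual partition rule may differ.

```python
import numpy as np

def temporal_residual_masks(lr_prev, lr_cur, thresh=0.05):
    """Split an adjacent-frame residual map into low-change (LV) and
    high-change (HV) regions by magnitude (threshold is an illustrative choice)."""
    residual = np.abs(lr_cur.astype(np.float64) - lr_prev.astype(np.float64))
    hv = residual > thresh   # large temporal change: bigger motion
    lv = ~hv                 # small temporal change: near-static content
    return residual, lv, hv
```

The two masks partition every pixel, so each region can be routed to a branch with an appropriate receptive field.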
Memory Mechanism
A memory of length N stores the super-resolved estimates of the N past and N future frames. Because adjacent-frame residual maps can be accumulated, information propagates across any temporal distance; as each new frame arrives, the stored estimates are refined by adding the newly predicted residual maps.
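The accumulation idea can be verified with a toy example. Assuming each residual is exactly the difference between consecutive frames, summing a chain of residuals carries an estimate across any temporal distance (in ETDM the residuals are predicted, so the memory holds refined estimates rather than exact reconstructions). The helper name `propagate` is illustrative.

```python
import numpy as np

def propagate(hr_est, residual_chain):
    """Carry an HR estimate across an arbitrary temporal distance by
    accumulating the adjacent-frame residual maps along the way."""
    out = hr_est.astype(np.float64)
    for r in residual_chain:
        out = out + r
    return out
```

With ideal residuals, frame t plus the residuals for steps t..t+k-1 reproduces frame t+k exactly, which is why a fixed-length memory suffices for arbitrary-distance propagation.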
Experiments
We train on the Vimeo‑90K dataset and evaluate on Vid4, SPMCS, UDM10, and REDS4. ETDM achieves state‑of‑the‑art PSNR and SSIM scores, surpassing methods such as EDVR, GOVSR, and BasicVSR while offering a better speed‑accuracy trade‑off. Qualitative comparisons show richer details and more accurate structures.
Conclusion
By unifying temporal modeling with frame‑wise residual maps, ETDM efficiently exploits complementary information in both LR and HR domains, providing flexible propagation, lower computational cost, and superior performance across multiple video super‑resolution benchmarks.
Kuaishou Audio & Video Technology