BIPNet: Adaptive Progressive Upsampling Drives a Leap in Burst Image Restoration (TPAMI 2025)

The TPAMI 2025 paper introduces BIPNet, a unified burst‑image framework that tackles alignment, fusion, and upsampling challenges with edge boosting alignment, pseudo‑burst feature fusion, and adaptive group upsampling, achieving state‑of‑the‑art results across super‑resolution, low‑light enhancement, and denoising while offering lightweight mobile variants.

Background and Motivation

Smartphone cameras suffer from small sensors and limited optics, causing noise, low resolution, and loss of detail in low‑light scenes. Burst photography—capturing a rapid sequence of frames and fusing them—mitigates these hardware constraints. The TPAMI 2025 paper introduces BIPNet, a unified network that simultaneously addresses burst super‑resolution, low‑light enhancement, low‑light super‑resolution, and burst denoising.

Core Challenges in Burst Image Processing

Frame alignment difficulty: Camera shake and object motion produce spatial and color misalignments, leading to ghosting and zipper artifacts. Explicit motion estimation (e.g., optical flow) is parameter‑heavy and error‑prone.

Inflexible feature fusion: Conventional post‑fusion mechanisms restrict inter‑frame information exchange, preventing full exploitation of complementary frame content.

Poor upsampling quality: Single‑stage upsampling cannot simultaneously denoise and preserve high‑frequency details, often discarding fine textures or introducing artifacts.

BIPNet Overall Architecture

BIPNet follows a three‑stage pipeline: align → fuse → progressively upsample. The network consists of three core modules, wired together as in the sketch after this list.

Edge Boosting Feature Alignment (EBFA): Denoises RAW burst frames, extracts features, and performs implicit alignment with deformable convolutions and a back‑projection refinement step.

Pseudo‑Burst Feature Fusion (PBFF): Concatenates aligned features channel‑wise, generates a pseudo‑burst representation that aggregates complementary information from all frames, and refines it with a lightweight U‑Net.

Adaptive Group Upsampling (AGU): Applies a three‑stage ×2 progressive upsampling. Features are split into groups; each group receives a dense pixel‑wise attention map that adaptively weights contributions from different frames, followed by transposed convolution for resolution increase.
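Before detailing each module, the stage wiring can be summarized in a few lines. The sketch below is minimal and runnable, with plain convolutions standing in for EBFA, PBFF, and AGU (covered in the sections that follow); all class names, channel counts, and shapes are illustrative assumptions, not the authors' code.

```python
import torch
import torch.nn as nn

class BIPNetPipeline(nn.Module):
    """Stage wiring only: every stage here is a tiny stand-in, not the real module."""
    def __init__(self, burst_size=8, feat_dim=64):
        super().__init__()
        # Stand-in for EBFA: per-frame features from 4-channel packed RAW
        # (the real module also denoises and aligns them to the reference).
        self.align = nn.Conv2d(4, feat_dim, 3, padding=1)
        # Stand-in for PBFF: mix the channel-stacked frame features.
        self.fuse = nn.Conv2d(burst_size * feat_dim, feat_dim, 3, padding=1)
        # Stand-in for AGU: three x2 steps, i.e. x8 overall (packed RAW is
        # half-resolution, so x8 in feature space yields the x4 SR output).
        self.up = nn.Sequential(
            nn.ConvTranspose2d(feat_dim, feat_dim, 2, stride=2),
            nn.ConvTranspose2d(feat_dim, feat_dim, 2, stride=2),
            nn.ConvTranspose2d(feat_dim, 3, 2, stride=2),
        )

    def forward(self, burst):                    # burst: (B, N, 4, H, W)
        b, n, c, h, w = burst.shape
        feats = self.align(burst.flatten(0, 1))  # (B*N, feat_dim, H, W)
        feats = feats.view(b, -1, h, w)          # stack frames along channels
        return self.up(self.fuse(feats))         # (B, 3, 8H, 8W)

out = BIPNetPipeline()(torch.randn(1, 8, 4, 32, 32))
print(out.shape)                                 # torch.Size([1, 3, 256, 256])
```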

BIPNet overall architecture diagram

Edge Boosting Feature Alignment (EBFA)

The EBFA module first reduces RAW noise using residual‑in‑residual learning combined with global context attention, preserving low‑frequency content while capturing long‑range dependencies. A deformable convolution learns frame‑wise offsets implicitly, eliminating the need for a pre‑trained optical flow network. A back‑projection operation computes the residual between aligned features and the reference frame, reinjecting high‑frequency edge information, which enables precise alignment even under severe motion.
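The mechanism can be illustrated compactly: predict per‑pixel sampling offsets from the (frame, reference) pair, warp with a deformable convolution, then re‑inject a refined residual against the reference. The snippet below is a hedged sketch built on torchvision's DeformConv2d; the layer names and the exact refinement path are assumptions, not the paper's implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import DeformConv2d

class AlignSketch(nn.Module):
    def __init__(self, dim=64, k=3):
        super().__init__()
        # 2 * k * k offsets per location: an (x, y) shift for each kernel tap.
        self.offset = nn.Conv2d(2 * dim, 2 * k * k, 3, padding=1)
        self.deform = DeformConv2d(dim, dim, k, padding=k // 2)
        self.refine = nn.Conv2d(dim, dim, 3, padding=1)

    def forward(self, feat, ref):
        # Offsets are predicted jointly from the frame and the reference,
        # so no pre-trained optical-flow network is needed.
        off = self.offset(torch.cat([feat, ref], dim=1))
        aligned = self.deform(feat, off)
        # Back-projection-style step: feed the residual w.r.t. the reference
        # back in, boosting high-frequency edge content lost during warping.
        return aligned + self.refine(ref - aligned)

frames = torch.randn(8, 64, 32, 32)          # 8 burst-frame feature maps
ref = frames[:1].expand_as(frames)           # frame 0 acts as the reference
print(AlignSketch()(frames, ref).shape)      # torch.Size([8, 64, 32, 32])
```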

EBFA module structure

Pseudo‑Burst Feature Fusion (PBFF)

PBFF concatenates aligned features across the channel dimension and applies a convolution to produce a pseudo‑burst feature map that contains the complementary attributes of every input frame. A lightweight U‑Net further extracts multi‑scale deep features from this representation, enabling rich inter‑frame interaction.
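The regrouping at the heart of this step is easy to sketch: for N aligned feature maps of C channels each, the c‑th channel of every frame is gathered into one group and mixed by a shared convolution, giving C pseudo‑burst maps that each see all frames. The shared‑weight choice and shapes below are illustrative assumptions; the lightweight U‑Net refinement is omitted.

```python
import torch
import torch.nn as nn

class PseudoBurstSketch(nn.Module):
    def __init__(self, burst_size=8):
        super().__init__()
        # One shared conv collapses the N copies of a channel into 1 map.
        self.mix = nn.Conv2d(burst_size, 1, 3, padding=1)

    def forward(self, feats):                  # feats: (N, C, H, W), aligned
        # Regroup so channel c from all N frames forms one "pseudo frame".
        grouped = feats.permute(1, 0, 2, 3)    # (C, N, H, W)
        return self.mix(grouped).squeeze(1)    # (C, H, W): C pseudo-burst maps

feats = torch.randn(8, 64, 32, 32)             # 8 frames, 64 channels each
print(PseudoBurstSketch()(feats).shape)        # torch.Size([64, 32, 32])
```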

PBFF and AGU modules

Adaptive Group Upsampling (AGU)

AGU replaces a monolithic, single‑stage upsampling step with a three‑stage ×2 progressive strategy. The pseudo‑burst features are divided into groups; each group receives a dense attention map that adaptively adjusts fusion weights: near‑uniform weighting in texture‑rich regions aids denoising, while reduced weights for misaligned frames suppress ghosting. Transposed convolutions then upscale each group, progressively merging complementary information and substantially improving detail fidelity and image cleanliness.
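One ×2 stage of this strategy might look as follows: split the pseudo‑burst maps into groups, softmax a dense per‑pixel attention map across the members of each group, fuse by weighted sum, and double the resolution with a transposed convolution. Group count, channel sizes, and the attention layer are assumed for illustration; this is not the authors' code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class AGUStageSketch(nn.Module):
    def __init__(self, n_pseudo=64, groups=4, out_dim=64):
        super().__init__()
        self.g = groups
        per = n_pseudo // groups
        self.attn = nn.Conv2d(per, per, 3, padding=1)  # dense per-pixel logits
        self.up = nn.ConvTranspose2d(groups, out_dim, 2, stride=2)

    def forward(self, pseudo):                 # pseudo: (P, H, W) maps
        p, h, w = pseudo.shape
        groups = pseudo.view(self.g, p // self.g, h, w)
        # Softmax over each group's members: near-uniform weights average
        # out noise; low weights suppress misaligned contributions.
        a = F.softmax(self.attn(groups), dim=1)
        fused = (a * groups).sum(dim=1)        # (G, H, W), one map per group
        return self.up(fused.unsqueeze(0))     # (1, out_dim, 2H, 2W)

pseudo = torch.randn(64, 32, 32)
print(AGUStageSketch()(pseudo).shape)          # torch.Size([1, 64, 64, 64])
```

Stacking three such stages gives the ×8 total described above, with attention recomputed at each scale.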

Experimental Validation

Burst Super‑Resolution

On the SyntheticBurst (synthetic) and BurstSR (real) datasets, BIPNet outperforms the previous best method, MFIR, by +0.37 dB and +0.16 dB PSNR, respectively, for ×4 upsampling (Table 1). Visual results show sharper textures without speckle artifacts, and even ×8 upsampling recovers rich details.
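For context on the quoted margins, PSNR is 10·log10(MAX²/MSE), so a +0.37 dB gain corresponds to roughly a 9% reduction in mean squared error. A small helper (illustrative, not the paper's evaluation script):

```python
import torch

def psnr(pred: torch.Tensor, target: torch.Tensor, max_val: float = 1.0) -> float:
    """PSNR in dB for images scaled to [0, max_val]."""
    mse = torch.mean((pred - target) ** 2)
    return (10.0 * torch.log10(max_val ** 2 / mse)).item()

a, b = torch.rand(3, 64, 64), torch.rand(3, 64, 64)
print(psnr(a, b))   # low for random pairs; 40+ dB indicates a close reconstruction
```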

PSNR comparison for burst super‑resolution
SyntheticBurst visual comparison
8× upsampling visual result

Low‑Light Enhancement & Low‑Light Super‑Resolution

On the SID low‑light enhancement benchmark, BIPNet gains +2.83 dB PSNR over the prior best method (Table 2) and produces brighter, color‑accurate images. For low‑light ×4 super‑resolution, it again leads the field (Tables 3 & 4), overcoming the information scarcity of single‑frame low‑light inputs.

Low‑light enhancement PSNR comparison
Low‑light enhancement visual comparison
Low‑light super‑resolution visual result

Burst Denoising

For grayscale and color burst denoising, BIPNet surpasses MFIR by +0.91 dB (grayscale) and +0.58 dB (color) even when tested on unseen high‑noise levels (gain = 8). Visuals show cleaner reconstructions with no residual noise.

Grayscale denoising PSNR comparison
Color denoising PSNR comparison
Burst denoising visual comparison

Ablation Study

On SyntheticBurst, a baseline without the three modules reaches 36.38 dB PSNR. Adding EBFA, PBFF, and AGU sequentially raises performance to 41.55 dB (+5.17 dB). Replacing EBFA or PBFF with alternative alignment/fusion techniques degrades results (Table 7), confirming each module’s unique contribution.

Module contribution table
Performance after module replacement

Visualization shows that without EBFA, adjacent frames exhibit noticeable sub‑pixel shifts; with EBFA, the shifts disappear and noise is markedly reduced.

EBFA alignment visualization

Limitations and Future Directions

BIPNet currently uses the first burst frame as the reference; if this frame is severely distorted, output quality degrades. Future work could incorporate an adaptive reference‑frame selection mechanism that dynamically chooses the highest‑quality frame, and extend the core modules to more complex scenarios such as dynamic scenes and extreme low‑light conditions.
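As a purely hypothetical illustration of such a mechanism (not from the paper), a selection step could score each frame with a simple sharpness proxy, e.g. the variance of its Laplacian response, and align the burst to the winner:

```python
import torch
import torch.nn.functional as F

def pick_reference(burst: torch.Tensor) -> int:
    """burst: (N, 1, H, W) grayscale frames; returns the sharpest frame's index."""
    lap = torch.tensor([[0., 1., 0.],
                        [1., -4., 1.],
                        [0., 1., 0.]]).view(1, 1, 3, 3)
    responses = F.conv2d(burst, lap, padding=1)      # edge responses per frame
    return int(responses.var(dim=(1, 2, 3)).argmax())

burst = torch.randn(8, 1, 64, 64)
print(pick_reference(burst))                         # e.g. 3
```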

Impact of reference frame quality

Conclusion

BIPNet unifies burst image restoration tasks through three innovative modules—edge boosting alignment, pseudo‑burst fusion, and adaptive group upsampling—delivering state‑of‑the‑art performance on super‑resolution, low‑light enhancement, and denoising while offering lightweight variants (BIPNet‑16/32) suitable for mobile deployment. Its adaptive, progressive design breaks the bottleneck of traditional post‑fusion pipelines and provides a practical pathway for high‑quality imaging on portable devices.
