Artificial Intelligence 18 min read

How Laplacian Pyramid Networks Revolutionize Intrinsic Image Decomposition

This paper presents a scale‑space‑based deep neural network that treats intrinsic image decomposition as an image‑to‑image translation, introduces a multi‑channel Laplacian pyramid architecture with novel loss and data‑augmentation strategies, and demonstrates superior performance on MPI‑Sintel and MIT Intrinsic datasets.

Alibaba Cloud Developer

Sep 4, 2018

How Laplacian Pyramid Networks Revolutionize Intrinsic Image Decomposition

Abstract

We introduce a novel network architecture that decomposes an image into its intrinsic reflectance and illumination components. By treating the problem as an image‑to‑image translation and performing scale‑space decomposition of inputs and outputs, we design a multi‑channel network that learns a separate mapping for each Laplacian pyramid level. The network uses learnable up/down‑sampling operators to build Gaussian and Laplacian pyramids, and each sub‑network (residual block) consists of six Conv‑ELU layers. A new loss combines data, perceptual, and variational terms, and a data‑augmentation scheme inspired by breeder learning generates additional training pairs from unlabeled images. Experiments on MPI‑Sintel and MIT Intrinsic Images datasets show significant quantitative and qualitative improvements over previous state‑of‑the‑art methods.

1 Introduction

Intrinsic image decomposition separates an image into reflectance and shading, enabling material editing, depth cue extraction, and illumination analysis. Existing approaches such as Retinex are limited by gradient‑domain thresholds and cannot handle complex materials or sharp edges. We propose a deep neural network that learns the mapping in a scale‑space framework, extending the function approximation pipeline to parallel sub‑band transformations.

2 Our Method

2.1 Network Evolution

Starting from a ResNet‑style sequential architecture, we restructure the network into parallel low‑frequency (L) and high‑frequency (H) branches, applying a Laplacian pyramid expansion to the output. This yields a multi‑branch design where each branch predicts a specific pyramid component.

2.2 Residual Block

Each residual block is a six‑layer Conv(3×3)‑ELU stack without fully‑connected layers. Skip connections add the output of the final Conv to the input of the last ELU. The block uses 32 feature channels and outputs either a 3‑channel image or a residual map. ELU replaces ReLU and batch normalization, improving robustness to noise and training speed.

2.3 Loss Function

Our loss comprises three terms:

Data loss : a joint bilateral filter enforcing pixel‑level similarity and the constraint that the product of predicted reflectance and shading reconstructs the input.

Perceptual loss : VGG‑19 features preserve high‑level semantic structure.

Variational loss : smoothness regularization on the output.

2.4 Data‑Augmentation Training

We generate additional training pairs by first predicting reflectance and shading for unlabeled images, then synthesizing new images from these estimates. Adaptive manifold filtering further perturbs the predictions to increase diversity while preserving realism.

3 Experiments

3.1 Datasets

We evaluate on the MPI‑Sintel (ResynthSintel version) and MIT Intrinsic Images datasets, using both scene‑level and image‑level train/test splits.

3.2 Results on MPI‑Sintel

Our model outperforms prior methods on all three metrics, especially under the more challenging scene‑split protocol, demonstrating robustness to over‑fitting.

3.3 Results on MIT Intrinsic Images

Both standard data augmentation (Ours + DA) and an enhanced version (Ours + DA +) improve performance, confirming the effectiveness of our augmentation strategy.

4 Conclusion

We introduced a Laplacian‑pyramid‑inspired neural network for intrinsic image decomposition, modeling the task as a multi‑scale image‑to‑image translation. Experiments on two benchmark datasets demonstrate superior quantitative results and visual quality, and the architecture is applicable to other image‑to‑image tasks such as semantic segmentation or depth regression.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

deep neural network image-to-image translation intrinsic image decomposition Laplacian pyramid scale space

Written by

Alibaba Cloud Developer

Alibaba's official tech channel, featuring all of its technology innovations.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.