Can Frequency‑Domain Learning Boost Image Inference Efficiency?

This article presents a system‑level approach that performs deep‑learning inference directly on JPEG frequency components, uses a gating mechanism to select important DCT coefficients, and demonstrates higher accuracy with far lower bandwidth for image classification and instance segmentation tasks.

Alibaba Cloud Developer

1. Basic Framework of Image Transmission/Storage/Analysis System

Modern computer‑vision pipelines process RGB images through capture, compression, transmission, decompression and inference. In real‑time systems the compression/decompression stages can dominate latency and power consumption, especially the DCT and IDCT modules.

Figure 1 shows that image processing time can be twice that of inference in a GPU‑based system.

2. Machine Learning in the Frequency Domain

2.1 Using Frequency Information for Learning

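JPEG applies an 8×8 block DCT to each of the Y, Cb and Cr channels, so every block yields 64 coefficients per channel. Grouping the coefficients of the same frequency across all blocks gives 64 feature maps per channel, 192 in total; a 448×448 image contains 56×56 blocks, so the input tensor is 56×56×192. These maps can be fed directly into existing CNNs such as ResNet‑50 without converting back to the spatial domain.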

Figure 10(a) illustrates the preprocessing pipeline; Figure 10(b) shows how the frequency‑domain feature maps are attached to the first Residual Block of ResNet‑50.
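
The rearrangement of block DCT coefficients into feature maps can be illustrated with a minimal, standard-library-only sketch. It is not the authors' implementation: it assumes an orthonormal DCT-II, skips JPEG's level shift and quantization, and the helper names (frequency_maps, frequency_tensor_shape) are hypothetical. A real pipeline would read the coefficients from the JPEG codec rather than recompute them.

```python
import math

N = 8  # JPEG block size


def dct_basis(n=N):
    """Orthonormal DCT-II basis matrix C, so that coeffs = C * block * C^T."""
    rows = []
    for k in range(n):
        scale = math.sqrt(1.0 / n) if k == 0 else math.sqrt(2.0 / n)
        rows.append([scale * math.cos(math.pi * (2 * x + 1) * k / (2 * n))
                     for x in range(n)])
    return rows


C = dct_basis()


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def dct_8x8(block):
    """2D DCT of one 8x8 block."""
    return matmul(matmul(C, block), transpose(C))


def frequency_maps(plane):
    """Turn one H x W plane (H, W multiples of 8) into 64 maps of size
    (H/8) x (W/8). Map index 8*u + v holds frequency (u, v) of every block."""
    bh, bw = len(plane) // N, len(plane[0]) // N
    maps = [[[0.0] * bw for _ in range(bh)] for _ in range(N * N)]
    for by in range(bh):
        for bx in range(bw):
            block = [row[bx * N:(bx + 1) * N]
                     for row in plane[by * N:(by + 1) * N]]
            coeffs = dct_8x8(block)
            for u in range(N):
                for v in range(N):
                    maps[u * N + v][by][bx] = coeffs[u][v]
    return maps


def frequency_tensor_shape(height, width, channels=3):
    """Shape of the stacked frequency tensor for Y, Cb and Cr together."""
    return (height // N, width // N, channels * N * N)


print(frequency_tensor_shape(448, 448))  # (56, 56, 192)
```

Calling frequency_maps once per channel and concatenating the three results gives the 192 maps described above. For a constant plane, only map 0 (the DC component) is non-zero, which hints at how much of a natural image's energy sits in a few low-frequency maps.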

2.2 Selecting Important Frequency Components

A gating network learns the importance of each of the 192 feature maps. Each map is first average‑pooled to a scalar; a fully‑connected layer then produces a pair of scores per map, and the larger of the two decides whether the map is kept or dropped. Because this selection is discrete, the gate is trained with a Gumbel‑softmax estimator so that gradients can still flow through it.

Figure 11 visualises the gating mechanism.
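
The forward pass of such a gate might look like the sketch below. It is an illustration under stated assumptions, not the article's code: the per-map weight and bias pairs stand in for the fully‑connected layer, the score order (drop, keep) is an arbitrary choice, and gate_channels is a hypothetical name. Only sampling is shown; in a deep-learning framework the soft probabilities would carry the gradient while the hard 0/1 decision is used in the forward pass.

```python
import math
import random


def gumbel_noise(rng):
    """Sample from Gumbel(0, 1); the clamp avoids log(0)."""
    u = max(rng.random(), 1e-12)
    return -math.log(-math.log(u))


def softmax(xs):
    """Numerically stable softmax over a short list."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def average_pool(feature_map):
    """Global average of a 2D map (list of rows)."""
    values = [v for row in feature_map for v in row]
    return sum(values) / len(values)


def gate_channels(feature_maps, weights, biases, tau=1.0, rng=None):
    """Return (hard_gates, soft_keep_probs), one entry per feature map.

    weights[c] and biases[c] are (drop, keep) pairs: hypothetical stand-ins
    for the trained fully-connected layer. tau is the softmax temperature.
    """
    rng = rng or random.Random()
    hard, soft = [], []
    for fmap, w, b in zip(feature_maps, weights, biases):
        s = average_pool(fmap)
        logits = [w[0] * s + b[0], w[1] * s + b[1]]  # [drop, keep]
        noisy = [(l + gumbel_noise(rng)) / tau for l in logits]
        soft.append(softmax(noisy)[1])               # relaxed keep probability
        hard.append(1 if noisy[1] > noisy[0] else 0)  # discrete decision
    return hard, soft
```

Adding Gumbel noise before taking the larger score is equivalent to sampling keep or drop with the softmax probabilities of the raw scores, which is what lets the relaxed version approximate the discrete choice during training.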

Two selection strategies are explored:

Dynamic: the gate decides per input which frequencies to transmit, reducing bandwidth on a per‑image basis.

Static: a fixed subset of frequencies is chosen after training, allowing the encoder to omit unimportant coefficients entirely.
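
The difference between the two strategies can be sketched as follows. The function names and the choice of indices are hypothetical, and the sketch assumes every feature map costs the same to transmit; it does not reproduce the subset actually learned in the experiments.

```python
def select_static(feature_maps, keep_indices):
    """Static: the same fixed subset is kept for every image, so the
    receiver already knows which maps will arrive."""
    return [feature_maps[i] for i in keep_indices]


def select_dynamic(feature_maps, gates):
    """Dynamic: a per-image 0/1 gate decides what is sent. Indices are
    returned with the maps because the subset changes between images."""
    return [(i, fmap)
            for i, (fmap, g) in enumerate(zip(feature_maps, gates))
            if g == 1]


def bandwidth_fraction(kept, total):
    """Fraction of maps transmitted, assuming equal cost per map."""
    return kept / total


# Placeholder data: 192 tiny maps and an arbitrary 24-map subset.
all_maps = [[[float(i)]] for i in range(192)]
fixed_subset = list(range(24))  # hypothetical indices, for illustration only

kept = select_static(all_maps, fixed_subset)
print(bandwidth_fraction(len(kept), len(all_maps)))  # 0.125
```

Keeping 24 of 192 maps corresponds to one‑eighth of the original input size under the equal-cost assumption.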

3. Experimental Results

3.1 Image Classification

Using ImageNet, ResNet‑50 and MobileNetV2 were evaluated. Selecting only 24 out of 192 frequency maps (14 Y, 5 Cb, 5 Cr) reduced transmission bandwidth to one‑eighth while improving top‑1 accuracy from 75.78 % to 77.20 % for ResNet‑50 and from 71.70 % to 72.36 % for MobileNetV2.

Figure 13 shows a heat‑map of frequency importance for classification.

3.2 Instance Segmentation

Mask R‑CNN was trained on COCO. Using the same frequency‑selection strategy increased mask‑AP from 34.2 % to 35.0 % and object‑detection AP from 37.3 % to 38.1 %.

Figure 15 provides a visual example of instance segmentation using selected frequency components.

4. Future Work and Discussion

The current dynamic approach avoids modifying the encoder, but a static scheme could further reduce encoding cost and transmission bandwidth. Extending the method to video compression, where inter‑frame prediction changes the frequency characteristics, is a promising direction. Ultimately, the research asks whether machine‑learning‑friendly features can be designed that discard spatial‑domain redundancy and save bandwidth between the decoder and the AI engine.

Acknowledgement

The work was conducted by Kai Xu, Minghai Qin, Yuhao Wang, Fei Sun, Chao Cheng, Yen‑kuang Chen, and Yuan Xie at Alibaba DAMO Academy, with contributions from Prof. Fengbo Ren (Arizona State University).
