Can UniConvNet Expand Receptive Fields While Preserving Gaussian Distribution?

The paper introduces UniConvNet, a novel convolutional architecture that expands the effective receptive field (ERF) of ConvNets without breaking the asymptotically Gaussian distribution (AGD), achieving superior accuracy‑parameter and accuracy‑FLOPs trade‑offs across image classification, detection, and segmentation benchmarks.

AI Frontier Lectures

Paper Information

Title: UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale

Link: https://arxiv.org/pdf/2508.09000

Background

Convolutional neural networks (ConvNets) struggle to capture long‑range dependencies efficiently. Traditional solutions either enlarge convolution kernels or stack many small kernels, which dramatically increase parameters and FLOPs and often break the asymptotically Gaussian distribution (AGD) of the effective receptive field (ERF). Maintaining AGD while expanding ERF is a critical design challenge.

Methodology

Receptive Field Aggregator (RFA)

The RFA splits the input tensor along the channel dimension into N+1 heads, where N is the number of RFA layers. The first head A1 is processed by a Layer Operator (LO) to generate a new head A2; each subsequent LO then consumes the head produced by the previous one. This recursive design enables progressive channel growth while keeping the overall parameter budget low.
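The head recursion can be traced at the shape level with a short sketch. This is illustrative only: the even channel split and the `A_n -> A_{n+1}` naming follow the description above, while the actual tensor operations inside each LO are omitted.

```python
# Shape-level sketch of the RFA head recursion (illustrative, not the paper's code).
# An input with C channels is split along the channel axis into N + 1 heads;
# each Layer Operator (LO) consumes the current head and emits the next one.

def rfa_head_channels(total_channels: int, num_layers: int) -> list[int]:
    """Split total_channels into num_layers + 1 heads (even split assumed here)."""
    heads = num_layers + 1
    assert total_channels % heads == 0, "channels must divide evenly among heads"
    return [total_channels // heads] * heads

def rfa_forward_trace(num_layers: int) -> list[str]:
    """Trace which head each LO reads and which head it produces."""
    return [f"LO{n}: head A{n} -> head A{n + 1}" for n in range(1, num_layers + 1)]

print(rfa_head_channels(64, 3))  # four 16-channel heads for a 3-layer RFA
print(rfa_forward_trace(3))
```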

Layer Operator (LO)

Each LO consists of two components:

Amplifier (Amp): Applies GELU activation, a depthwise large‑kernel convolution K×K, and element‑wise multiplication with a second projected head, thereby expanding the receptive field.

Discriminator (Dis): Combines a large‑kernel K×K depthwise convolution with a small‑kernel k×k (typically 3×3) to inject fine‑scale pixel information, establishing a two‑layer AGD.

For an input size of 224×224, a three‑layer RFA (N=3) is used. The progressive large‑kernel sizes follow K = 2n + 5 (n = 1…N), yielding kernels of 7×7, 9×9, and 11×11, while the small kernel remains 3×3.
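The kernel schedule above is easy to verify in a few lines. The `K = 2n + 5` formula is from the text; the `K // 2` "same" padding is an assumption made here so that each depthwise convolution preserves the 224×224 spatial size.

```python
# Kernel-size schedule from the text: K = 2n + 5 for n = 1..N; small kernel fixed at 3x3.
# Padding of K // 2 (an assumption here) keeps the spatial size through each
# stride-1 depthwise convolution.

def rfa_kernel_sizes(num_layers: int) -> list[int]:
    return [2 * n + 5 for n in range(1, num_layers + 1)]

def same_padding(kernel: int) -> int:
    # 'same' padding for an odd kernel at stride 1
    return kernel // 2

sizes = rfa_kernel_sizes(3)
print(sizes)                             # [7, 9, 11]
print([same_padding(k) for k in sizes])  # [3, 4, 5]
```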

UniConvNet Architecture

Standard convolution blocks are replaced by the three‑layer RFA module. This yields a pyramid‑like channel increment that reduces both parameters and FLOPs while preserving AGD across all stages.
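One reason large depthwise kernels stay affordable can be seen from a parameter count. The numbers below are illustrative arithmetic, not figures reported in the paper: a depthwise K×K convolution over C channels costs C·K·K parameters versus C·C·K·K for a standard convolution.

```python
# Illustrative parameter counts (not results from the paper): depthwise K x K
# convolution over C channels vs a standard C-in, C-out convolution.

def depthwise_params(channels: int, k: int) -> int:
    return channels * k * k

def standard_params(channels: int, k: int) -> int:
    return channels * channels * k * k

C = 96  # hypothetical channel width for illustration
for k in (7, 9, 11):
    dw, std = depthwise_params(C, k), standard_params(C, k)
    print(f"K={k}: depthwise {dw:,} vs standard {std:,} ({std // dw}x more)")
```

Even at K = 11, the depthwise form costs a factor of C fewer parameters than a standard convolution of the same kernel size, which is what lets the RFA use progressively larger kernels without inflating the budget.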

UniConvNet overall architecture diagram

Experimental Analysis

Image Classification

Multiple UniConvNet variants were trained on ImageNet‑1K and compared with state‑of‑the‑art CNNs (e.g., SLaK‑T, UniRepLKNet‑T) and Vision Transformers. Across lightweight and large‑scale settings, UniConvNet consistently achieved higher top‑1 accuracy while using fewer parameters and FLOPs.

Parameter‑accuracy trade‑off plot

Downstream Tasks

After ImageNet‑1K pre‑training, UniConvNet was fine‑tuned on COCO for object detection (RetinaNet, SSDLite, Mask R‑CNN, Cascade Mask R‑CNN) and on ADE20K for semantic segmentation (DeepLabv3, PSPNet, UperNet). In all cases, UniConvNet variants achieved higher mAP (detection) and mIoU (segmentation) with reduced computational cost.

Object detection results
Semantic segmentation results

Conclusion

The study introduces the Receptive Field Aggregator (RFA) as a plug‑and‑play module that expands the effective receptive field while preserving its asymptotically Gaussian distribution. UniConvNet, built upon RFA, consistently outperforms existing ConvNets and Vision Transformers on a wide range of visual recognition tasks, demonstrating that large ERF can be achieved without sacrificing efficiency.

Key Findings

RFA enables ERF expansion with AGD preservation, addressing a long‑standing design dilemma.

Three‑layer RFA modules replace conventional convolutions, reducing parameters and FLOPs.

UniConvNet achieves state‑of‑the‑art accuracy on ImageNet, COCO, and ADE20K while being more computationally efficient.

Contributions

Proposes a novel RFA module that can be inserted into any ConvNet.

Demonstrates that maintaining AGD is crucial for high‑performance large‑ERF designs.

Provides extensive empirical evidence across classification, detection, and segmentation.
