Can UniConvNet Expand Receptive Fields While Preserving Gaussian Distribution?
The paper introduces UniConvNet, a novel convolutional architecture that expands the effective receptive field (ERF) of ConvNets without breaking the asymptotically Gaussian distribution (AGD), achieving superior accuracy‑parameter and accuracy‑FLOPs trade‑offs across image classification, detection, and segmentation benchmarks.
Paper Information
Title: UniConvNet: Expanding Effective Receptive Field while Maintaining Asymptotically Gaussian Distribution for ConvNets of Any Scale
Link: https://arxiv.org/pdf/2508.09000
Background
Convolutional neural networks (ConvNets) struggle to capture long‑range dependencies efficiently. Traditional solutions either enlarge convolution kernels or stack many small kernels, which dramatically increase parameters and FLOPs and often break the asymptotically Gaussian distribution (AGD) of the effective receptive field (ERF). Maintaining AGD while expanding ERF is a critical design challenge.
Methodology
Receptive Field Aggregator (RFA)
The RFA splits the input tensor along the channel dimension into N+1 heads (where N is the number of RFA layers). The first head, A1, is processed by a Layer Operator (LO) to generate a new head, A2; each generated head then feeds the next LO in turn. This recursive design enables progressive channel growth while keeping the overall parameter budget low.
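To make the head recursion concrete, here is a minimal PyTorch sketch. It is not the authors' code: the equal-sized channel split (the paper's pyramid-like channel increment is simplified away), the additive merge of the running head with each remaining split, and the final concatenation are all illustrative assumptions, and `RFASketch` is a hypothetical name.

```python
import torch
import torch.nn as nn

class RFASketch(nn.Module):
    """Minimal sketch of the RFA head recursion (assumptions noted above)."""

    def __init__(self, channels: int, num_layers: int = 3):
        super().__init__()
        assert channels % (num_layers + 1) == 0
        self.num_layers = num_layers
        # Placeholder layer operators; see the LO sketch further below.
        self.layer_ops = nn.ModuleList(
            nn.Identity() for _ in range(num_layers)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        heads = torch.chunk(x, self.num_layers + 1, dim=1)  # A1..A_{N+1}
        a = heads[0]                                        # A1
        outs = [a]
        for n, lo in enumerate(self.layer_ops):
            # Generate the next head from the current one, then fold in
            # the corresponding input split (merge rule is an assumption).
            a = lo(a) + heads[n + 1]
            outs.append(a)
        return torch.cat(outs, dim=1)
```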
Layer Operator (LO)
Each LO consists of two components (sketched in code after this list):
Amplifier (Amp): Applies GELU activation, a depthwise large‑kernel convolution K×K, and element‑wise multiplication with a second projected head, thereby expanding the receptive field.
Discriminator (Dis): Combines a large‑kernel K×K depthwise convolution with a small‑kernel k×k (typically 3×3) to inject fine‑scale pixel information, establishing a two‑layer AGD.
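The following PyTorch sketch shows one possible reading of an LO. Two details are assumptions, not confirmed by the paper: the gating projection in the Amplifier is taken to be a 1×1 convolution, and the Discriminator composes its large and small depthwise kernels sequentially (the paper may combine them differently, e.g., in parallel with a sum). `LayerOperatorSketch` is a hypothetical name.

```python
import torch
import torch.nn as nn

class LayerOperatorSketch(nn.Module):
    """Minimal sketch of one Layer Operator (assumptions noted above)."""

    def __init__(self, dim: int, large_k: int, small_k: int = 3):
        super().__init__()
        # Amplifier (Amp): GELU -> K x K depthwise conv, gated by a
        # projected copy of the input via element-wise multiplication.
        self.act = nn.GELU()
        self.amp_dw = nn.Conv2d(dim, dim, large_k,
                                padding=large_k // 2, groups=dim)
        self.gate_proj = nn.Conv2d(dim, dim, kernel_size=1)
        # Discriminator (Dis): large-kernel depthwise conv followed by a
        # small depthwise conv that injects fine-scale pixel information.
        self.dis_large = nn.Conv2d(dim, dim, large_k,
                                   padding=large_k // 2, groups=dim)
        self.dis_small = nn.Conv2d(dim, dim, small_k,
                                   padding=small_k // 2, groups=dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        amp = self.amp_dw(self.act(x)) * self.gate_proj(x)
        return self.dis_small(self.dis_large(amp))
```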
For an input size of 224×224, a three‑layer RFA (N=3) is used. The progressive large‑kernel sizes follow K = 2n + 5 (n = 1…N), yielding kernels of 7×7, 9×9, and 11×11, while the small kernel remains 3×3.
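A one-line check of the stated kernel schedule:

```python
# Kernel schedule K = 2n + 5 for a three-layer RFA (N = 3).
N = 3
kernels = [2 * n + 5 for n in range(1, N + 1)]
print(kernels)  # [7, 9, 11] -- plus the fixed 3x3 small kernel
```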
UniConvNet Architecture
Standard convolution blocks are replaced by the three‑layer RFA module. This yields a pyramid‑like channel increment that reduces both parameters and FLOPs while preserving AGD across all stages.
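Putting the two sketches above together, a hypothetical drop-in usage on one stage's feature map might look like this (the 64-channel input split into four 16-channel heads is purely illustrative, and the snippet assumes `RFASketch` and `LayerOperatorSketch` from the earlier blocks are in scope):

```python
import torch
import torch.nn as nn

# Hypothetical assembly: three LOs with the 7/9/11 kernel schedule,
# applied to a 64-channel feature map in place of a standard conv block.
rfa = RFASketch(channels=64, num_layers=3)
rfa.layer_ops = nn.ModuleList(
    LayerOperatorSketch(dim=16, large_k=k) for k in (7, 9, 11)
)
x = torch.randn(1, 64, 56, 56)   # batch, channels, height, width
print(rfa(x).shape)              # torch.Size([1, 64, 56, 56])
```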
Experimental Analysis
Image Classification
Multiple UniConvNet variants were trained on ImageNet‑1K and compared with state‑of‑the‑art CNNs (e.g., SLaK‑T, UniRepLKNet‑T) and Vision Transformers. Across lightweight and large‑scale settings, UniConvNet consistently achieved higher top‑1 accuracy while using fewer parameters and FLOPs.
Downstream Tasks
After ImageNet‑1K pre‑training, UniConvNet was fine‑tuned on COCO for object detection (RetinaNet, SSDLite, Mask R‑CNN, Cascade Mask R‑CNN) and on ADE20K for semantic segmentation (DeepLabv3, PSPNet, UperNet). In all cases, UniConvNet variants achieved higher mAP (detection) and mIoU (segmentation) with reduced computational cost.
Conclusion
The study introduces the Receptive Field Aggregator (RFA) as a plug‑and‑play module that expands the effective receptive field while preserving its asymptotically Gaussian distribution. UniConvNet, built upon RFA, consistently outperforms existing ConvNets and Vision Transformers on a wide range of visual recognition tasks, demonstrating that large ERF can be achieved without sacrificing efficiency.
Key Findings
RFA enables ERF expansion with AGD preservation, addressing a long‑standing design dilemma.
Three‑layer RFA modules replace conventional convolutions, reducing parameters and FLOPs.
UniConvNet achieves state‑of‑the‑art accuracy on ImageNet, COCO, and ADE20K while being more computationally efficient.
Contributions
Proposes a novel RFA module that can be inserted into any ConvNet.
Demonstrates that maintaining AGD is crucial for high‑performance large‑ERF designs.
Provides extensive empirical evidence across classification, detection, and segmentation.