Lightweight Remote Sensing Backbone LSKNet and Strip R-CNN: Design, Benchmarks, and Open‑Source Release

The NK‑Remote repository introduces LSKNet and Strip R‑CNN, two lightweight yet powerful models for remote‑sensing object detection that dynamically adjust receptive fields and combine square‑and‑strip convolutions, achieving state‑of‑the‑art performance on benchmarks such as DOTA, FAIR1M, HRSC2016, and DIOR.

AIWalker

Problem and Background

Remote‑sensing object detection relies on prior knowledge such as scene context and sensor characteristics, because many targets appear as tiny blobs in high‑resolution aerial images. An analysis of the DOTA benchmark shows that elongated objects dominate the dataset and that detection accuracy drops sharply as the aspect ratio increases. Two concrete challenges arise:

Different object categories require context windows of varying spatial extents.

High‑aspect‑ratio objects contain dense features along one axis but sparse information along the orthogonal axis, making angle regression especially error‑prone.
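
One way to see why angle regression becomes fragile for elongated objects is a small geometric calculation. The sketch below is illustrative and not taken from the papers: it assumes a rectangular box rotated about its center and measures how far the end of the long axis moves sideways for a given angle error, expressed in box widths. The function name `tip_offset_ratio` and the 5° error are hypothetical choices.

```python
import math

def tip_offset_ratio(aspect_ratio: float, angle_error_deg: float) -> float:
    """Sideways shift of the long-axis endpoint, as a fraction of box width.

    For a box of length L and width w rotated about its center, an angle
    error d moves the endpoint sideways by (L / 2) * sin(d). Dividing by w
    gives (aspect_ratio / 2) * sin(d).
    """
    return 0.5 * aspect_ratio * math.sin(math.radians(angle_error_deg))

# Same 5-degree error, increasingly elongated boxes (illustrative values).
for ratio in (1, 3, 10):
    shift = tip_offset_ratio(ratio, 5.0)
    print(f"aspect ratio {ratio:>2}: tip shifts {shift:.2f} box widths")
```

Under these assumptions the shift grows linearly with the aspect ratio: the same 5° error that is barely visible on a square box moves the tip of a 10:1 box by roughly 0.44 box widths.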

Large Selective Kernel Network (LSKNet)

LSKNet introduces a spatial‑selection mechanism that dynamically adjusts the receptive field of the backbone. The core module stacks several large depthwise convolution kernels (e.g., 7×7, 11×11, 15×15) and computes per‑pixel attention weights from the input feature map. These weights fuse the kernel outputs, so the network can favor the most appropriate kernel size at each spatial location. Because the selection is input‑dependent, LSKNet can widen its receptive field for objects that need broader context while staying compact. Two lightweight variants are provided: LSKNet‑T with 4.3 M parameters and LSKNet‑S with 14.4 M parameters. Both achieve state‑of‑the‑art performance on 14 mainstream remote‑sensing datasets covering classification, detection, semantic segmentation, and change detection.
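
The selection idea can be sketched in a few lines of plain Python. This is a toy illustration under stated assumptions, not the LSKNet implementation: box filters stand in for learned depthwise kernels, the kernel sizes are shrunk to 3, 5, and 7 to suit a 9×9 input, the `scores` coefficients are hypothetical placeholders for the learned layer that produces selection logits, and the softmax normalization is a choice made for this sketch.

```python
import math

def box_filter(feature, k):
    """Average over a k x k window with zero padding; a stand-in for one
    learned large depthwise kernel."""
    h, w = len(feature), len(feature[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        total += feature[yy][xx]
            out[y][x] = total / (k * k)
    return out

def softmax(values):
    """Numerically stable softmax over a short list."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    s = sum(exps)
    return [e / s for e in exps]

def select_and_fuse(feature, kernel_sizes=(3, 5, 7), scores=(1.0, 0.5, -0.5)):
    """Fuse multi-scale branches with per-pixel, input-dependent weights.

    `scores` are hypothetical fixed coefficients standing in for the learned
    mapping from features to selection logits.
    """
    branches = [box_filter(feature, k) for k in kernel_sizes]
    h, w = len(feature), len(feature[0])
    fused = [[0.0] * w for _ in range(h)]
    weights = [[None] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Logits depend on the input at this pixel, so different
            # locations can prefer different kernel sizes.
            logits = [s * feature[y][x] for s in scores]
            wts = softmax(logits)
            weights[y][x] = wts
            fused[y][x] = sum(wt * b[y][x] for wt, b in zip(wts, branches))
    return fused, weights

feature = [[0.0] * 9 for _ in range(9)]
feature[4][4] = 4.0  # a single bright pixel in an empty scene
fused, weights = select_and_fuse(feature)
print([round(v, 3) for v in weights[4][4]])  # bright pixel favors the smallest kernel
print([round(v, 3) for v in weights[0][0]])  # zero input gives equal weights
```

In this toy setup the weights at the bright pixel concentrate on the smallest kernel, while empty regions weight all three branches equally. A trained network would learn that mapping instead of using fixed coefficients.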

Strip R‑CNN

Strip R‑CNN tackles the high‑aspect‑ratio problem by replacing the conventional square convolution in the backbone with a hybrid Strip module. The module combines a standard 3×3 convolution with a pair of orthogonal long‑strip convolutions (e.g., 1×k and k×1, where k≫1), one running horizontally and one vertically. This design captures anisotropic context without inflating the parameter count. A dedicated Strip Head augments the localization branch: it feeds the Strip module’s output into a refined bounding‑box regression head that jointly predicts center, size, and orientation, improving angle accuracy for elongated objects. The entire model contains only about 30 M learnable parameters.
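
The parameter argument can be checked with simple arithmetic. The sketch below compares the weight count of a single k×k kernel with that of a 1×k plus k×1 strip pair. It assumes depthwise convolutions without bias, and the channel count of 64 and k = 19 are hypothetical values chosen for illustration.

```python
def depthwise_params(kernel_h: int, kernel_w: int, channels: int) -> int:
    """Weights in a depthwise convolution (one kernel per channel, no bias)."""
    return kernel_h * kernel_w * channels

def strip_pair_params(k: int, channels: int) -> int:
    """Weights in a 1 x k plus a k x 1 depthwise strip pair."""
    return depthwise_params(1, k, channels) + depthwise_params(k, 1, channels)

channels, k = 64, 19  # hypothetical values, for illustration only
square = depthwise_params(k, k, channels)
strips = strip_pair_params(k, channels)
print(f"{k}x{k} square kernel: {square} weights")
print(f"1x{k} + {k}x1 strips:  {strips} weights")
print(f"ratio: {square / strips:.1f}x")
```

A square kernel costs k² weights per channel while the strip pair costs 2k, so the saving grows with k. Applied one after the other, the two strips still reach across a k×k neighborhood.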

Experimental Validation

Extensive experiments were conducted on DOTA, FAIR1M, HRSC2016, and DIOR. On DOTA, Strip R‑CNN reaches 82.75% mAP without any test‑time tricks, surpassing previous state‑of‑the‑art methods, whose accuracy degrades as the aspect ratio increases. LSKNet‑S attains top‑ranked scores on 14 datasets, confirming that dynamic receptive‑field selection benefits both small‑object and large‑object detection. Ablation studies show that removing the long‑strip branch from the Strip module reduces mAP by more than 3%, while fixing the kernel sizes in LSKNet (i.e., disabling the selection mechanism) leads to a 2.5% drop, demonstrating the importance of the proposed adaptive components.

Code and Resources

All implementations are open‑source. The NK‑Remote repository containing LSKNet and Strip R‑CNN can be cloned from https://github.com/NK-JittorCV/nk-remote. The models are also integrated into the JDet model zoo (https://github.com/Jittor/JDet), which hosts more than 20 state‑of‑the‑art remote‑sensing algorithms.

References

Shi‑Min Hu, Dun Liang, Guo‑Ye Yang, Guo‑Wei Yang, and Wen‑Yang Zhou. “Jittor: a novel deep learning framework with meta‑operators and unified graph execution.” Science China Information Sciences, 2020.

Yuxuan Li, Xiang Li, Yimian Dai, Qibin Hou, Li Liu, Yongxiang Liu, Ming‑Ming Cheng, and Jian Yang. “LSKNet: A foundation lightweight backbone for remote sensing.” International Journal of Computer Vision, vol. 133, no. 3, 2024, pp. 1410‑1431.

Xinbin Yuan, Zhao‑Hui Zheng, Yuxuan Li, Xialei Liu, Li Liu, Xiang Li, Qibin Hou, and Ming‑Ming Cheng. “Strip R‑CNN: Large Strip Convolution for Remote Sensing Object Detection.” arXiv preprint arXiv:2501.03775, 2025.
