Survey of Deep Learning Based Object Detection Algorithms
This survey reviews deep-learning-based object detection, tracing its evolution from R-CNN to modern two-stage (Faster R-CNN) and one-stage (YOLO, SSD) models. It then discusses enhancements for scale variance, small-object, and domain-shift challenges, covering logo-detection pipelines, PANet, temporal action localization, and knowledge-distillation training, and closes with an extensive speed-accuracy trade-off analysis to guide practitioners.
Object detection is a fundamental task in computer vision with nearly two decades of research behind it. Recent advances in deep learning have shifted detectors from hand-crafted features to neural-network-based methods, from R-CNN (2013) through modern approaches such as Faster R-CNN, SSD, the YOLO series, and lightweight models like Pelee.
The article is organized into three parts: (1) improvements to two‑stage and one‑stage detectors, (2) solutions to common challenges, and (3) extensions and additional surveys.
Part 1 reviews classic two‑stage (Faster R‑CNN) and one‑stage (YOLO, SSD) networks and their subsequent upgrades.
Part 2 summarizes typical problems (scale variance, small object detection, domain shift, etc.) and recent papers that propose solutions, including logo detection (Scalable Object Detection for Stylized Objects), instance segmentation (Path Aggregation Network), and temporal action localization (Rethinking the Faster R‑CNN Architecture for Temporal Action Localization).
For logo detection, the authors of "Scalable Object Detection for Stylized Objects" use a two-step pipeline: YOLO-v2 locates candidate logos, and a deep image-similarity network trained with a triplet loss then retrieves the matching logo class. The method addresses the large number of logo classes, rapid logo updates, and the distinct visual characteristics of logos.
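To make the retrieval stage concrete, here is a minimal PyTorch sketch of a triplet-loss embedding network applied to detected logo crops; the network structure, embedding size, and margin are illustrative assumptions, not the paper's settings:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LogoEmbeddingNet(nn.Module):
    """Maps a detected logo crop to an L2-normalized embedding."""
    def __init__(self, backbone: nn.Module, feat_dim: int, embed_dim: int = 128):
        super().__init__()
        self.backbone = backbone          # any CNN feature extractor
        self.fc = nn.Linear(feat_dim, embed_dim)

    def forward(self, crops: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(crops).flatten(1)
        return F.normalize(self.fc(feats), dim=1)

def triplet_loss(anchor, positive, negative, margin: float = 0.2):
    """Pull same-logo pairs together, push different logos apart."""
    d_pos = (anchor - positive).pow(2).sum(dim=1)
    d_neg = (anchor - negative).pow(2).sum(dim=1)
    return F.relu(d_pos - d_neg + margin).mean()

# At inference: crop the YOLO-v2 detections, embed them, and match each
# embedding against a gallery of reference logo embeddings by nearest-
# neighbor search. New or updated logos only require new gallery entries,
# not retraining the detector.
```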
Path Aggregation Network (PANet) enhances the Feature Pyramid Network by adding a bottom‑up augmentation path, adaptive feature pooling across all pyramid levels, and a complementary fully‑connected branch for mask prediction, achieving higher accuracy in both instance segmentation and object detection.
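A minimal sketch of PANet's bottom-up augmentation path, assuming 256-channel FPN outputs P2 to P5 (the activations and layer choices here are simplified):

```python
import torch.nn as nn
import torch.nn.functional as F

class BottomUpPath(nn.Module):
    """PANet-style bottom-up augmentation on top of FPN outputs.

    Takes FPN levels [P2, P3, P4, P5] (highest resolution first) and
    produces augmented levels [N2, N3, N4, N5], where N2 = P2.
    """
    def __init__(self, channels: int = 256, num_levels: int = 4):
        super().__init__()
        # stride-2 conv downsamples N_i before fusing it with P_{i+1}
        self.down_convs = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, stride=2, padding=1)
            for _ in range(num_levels - 1)
        ])
        # 3x3 conv smooths each fused map into N_{i+1}
        self.fuse_convs = nn.ModuleList([
            nn.Conv2d(channels, channels, 3, padding=1)
            for _ in range(num_levels - 1)
        ])

    def forward(self, fpn_feats):
        outs = [fpn_feats[0]]                       # N2 = P2
        for i, p in enumerate(fpn_feats[1:]):
            down = F.relu(self.down_convs[i](outs[-1]))
            outs.append(F.relu(self.fuse_convs[i](down + p)))
        return outs
```

The short bottom-up path lets precise localization signals from low levels reach the high levels in a few layers, rather than traversing the entire backbone.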
Temporal Action Localization adapts the Faster R‑CNN pipeline to the time domain, using a Segment Proposal Network to generate candidate video segments and a two‑stage refinement that incorporates multi‑tower dilated temporal convolutions and extended temporal context.
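The multi-tower idea can be sketched as follows; the dilation rates, channel width, and two-conv tower depth are illustrative assumptions rather than the paper's exact configuration:

```python
import torch.nn as nn

class DilatedTower(nn.Module):
    """One tower: stacked 1-D temporal convs whose dilation sets the
    receptive field to roughly match one anchor segment scale."""
    def __init__(self, channels: int, dilation: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size=3,
                      dilation=dilation, padding=dilation),
            nn.ReLU(),
        )

    def forward(self, x):          # x: (batch, channels, time)
        return self.net(x)

class MultiTowerProposalHead(nn.Module):
    """Segment-proposal head: one dilated tower per anchor scale, each
    predicting an objectness score and boundary offsets per time step."""
    def __init__(self, channels: int = 256, dilations=(1, 2, 4, 8)):
        super().__init__()
        self.towers = nn.ModuleList(
            [DilatedTower(channels, d) for d in dilations])
        self.score = nn.Conv1d(channels, 1, kernel_size=1)  # objectness
        self.reg = nn.Conv1d(channels, 2, kernel_size=1)    # center/length

    def forward(self, x):
        return [(self.score(t(x)), self.reg(t(x))) for t in self.towers]
```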
The survey also covers training object detectors from scratch. The paper “Mimicking Very Efficient Network for Object Detection” applies knowledge distillation: a large pretrained detector supervises a small, randomly initialized network via a mimic loss on RPN features and classification logits, followed by a second stage of ground-truth supervision.
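A simplified sketch of the first-stage mimic loss (the paper samples region features from RPN proposals; this version mimics the whole feature map for brevity, and assumes the two maps share spatial size):

```python
import torch.nn as nn
import torch.nn.functional as F

class FeatureMimicLoss(nn.Module):
    """Distillation-style mimic loss: a 1x1 adaptation layer projects the
    small student's feature map to the teacher's channel width, then an
    L2 loss pushes the student to reproduce the teacher's RPN features."""
    def __init__(self, student_ch: int, teacher_ch: int):
        super().__init__()
        self.adapt = nn.Conv2d(student_ch, teacher_ch, kernel_size=1)

    def forward(self, student_feat, teacher_feat):
        # The teacher is pretrained and frozen; detach blocks its gradients.
        return F.mse_loss(self.adapt(student_feat), teacher_feat.detach())

# Stage 1: train the student with the mimic loss (plus logit matching)
# while the teacher stays fixed. Stage 2: continue training the student
# on ground-truth detection losses alone.
```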
Finally, the article reviews the “Speed/accuracy trade‑offs for modern convolutional object detectors” (CVPR 2017), which conducts extensive experiments on Faster R‑CNN, SSD, and R‑FCN across various backbone networks (VGG16, ResNet‑101, MobileNet, Inception variants) and input resolutions. Key findings include: (i) two‑stage detectors generally achieve higher mAP, especially for small objects; (ii) SSD is more cost‑effective for large‑object scenarios; (iii) reducing the number of proposals can dramatically improve speed with minimal accuracy loss.
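As a hedged illustration of finding (iii), torchvision's Faster R-CNN (not the implementation used in the paper) exposes the proposal budget as a constructor argument:

```python
import torchvision

# Keeping fewer RPN proposals after NMS trades a small amount of mAP
# for a notable speedup at inference time.
fast_model = torchvision.models.detection.fasterrcnn_resnet50_fpn(
    weights="DEFAULT",
    rpn_post_nms_top_n_test=100,   # torchvision's default is 1000
)
fast_model.eval()
```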
The overall conclusion is that the presented experimental analysis helps practitioners select appropriate detection algorithms and configurations based on speed‑accuracy requirements.
