DAMO-YOLO: An Efficient Object Detection Framework with NAS, Multi‑Scale Fusion, and Full‑Scale Distillation
This article introduces DAMO‑YOLO, a high‑performance object detection framework that combines low‑cost model customization via MAE‑NAS, an Efficient RepGFPN with HeavyNeck for superior multi‑scale detection, and a full‑scale distillation technique, delivering faster inference, lower FLOPs, and higher accuracy across diverse industrial scenarios.
Object detection aims to locate and classify objects in images, videos, or point clouds, serving as a foundation for many computer‑vision applications such as autonomous driving, port management, intrusion detection, and face recognition.
Current detection frameworks suffer from three main drawbacks: limited model scale flexibility for different compute budgets, weak multi‑scale detection especially for small objects, and sub‑optimal speed‑accuracy trade‑offs.
DAMO‑YOLO addresses these issues with three technical advantages: (1) a self‑developed MAE‑NAS that enables low‑cost, latency‑ or FLOPs‑aware model customization without requiring real data or training; (2) Efficient RepGFPN combined with a HeavyNeck design that dramatically improves multi‑scale feature fusion while keeping computational cost low; (3) a full‑scale distillation method that transfers knowledge from large to small models without extra inference overhead, robust to heterogeneous architectures.
The MAE‑NAS approach treats a network as a continuous‑state information system, maximizes entropy of feature maps, and searches architectures using latency or FLOPs as constraints, producing backbone variants (T/S/M) that are then wrapped with CSP or ResStyle structures.
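The entropy-maximization idea behind MAE‑NAS can be illustrated with a training-free scoring proxy: feed random inputs through a randomly initialized candidate backbone and estimate the entropy of its feature maps. This is a minimal sketch under a Gaussian assumption on feature statistics; the function names and scoring details are illustrative, not the actual MAE‑NAS implementation.

```python
import torch
import torch.nn as nn

def feature_entropy(feature_map: torch.Tensor) -> torch.Tensor:
    # Differential entropy of a Gaussian with the feature map's variance:
    # H = 0.5 * log(2 * pi * e * var), summed over channels.
    var = feature_map.var(dim=(0, 2, 3)) + 1e-8  # per-channel variance
    return (0.5 * torch.log(2 * torch.pi * torch.e * var)).sum()

def score_architecture(model: nn.Module, input_shape=(1, 3, 224, 224)) -> float:
    # Score a candidate backbone with random weights on random input --
    # no real data and no training, as in training-free NAS proxies.
    model.eval()
    with torch.no_grad():
        feats = model(torch.randn(input_shape))
    return feature_entropy(feats).item()
```

In a search loop, candidates whose scores violate a latency or FLOPs budget would simply be discarded before scoring.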
Efficient RepGFPN optimizes topology by assigning different channel numbers to each scale, removes costly up‑sampling connections from the original GFPN, and introduces a lightweight fusion block with re‑parameterization, achieving higher accuracy with reduced latency.
HeavyNeck allocates most of the computation budget to the neck, while the detection head (ZeroHead) remains a single linear projection layer for classification and regression, further improving efficiency.
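A ZeroHead in this sense is tiny enough to write down directly. The sketch below, with illustrative class and parameter names, reduces each task head to a single 1x1 convolution (i.e., a per-location linear projection), leaving essentially all computation in the neck.

```python
import torch
import torch.nn as nn

class ZeroHeadSketch(nn.Module):
    # Illustrative sketch: one linear projection (1x1 conv) per task,
    # so the head adds almost no compute on top of the heavy neck.
    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        self.cls = nn.Conv2d(in_channels, num_classes, 1)  # classification logits
        self.reg = nn.Conv2d(in_channels, 4, 1)            # box regression (l, t, r, b)

    def forward(self, feat: torch.Tensor):
        return self.cls(feat), self.reg(feat)
```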
The distillation pipeline aligns student features to the teacher, normalizes them with bias‑free batch‑norm, and applies a dynamically decaying loss weight, yielding consistent performance gains across T, S, and M models and demonstrating robustness to heterogeneous student‑teacher pairs.
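The three ingredients of the distillation pipeline, channel alignment, bias-free normalization, and a decaying loss weight, can be sketched in a few lines. This is a minimal sketch assuming feature-map distillation with an MSE loss and a linear weight decay schedule; the class name and schedule are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureDistillLoss(nn.Module):
    # Sketch: a 1x1 conv aligns student channels to the teacher's
    # (handling heterogeneous architectures), a bias-free BatchNorm
    # (affine=False) normalizes features, and the loss weight decays
    # linearly so supervision from ground truth dominates late in training.
    def __init__(self, student_ch: int, teacher_ch: int, total_steps: int):
        super().__init__()
        self.align = nn.Conv2d(student_ch, teacher_ch, 1)
        self.norm = nn.BatchNorm2d(teacher_ch, affine=False)  # no learnable bias/scale
        self.total_steps = total_steps

    def forward(self, s_feat: torch.Tensor, t_feat: torch.Tensor, step: int):
        s = self.norm(self.align(s_feat))
        t = self.norm(t_feat)
        weight = max(0.0, 1.0 - step / self.total_steps)  # dynamically decaying
        return weight * F.mse_loss(s, t)
```

Because only the student's features are projected, the teacher's architecture and channel widths never have to match the student's.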
Empirical results show DAMO‑YOLO runs 20‑40% faster than state‑of‑the‑art detectors at equal accuracy, reduces FLOPs by 15‑50%, cuts parameter counts by 6‑50%, and improves detection of both small and large objects.
Models are publicly available on ModelScope and GitHub, with easy‑to‑use inference and training scripts; future work includes deployment tool enhancements, more application examples (e.g., drone small‑object detection, rotated boxes), and a broader range of model sizes from nano to large.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.