How HGO‑YOLO Achieves 87.4% mAP at 56 FPS with Only 4.6 MB of Parameters
This paper presents HGO‑YOLO, a lightweight real‑time anomaly‑behavior detector that integrates HGNetv2 and GhostConv into YOLOv8, achieving 87.4% mAP with just 4.6 MB of parameters and 56 FPS on CPU, and validates its performance across multiple datasets and hardware platforms.
Introduction
Accurate and real‑time object detection is crucial for anomaly‑behavior monitoring, especially on hardware‑constrained devices. The authors propose HGO‑YOLO, which embeds the HGNetv2 backbone into YOLOv8 and introduces a lightweight detection head (OptiConvDetect) to balance precision and speed.
Related Work
Prior works improve multi‑scale detection via feature‑pyramid networks (e.g., PANet, QAFPN) and lightweight backbones such as MobileNet and ShuffleNetv2. However, these methods often increase computational cost or reduce representational capacity. GhostConv, introduced by Han et al. [16], reduces parameters by generating part of the feature maps ("ghost" features) with cheap linear operations instead of full convolutions.
Method
YOLOv8 Overview
YOLOv8 consists of a Backbone (C2f with SPPF), a Neck (FPN‑like), and a decoupled Head that separates the classification and regression branches, using Distribution Focal Loss (DFL) for regression.
HGO‑YOLO Architecture
HGO‑YOLO replaces the standard YOLOv8 backbone with HGNetv2, a lightweight convolutional backbone that stacks HGBlocks of varying channel widths, each concatenating and aggregating the outputs of successive small convolutions to build hierarchical features. GhostConv replaces standard Conv in selected layers, cutting the model from 8.9 GFLOPs to 4.3 GFLOPs while preserving accuracy. The detection head, OptiConvDetect, shares a single PConv (partial convolution) layer across the three heads, followed by a 1×1 Conv (Convpt) for each branch, sharply reducing parameters.
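The FLOP savings from GhostConv follow directly from its structure. The sketch below counts multiply‑accumulates for a single illustrative layer, assuming a ghost ratio of s = 2; the channel widths, kernel sizes, and feature‑map size are assumptions for illustration, not the paper's actual layer configuration.

```python
def conv_flops(c_in, c_out, k, h, w):
    """Multiply-accumulates of a standard k x k convolution producing
    a c_out x h x w output (stride 1, no bias)."""
    return c_in * c_out * k * k * h * w

def ghost_conv_flops(c_in, c_out, k, h, w, s=2, d=3):
    """GhostConv: a standard conv produces c_out/s intrinsic maps, then
    cheap d x d depthwise ops generate the remaining 'ghost' maps."""
    intrinsic = c_out // s
    primary = conv_flops(c_in, intrinsic, k, h, w)
    # (s - 1) ghost maps per intrinsic map, each from a depthwise d x d op
    cheap = intrinsic * (s - 1) * d * d * h * w
    return primary + cheap

# Illustrative layer: 128 -> 256 channels, 3x3 kernel, 40x40 feature map
std = conv_flops(128, 256, 3, 40, 40)
ghost = ghost_conv_flops(128, 256, 3, 40, 40, s=2)
print(f"standard: {std:,}  ghost: {ghost:,}  ratio: {ghost / std:.2f}")
```

With s = 2 the cost lands just above half that of the standard convolution, matching the roughly 1/s reduction GhostConv is designed for.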
Key Components
HGNetv2 : Lightweight convolutional backbone from the PP‑HGNet family (also used in RT‑DETR); its HGBlocks aggregate the outputs of stacked small convolutions, enhancing multi‑scale feature extraction at low cost.
GhostConv : Generates intrinsic feature maps with a standard convolution, then applies cheap linear operations to produce the remaining "ghost" features; computational cost drops to roughly 1/s of a standard Conv, where s is the ghost ratio.
OptiConvDetect : Parameter‑shared detection head that decouples classification and regression while reusing PConv, achieving lower GFLOPs and higher FPS.
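To see why sharing one PConv across the heads helps, it is enough to count weights. The sketch below compares three independent 3×3 convolutions against a single shared PConv (FasterNet‑style, convolving only a quarter of the channels) plus a per‑branch 1×1 Conv; the channel width and partial ratio are illustrative assumptions, since the paper's exact head shapes are not reproduced here.

```python
def conv_params(c_in, c_out, k):
    """Weight count of a k x k convolution (bias ignored)."""
    return c_in * c_out * k * k

C, K = 256, 3  # assumed head channel width and kernel size

# Baseline: each of the 3 heads owns a 3x3 conv plus a 1x1 conv
baseline = 3 * (conv_params(C, C, K) + conv_params(C, C, 1))

# OptiConvDetect-style: one PConv shared by all heads (convolves only
# a quarter of the channels), plus an independent 1x1 conv per branch
cp = C // 4
shared = conv_params(cp, cp, K) + 3 * conv_params(C, C, 1)

print(f"baseline: {baseline:,}  shared: {shared:,}  "
      f"ratio: {shared / baseline:.2f}")
```

Under these assumptions the shared head keeps barely over a tenth of the baseline's weights, which is the kind of reduction that lets the head stay decoupled while still shrinking the model.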
Experiments
Datasets
Six public datasets covering fall, fight, and smoking scenarios were merged, yielding 10,201 images (4,065 fall, 3,224 fight, 2,912 smoke). Data were split 8:1:1 for training, validation, and testing.
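The paper states only the 8:1:1 ratio, not the splitting procedure; a minimal sketch of such a split (using indices as stand‑ins for image paths, with an arbitrary seed for reproducibility) might look like this:

```python
import random

# 8:1:1 split of the 10,201 merged images
random.seed(0)
indices = list(range(10201))
random.shuffle(indices)

n_train = int(0.8 * len(indices))    # 8160
n_val = int(0.1 * len(indices))      # 1020
train = indices[:n_train]
val = indices[n_train:n_train + n_val]
test = indices[n_train + n_val:]     # remainder: 1021
print(len(train), len(val), len(test))
```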
Setup
All models (YOLOv5 through YOLOv11, RT‑DETR, and the HGO‑YOLO variants) were implemented in PyTorch and evaluated on a machine with an Intel Xeon Silver 4310 CPU, an NVIDIA A100 80 GB GPU, and Ubuntu 20.04.2. Training ran for 200 epochs with batch size 32, using hyper‑parameters from the original repositories.
Metrics
mAP, FPS, and GFLOPs were used. mAP averages per‑class average precision (AP) across all classes; FPS measures inference throughput; GFLOPs indicate per‑image computational cost.
Ablation Study
Five configurations were compared (Table 1): H‑YOLO (HGNetv2 only), HG‑YOLO (HGNetv2 + GhostConv), O‑YOLO (OptiConvDetect only), HO‑YOLO (HGNetv2 + OptiConvDetect), and HGO‑YOLO (all three). HGO‑YOLO achieved the best trade‑off: 56 FPS and 87.4% mAP, with per‑class AP of 85.1% (fall), 93.1% (fight), 75.4% (smoke), and 96.0% (person).
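As a sanity check, the headline 87.4% figure is exactly the mean of the four per‑class APs reported in the ablation:

```python
# Per-class AP@0.5 from the ablation table
ap = {"fall": 85.1, "fight": 93.1, "smoke": 75.4, "person": 96.0}
map50 = sum(ap.values()) / len(ap)
print(f"mAP = {map50:.1f}%")  # -> 87.4%
```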
Scale Analysis
Across model sizes (YOLOv8n, YOLOv8s, YOLOv8m, YOLOv8l, YOLOv8x), applying the HGO modifications consistently improved mAP by 2.8–3.7 points over the corresponding baselines while reducing GFLOPs, demonstrating that the design scales.
Loss Function Comparison
Four IoU‑based losses were evaluated: DIoU, CIoU, MPDIoU, and Inner‑CIoU. MPDIoU yielded the highest mAP (value omitted in source) and was adopted for HGO‑YOLO [24].
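MPDIoU augments plain IoU with normalized distances between the two boxes' top‑left and bottom‑right corners, so it penalizes misalignment even when overlap is identical. A minimal sketch follows; the (x1, y1, x2, y2) box format and the 640×640 image size are illustrative assumptions, not values from the paper.

```python
def iou(b1, b2):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(b1[0], b2[0]), max(b1[1], b2[1])
    ix2, iy2 = min(b1[2], b2[2]), min(b1[3], b2[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    a1 = (b1[2] - b1[0]) * (b1[3] - b1[1])
    a2 = (b2[2] - b2[0]) * (b2[3] - b2[1])
    return inter / (a1 + a2 - inter)

def mpdiou(pred, gt, img_w, img_h):
    """MPDIoU = IoU - d1^2/(w^2 + h^2) - d2^2/(w^2 + h^2), where d1 and d2
    are the distances between the boxes' top-left and bottom-right corners,
    normalized by the squared image diagonal."""
    d1 = (pred[0] - gt[0]) ** 2 + (pred[1] - gt[1]) ** 2
    d2 = (pred[2] - gt[2]) ** 2 + (pred[3] - gt[3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou(pred, gt) - d1 / norm - d2 / norm

gt = (100, 100, 200, 200)
print(mpdiou(gt, gt, 640, 640))                    # perfect match -> 1.0
print(mpdiou((110, 110, 210, 210), gt, 640, 640))  # shifted box scores lower
```

The corresponding loss is 1 − MPDIoU, which reaches zero only when the boxes coincide exactly.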
Device Tests
Real‑time inference tests on a Raspberry Pi 4 and an NVIDIA edge device showed HGO‑YOLO outperforming YOLOv8, reaching 33.33 FPS on the NVIDIA platform.
Visualization
Qualitative comparisons (Figures 7–9) show that HGO‑YOLO produces higher‑confidence detections for falls, fights, and smoking, even under low light or occlusion, while reducing false positives.
Conclusion
HGO‑YOLO delivers superior accuracy (87.4% mAP) and speed (56 FPS) with a tiny 4.6 MB footprint, making it suitable for edge devices. Limitations include reduced performance on very small smoke targets and potential degradation on extremely low‑resource hardware. Future work will focus on further optimization for diverse deployment scenarios.
AIWalker
Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.