Comprehensive Overview of Face Detection Methods and Techniques

This article provides an in‑depth review of face detection, covering traditional knowledge‑, model‑, feature‑ and appearance‑based approaches, modern deep‑learning methods such as cascade CNN, MTCNN and Facebox, strategies for handling multi‑scale faces, anchor‑box densification, and practical training considerations.

Qunar Tech Salon

Dec 10, 2019

In object detection, face detection is a specialized sub-task. It differs from generic object detection because faces pose particular challenges: they are often small, have low feature contrast, and are frequently occluded.

Traditional face detection methods can be divided into four categories: knowledge‑based, model‑based, feature‑based, and appearance‑based algorithms.

Deep learning, which took off around 2006, has since been applied to face detection and has made detectors markedly more robust. Methods based on convolutional neural networks, such as Cascade CNN, MTCNN and Facebox, are now dominant.

Handling faces of different sizes:

Traditional approaches use image pyramids (see Image 1) or sliding windows of varying sizes (see Image 2).
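To make the pyramid idea concrete, here is a minimal pure-Python sketch that computes the scales at which a fixed 12×12 window is slid over a rescaled image, then maps each window back to original-image coordinates. The window size, the 20-pixel minimum face size, the 0.709 scale factor and the stride of 4 are illustrative assumptions (the first three are common defaults in MTCNN-style code), and all function names are hypothetical.

```python
def pyramid_scales(height, width, window=12, min_face=20, factor=0.709):
    """Scales at which faces of min_face, min_face/factor, ... pixels fill the
    fixed window; stop once the rescaled image is smaller than the window."""
    scales = []
    scale = window / min_face              # a min_face face now spans `window` pixels
    min_side = min(height, width) * scale
    while min_side >= window:
        scales.append(scale)
        scale *= factor                    # shrink the image for the next level
        min_side *= factor
    return scales


def sliding_windows(height, width, window=12, stride=4):
    """Yield (x, y, w, h) for every window position on one image."""
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            yield (x, y, window, window)


def multiscale_windows(height, width, window=12, stride=4, min_face=20, factor=0.709):
    """Slide a fixed window over every pyramid level and map each window
    back to (x1, y1, x2, y2) in original-image pixels."""
    boxes = []
    for s in pyramid_scales(height, width, window, min_face, factor):
        h, w = int(height * s), int(width * s)   # size of this pyramid level
        for x, y, ww, hh in sliding_windows(h, w, window, stride):
            boxes.append((x / s, y / s, (x + ww) / s, (y + hh) / s))
    return boxes
```

Sliding windows of varying sizes over the unscaled image enumerate comparable regions; the pyramid variant has the advantage that a classifier trained at one input size is reused unchanged at every scale.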

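Deep-learning approaches also rescale the image, but they often replace explicit sliding windows with fully convolutional networks (FCN). The network processes each scaled image in a single pass, and each cell of its output map corresponds to one window of the input, so overlapping windows share their convolutions instead of recomputing them.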
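The sketch below shows how an FCN output map stands in for explicit windows: every cell (i, j) whose face score passes a threshold is mapped back to the input window it covers. It assumes a 12×12 receptive field and an overall stride of 2, which match MTCNN's P-Net; the score map is a plain nested list standing in for real network output, and `fcn_boxes` is a hypothetical helper.

```python
def fcn_boxes(score_map, scale, stride=2, cell=12, threshold=0.6):
    """Map high-scoring cells of an FCN output map back to input windows.

    score_map: 2-D list of face probabilities computed on the image resized
    by `scale`. Returns (x1, y1, x2, y2, score) in original-image pixels.
    """
    boxes = []
    for i, row in enumerate(score_map):
        for j, p in enumerate(row):
            if p >= threshold:
                # Cell (i, j) saw the window starting at (j*stride, i*stride)
                # on the resized image; divide by scale to undo the resize.
                x1 = j * stride / scale
                y1 = i * stride / scale
                x2 = (j * stride + cell) / scale
                y2 = (i * stride + cell) / scale
                boxes.append((x1, y1, x2, y2, p))
    return boxes
```

One forward pass over a 14×14 image yields a 2×2 map, the same four positions a stride-2 sliding window would visit, but the overlapping pixels are convolved only once.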

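Anchor-box strategies place predefined reference boxes (anchors) at every feature-map location and predict, for each anchor, a face score and offsets that refine it into a final box, so boxes come directly from the feature maps (see Image 3).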
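A minimal sketch of anchor generation and decoding follows: square anchors are centered on every feature-map cell, and a predicted offset vector turns an anchor into a box. The center/log-size encoding and the variances (0.1, 0.2) follow the SSD-style convention used by many SSD-derived detectors; they are assumptions here, not details given in this article, and the function names are hypothetical.

```python
import math


def make_anchors(fm_h, fm_w, stride, sizes):
    """One square anchor per size, centered on every feature-map cell.
    Returns (cx, cy, w, h) tuples in input-image pixels."""
    anchors = []
    for i in range(fm_h):
        for j in range(fm_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in sizes:
                anchors.append((cx, cy, s, s))
    return anchors


def decode(anchor, deltas, variances=(0.1, 0.2)):
    """Turn predicted offsets (dx, dy, dw, dh) into an (x1, y1, x2, y2) box.
    Center offsets are relative to anchor size; sizes are log-encoded."""
    cx, cy, w, h = anchor
    dx, dy, dw, dh = deltas
    pcx = cx + dx * variances[0] * w
    pcy = cy + dy * variances[0] * h
    pw = w * math.exp(dw * variances[1])
    ph = h * math.exp(dh * variances[1])
    return (pcx - pw / 2, pcy - ph / 2, pcx + pw / 2, pcy + ph / 2)
```

With zero offsets, `decode` returns the anchor itself, so the network only has to learn small corrections around each reference box.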

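Face positions can be located with sliding windows, by mapping FCN output cells back to the image, or by regressing offsets from anchor boxes. Each approach yields many overlapping candidates, so Non-Maximum Suppression (NMS) is applied: it keeps the highest-scoring box, discards the remaining boxes whose overlap (IoU) with it exceeds a threshold, and repeats on what is left.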
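The greedy NMS procedure described above can be sketched in a few lines of pure Python. Boxes are (x1, y1, x2, y2) tuples; the IoU threshold of 0.5 is an illustrative default, not a value taken from any of the detectors discussed here.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    while order:
        best = order.pop(0)               # highest-scoring remaining box
        keep.append(best)
        # Drop every remaining box that overlaps the kept one too much.
        order = [k for k in order if iou(boxes[best], boxes[k]) <= iou_threshold]
    return keep
```

For two nearly identical faces boxes and one distant box, only the better of the pair and the distant box survive.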

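Cascade CNN consists of six CNNs: three binary face/non-face classifiers (the 12-net, 24-net and 48-net, named after their input sizes) and three calibration nets that adjust the bounding boxes. At test time the 12-net scans the whole image and rejects more than 90% of the windows; the remaining windows are calibrated, pruned with NMS, and passed to the larger stages for further filtering and refinement, with NMS applied again to the final detections.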

The cascade's advantage is that its early stages are small and cheap, tuned for high recall, while its later, more complex stages deliver high precision. Because the expensive networks only see the few windows that survive the early stages, the detector stays efficient.
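The cascade logic can be sketched as a loop over stages, each a (scorer, threshold) pair, in which a window survives only if every stage accepts it. The scorers are toy stand-ins for the 12-, 24- and 48-nets; this sketch deliberately omits the calibration nets and the NMS steps that Cascade CNN runs between stages, and `run_cascade` is a hypothetical name.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]
Stage = Tuple[Callable[[Box], float], float]  # (scorer, acceptance threshold)


def run_cascade(windows: Sequence[Box], stages: Sequence[Stage]) -> Tuple[List[Box], List[int]]:
    """Filter windows stage by stage; return the survivors and how many
    windows each stage had to score (a proxy for its cost)."""
    survivors = list(windows)
    evaluated = []
    for scorer, threshold in stages:
        evaluated.append(len(survivors))
        survivors = [w for w in survivors if scorer(w) >= threshold]
    return survivors, evaluated


if __name__ == "__main__":
    windows = [(float(x), 0.0, x + 12.0, 12.0) for x in range(100)]
    cheap = (lambda w: 1.0 if w[0] < 10 else 0.0, 0.5)       # toy high-recall filter
    costly = (lambda w: 1.0 if w[0] % 2 == 0 else 0.0, 0.5)  # toy precise classifier
    kept, evaluated = run_cascade(windows, [cheap, costly])
    print(len(kept), evaluated)  # 5 [100, 10]
```

The `evaluated` counts show the point of the design: the costly stage scores only 10 windows instead of 100.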

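MTCNN (Multi-Task Cascaded Convolutional Networks) contains three subnetworks: P-Net, R-Net and O-Net. An image pyramid is fed to P-Net, which produces candidate face boxes; R-Net rejects many false candidates and refines the remaining boxes; O-Net refines them further and predicts five facial landmarks. Training face classification, box regression and landmark localization jointly is credited with making the detector robust even when training data is limited.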
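Between MTCNN stages, each surviving box is corrected by the predicted regression offsets and then squared so it can be cropped and resized to the next network's square input (24×24 for R-Net, 48×48 for O-Net). The sketch below is modeled on how common open-source MTCNN implementations do this, with offsets expressed as fractions of box width and height; the exact convention and the helper names are assumptions.

```python
def refine(box, reg):
    """Apply regression offsets (fractions of box width/height) to each edge."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    return (x1 + reg[0] * w, y1 + reg[1] * h, x2 + reg[2] * w, y2 + reg[3] * h)


def to_square(box):
    """Grow a box into a square around its center, keeping the longer side,
    so the crop is not distorted when resized to a square network input."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    side = max(w, h)
    cx, cy = x1 + w / 2, y1 + h / 2
    return (cx - side / 2, cy - side / 2, cx + side / 2, cy + side / 2)
```

A typical stage then scores the squared crops, keeps those above a threshold, applies NMS, and hands the refined boxes to the next network.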

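Facebox (FaceBoxes) uses Rapidly Digested Convolutional Layers (RDCL) to shrink the input quickly. Conv1, Pool1, Conv2 and Pool2 have kernels of 7×7, 3×3, 5×5 and 3×3 and strides of 4, 2, 2 and 2, giving a total stride of 32, and CReLU activations double each convolution's output channels so fewer filters are needed. Multiple Scale Convolutional Layers (MSCL) then detect faces at several scales: they use Inception modules and, as in SSD, attach anchors to several layers of different resolution.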
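The arithmetic behind the RDCL's stride of 32 can be checked with the standard output-size formula, floor((n + 2p - k) / s) + 1. The kernels and strides below are those listed above; the padding values (3, 1, 2, 1) are an assumption chosen so each layer divides the size exactly, and the layer table and function names are hypothetical. CReLU is shown on a plain list of activations.

```python
def conv_out(n, kernel, stride, pad):
    """Spatial output size of a conv or pooling layer (floor mode)."""
    return (n + 2 * pad - kernel) // stride + 1


# (name, kernel, stride, padding); padding values are assumed.
RDCL = [("conv1", 7, 4, 3), ("pool1", 3, 2, 1), ("conv2", 5, 2, 2), ("pool2", 3, 2, 1)]


def rdcl_sizes(n):
    """Feature-map side length after each RDCL layer for an n x n input."""
    sizes = []
    for name, k, s, p in RDCL:
        n = conv_out(n, k, s, p)
        sizes.append((name, n))
    return sizes


def crelu(xs):
    """CReLU: concatenate ReLU(x) and ReLU(-x), doubling the channel count."""
    return [max(x, 0.0) for x in xs] + [max(-x, 0.0) for x in xs]


if __name__ == "__main__":
    print(rdcl_sizes(1024))  # [('conv1', 256), ('pool1', 128), ('conv2', 64), ('pool2', 32)]
    print(crelu([1.0, -2.0]))  # [1.0, 0.0, 0.0, 2.0]
```

A 1024×1024 input leaves the RDCL as a 32×32 map, which is why the later MSCL layers are cheap enough for CPU inference.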

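Anchor densification raises the density of small anchors. In Facebox, anchor density is the anchor size divided by its tiling interval, so on a layer with interval 32 the 32×32 anchors are four times sparser than the 128×128 ones and small faces are matched less often. Densification tiles n×n copies of a small anchor, uniformly offset around the center of each cell, which multiplies its density by n (see Image 4).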
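The sketch below tiles n×n copies of one square anchor evenly inside a single stride×stride cell and computes density as size divided by interval. The FaceBoxes paper densifies the 32-pixel and 64-pixel anchors on its stride-32 layer by factors of 4 and 2; the helper names here are hypothetical.

```python
def densify(cx, cy, size, stride, n):
    """Tile n x n copies of a square anchor, evenly spaced inside the
    stride x stride cell centered at (cx, cy). Returns (cx, cy, w, h)."""
    step = stride / n                       # new, finer tiling interval
    start_x = cx - stride / 2 + step / 2
    start_y = cy - stride / 2 + step / 2
    return [(start_x + a * step, start_y + b * step, size, size)
            for b in range(n) for a in range(n)]


def anchor_density(size, interval):
    """Anchor density as defined for Facebox: anchor size / tiling interval."""
    return size / interval


if __name__ == "__main__":
    tiles = densify(16, 16, 32, 32, 4)
    print(len(tiles), tiles[0], tiles[-1])  # 16 (4.0, 4.0, 32, 32) (28.0, 28.0, 32, 32)
    # Densifying by n divides the interval by n, so density rises from 1.0 to 4.0.
    print(anchor_density(32, 32), anchor_density(32, 32 / 4))
```

After densification the 32-pixel anchors reach density 4, the same as the 128-pixel anchors on that layer.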

Training samples consist of face (positive) and non-face (negative) images. Because of the cascade structure, the early stages can be trained as lightweight, high-recall filters and the later stages as more complex, high-precision classifiers.
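Face and non-face samples are usually cut as crops from annotated images and labeled by their IoU with the ground-truth boxes. This sketch follows the thresholds reported in the MTCNN paper (IoU below 0.3 is negative, above 0.65 positive, 0.4 to 0.65 a part face); treating the 0.3 to 0.4 band as ignored, and the helper names, are assumptions.

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def label_crop(crop, gt_boxes, neg_thr=0.3, part_thr=0.4, pos_thr=0.65):
    """Label a candidate crop by its best IoU with any ground-truth face."""
    best = max((iou(crop, g) for g in gt_boxes), default=0.0)
    if best < neg_thr:
        return "negative"
    if best > pos_thr:
        return "positive"
    if best >= part_thr:
        return "part"
    return "ignore"       # ambiguous overlap: not used for training here
```

Crops drawn at random positions mostly become negatives, so implementations typically also sample crops near each ground-truth face to obtain enough positives and part faces.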

Authors: Liang Zhicheng (top 100 in AI competitions, translator of "Deep Learning 500 Questions"), Liu Peng (Computer Vision Engineer at Gaoding Technology), Chen Fangjie (master's degree, Shanghai University).

Original source: CSDN Blog

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
