How Deep Learning Transformed Face Recognition: From Images to Real‑Time Video

This article surveys the evolution of face recognition from early statistical methods to modern deep‑learning approaches, outlines key researchers, open‑source projects, popular APIs, core processing steps, the DeepFace architecture, datasets, and experimental results, providing a comprehensive guide for practitioners and researchers.

MaGe Linux Operations

Aug 21, 2018

Traditional face-recognition systems focused on image capture, preprocessing, identity verification, and search. Recent advances have extended the technology to driver monitoring, pedestrian tracking, and real-time video processing. Along the way, the field has shifted from statistical methods such as PCA to deep-learning models such as CNNs and R-CNN, and interest in 3-D face recognition is growing.

Key Researchers

Prof. Shan Shiguang (Institute of Computing Technology, Chinese Academy of Sciences)

Prof. Li Ziqing (Biometrics Research Institute, Chinese Academy of Sciences)

Prof. Su Guangda (Tsinghua University)

Prof. Tang Xiaoou (The Chinese University of Hong Kong)

Ross B. Girshick

Major Open‑Source Projects

SeetaFace Engine – a BSD‑2 licensed C++ face‑recognition engine developed by the Chinese Academy of Sciences. https://github.com/seetaface/SeetaFaceEngine

Popular APIs/SDKs

Face++ – a cloud service offering free face detection, recognition, and attribute analysis, backed by Megvii Technology.

Skybiometry – provides face detection, recognition, and grouping services.

Common Face Image Datasets

Publicly available datasets include LFW (Labeled Faces in the Wild) and YTF (YouTube Faces). LFW is the primary benchmark, and image-based recognition accuracy on it has reached 99%.

Face‑Recognition Process

The pipeline consists of four major stages: face detection, face alignment, face verification, and face identification.

Face Detection : Detect faces in an image and draw bounding boxes; OpenCV provides Haar cascades based on the Viola‑Jones algorithm.

Face Alignment : Correct pose using 2D or 3D alignment; 3D alignment leverages 67 facial landmarks and Delaunay triangulation to produce a frontal view.

Face Verification : Pair‑matching to decide whether two faces belong to the same person, often used in small‑office access control systems.

Face Identification : Classify a detected and aligned face into one of many known identities, typically using a deep neural network.

Face‑Recognition Categories

Current methods are divided into three categories: image‑based, video‑based, and 3‑D face recognition.

DeepFace

DeepFace, introduced by Facebook, paved the way for subsequent models such as DeepID and FaceNet. It demonstrates how deep learning can achieve near‑human performance in face verification.

1. DeepFace Basic Framework

1.1 Face‑Recognition Workflow

face detection → face alignment → face verification → face identification

1.2 Face Detection

Existing Techniques

Haar Classifier : Implemented in OpenCV, based on the Viola‑Jones algorithm.

AdaBoost Cascade : See Viola and Jones, "Robust Real-Time Face Detection".
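The Haar-like features used by Viola-Jones detectors are differences of rectangle sums, and an integral image (summed-area table) makes each rectangle sum a four-lookup operation. The following is a minimal pure-Python sketch of that idea only, not OpenCV's implementation; the function names and the two-rectangle layout are illustrative assumptions.

```python
def integral_image(img):
    """Return an (h+1) x (w+1) summed-area table for a 2-D list of pixels."""
    h, w = len(img), len(img[0])
    ii = [[0] * (w + 1) for _ in range(h + 1)]
    for y in range(h):
        row_sum = 0
        for x in range(w):
            row_sum += img[y][x]
            # Sum of everything above plus the running sum of this row.
            ii[y + 1][x + 1] = ii[y][x + 1] + row_sum
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of pixels in the rectangle with top-left (x, y), width w, height h."""
    return ii[y + h][x + w] - ii[y][x + w] - ii[y + h][x] + ii[y][x]


def two_rect_feature(ii, x, y, w, h):
    """Horizontal two-rectangle feature: left half minus right half."""
    half = w // 2
    return rect_sum(ii, x, y, half, h) - rect_sum(ii, x + half, y, half, h)


# Toy 4x4 "image": bright left half, dark right half.
image = [
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
    [9, 9, 1, 1],
]
ii = integral_image(image)
print(rect_sum(ii, 0, 0, 4, 4))          # 80: total intensity
print(two_rect_feature(ii, 0, 0, 4, 4))  # 64: strong left/right contrast
```

A boosted cascade evaluates many such features per window and rejects most non-face windows after only a few of them, which is what makes the approach fast.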

Method Used in DeepFace

A fiducial-point detector localizes six landmarks (the two eye centers, the nose tip, and three mouth points) in two steps:

Select six reference points.

Predict their positions with a support vector regressor (SVR) trained on LBP features.

1.3 Face Alignment

2D Alignment : Crop, scale, rotate, and translate the detected face to six anchor locations.

3D Alignment :

Fit a 3‑D model to the 2‑D face using 67 landmarks.

Apply Delaunay triangulation and deform the mesh to obtain a frontal view.
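As a rough illustration of the 2D step, a similarity transform (scale, rotation, translation) is fully determined by two point correspondences, such as the two eye centers. The sketch below uses complex arithmetic for brevity. It is a two-point simplification rather than the six-point fitting described above, and the eye and anchor coordinates are made-up values.

```python
import cmath
import math


def similarity_from_two_points(src_a, src_b, dst_a, dst_b):
    """Return (a, b) such that z -> a*z + b maps src_a to dst_a and src_b to dst_b.

    Points are (x, y) tuples. The complex number `a` encodes scale and
    rotation; `b` encodes translation.
    """
    sa, sb = complex(*src_a), complex(*src_b)
    da, db = complex(*dst_a), complex(*dst_b)
    if sa == sb:
        raise ValueError("source points must be distinct")
    a = (db - da) / (sb - sa)
    b = da - a * sa
    return a, b


def apply_similarity(a, b, point):
    """Apply the transform to one (x, y) point."""
    z = a * complex(*point) + b
    return (z.real, z.imag)


# Hypothetical detected eye centers on a tilted face, and target anchors.
left_eye, right_eye = (60.0, 80.0), (100.0, 70.0)
anchor_left, anchor_right = (50.0, 60.0), (102.0, 60.0)

a, b = similarity_from_two_points(left_eye, right_eye, anchor_left, anchor_right)
scale = abs(a)                             # uniform scale factor
angle_deg = math.degrees(cmath.phase(a))   # rotation angle in degrees
print(apply_similarity(a, b, left_eye), apply_similarity(a, b, right_eye))
```

The same transform would then be applied to every pixel coordinate of the crop; with more than two landmarks, a least-squares fit replaces the exact solution.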

1.4 Face Representation (Verification)

Existing Techniques

LBP & Joint Bayesian : Combine high‑dimensional LBP features with Joint Bayesian modeling.

DeepID Series : Fuse seven Joint Bayesian models with SVM to reach 99.15% accuracy.

Method Used in DeepFace

The network is trained on a multi‑class face‑recognition task. After 3‑D alignment, images are resized to 152×152×3 and fed into the following architecture:

C1 (convolution): 32 filters of size 11×11×3

M2 (max-pooling): 3×3, stride 2

C3 (convolution): 16 filters of size 9×9×16

L4, L5, L6 (locally connected, non-shared weights): 16 filters each, of sizes 9×9, 7×7, and 5×5

F7 (fully connected): 4096 units

F8 (fully connected, softmax output): 4030 units

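The local-connection layers use a different filter bank at every location, so they capture region-specific features, while the two final fully-connected layers (F7 with 4096 units and F8 with 4030 units) learn high-level correlations between distant parts of the face, such as eye-mouth relationships. The output of F8 passes through a K-way softmax, with K the number of training identities, to produce identity probabilities.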
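The sizes of the early feature maps, and the cost of not sharing weights, follow from simple arithmetic. The sketch below assumes unpadded stride-1 convolutions, pooling that keeps a partial window at the border, and 16 input channels for the first locally connected layer. These conventions are assumptions made for illustration, not details stated in this article.

```python
import math


def conv_out(size, kernel, stride=1):
    """Output width of an unpadded convolution."""
    return (size - kernel) // stride + 1


def pool_out(size, kernel, stride):
    """Output width of pooling when a partial border window is kept."""
    return math.ceil((size - kernel) / stride) + 1


c1 = conv_out(152, 11)    # 142: 11x11 convolution on a 152x152 input
m2 = pool_out(c1, 3, 2)   # 71: 3x3 max-pooling with stride 2
c3 = conv_out(m2, 9)      # 63: 9x9 convolution
l4 = conv_out(c3, 9)      # 55: first 9x9 locally connected layer

# Weights for 16 filters of size 9x9 over 16 input channels.
per_location = 9 * 9 * 16 * 16
shared_weights = per_location               # convolution: one filter bank
unshared_weights = per_location * l4 * l4   # locally connected: one per location

print(c1, m2, c3, l4)
print(shared_weights, unshared_weights)
```

Under these assumptions, dropping weight sharing multiplies the layer's weight count by the number of output locations, which is why the locally connected layers hold a large share of the parameters.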

Training minimizes cross‑entropy loss via stochastic gradient descent; ReLU activation yields sparse top‑layer features (≈75% zeros) and dropout is applied to F7.
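For a single training example, the softmax output and cross-entropy loss can be sketched in plain Python as follows. The three-class logits are toy values (the real output layer has one unit per identity), and the helper names are illustrative rather than taken from any DeepFace code.

```python
import math


def relu(values):
    """ReLU sets negative activations to exactly zero, producing sparsity."""
    return [max(0.0, v) for v in values]


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, true_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(softmax(logits)[true_index])


def grad_logits(logits, true_index):
    """Gradient of the loss w.r.t. the logits: softmax minus one-hot target."""
    probs = softmax(logits)
    return [p - (1.0 if i == true_index else 0.0) for i, p in enumerate(probs)]


logits = [2.0, 0.5, -1.0]   # toy scores for three identities
loss = cross_entropy(logits, true_index=0)
grad = grad_logits(logits, true_index=0)
print(loss, grad)
print(relu([-0.3, 1.2, -2.0, 0.0]))
```

Stochastic gradient descent would backpropagate `grad` through the network and move each weight a small step against its gradient.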

Feature vectors are first normalized per dimension by the maximum value in the training set, then L2‑normalized.
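The two-step normalization can be sketched as below. The toy feature values, the function names, and the small `eps` guard against division by zero are assumptions added for illustration.

```python
import math


def fit_max_per_dim(train_features):
    """Largest value of each feature dimension across the training set."""
    return [max(column) for column in zip(*train_features)]


def normalize(feature, max_per_dim, eps=1e-12):
    """Scale each dimension by its training maximum, then L2-normalize."""
    scaled = [v / max(m, eps) for v, m in zip(feature, max_per_dim)]
    norm = math.sqrt(sum(v * v for v in scaled))
    return [v / max(norm, eps) for v in scaled]


# Toy non-negative features, as produced by a ReLU layer.
train = [
    [0.0, 2.0, 4.0],
    [1.0, 0.0, 8.0],
    [0.5, 1.0, 0.0],
]
max_per_dim = fit_max_per_dim(train)           # [1.0, 2.0, 8.0]
f = normalize([1.0, 2.0, 0.0], max_per_dim)
print(max_per_dim, f)
```

The first step brings every dimension to a comparable range, and the second gives every vector unit length.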

2. Verification Metrics

2.1 Chi‑Square Distance

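DeepFace features share properties with histogram-based descriptors: all values are non-negative, sparse, and lie in [0, 1]. This motivates a weighted chi-square similarity, χ²(f1, f2) = Σ_i w_i (f1[i] − f2[i])² / (f1[i] + f2[i]), where f1 and f2 are the two feature vectors and the weights w_i are learned with a linear SVM.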
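A weighted chi-square distance between two non-negative feature vectors sums, per dimension, the squared difference divided by the sum. The sketch below is a plain-Python illustration under assumptions: uniform weights stand in for weights that would be learned (the DeepFace paper uses a linear SVM for this), and dimensions where both values are zero are skipped to avoid dividing by zero.

```python
def weighted_chi_square(f1, f2, weights=None, eps=1e-12):
    """Sum of w_i * (f1[i] - f2[i])**2 / (f1[i] + f2[i]) over all dimensions."""
    if len(f1) != len(f2):
        raise ValueError("feature vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(f1)   # placeholder for learned weights
    total = 0.0
    for a, b, w in zip(f1, f2, weights):
        denom = a + b
        if denom > eps:             # both zero: the term contributes nothing
            total += w * (a - b) ** 2 / denom
    return total


same = weighted_chi_square([0.5, 0.0, 0.5], [0.5, 0.0, 0.5])
different = weighted_chi_square([1.0, 0.0], [0.0, 1.0])
print(same, different)   # 0.0 2.0
```

A verification decision would then compare this distance with a threshold chosen on held-out pairs.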

2.2 Siamese Network

After training, the network is replicated into two branches with shared weights, one per input image. The absolute difference of the two feature vectors is fed to a fully-connected layer that maps to a single logistic unit, giving a binary same/different decision.
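The top of such a Siamese arrangement can be sketched as a logistic unit applied to element-wise absolute differences. The weights, bias, and feature values below are made up for illustration; in practice the top-layer parameters are learned from labeled same/different pairs.

```python
import math


def same_probability(f1, f2, weights, bias):
    """Logistic unit on |f1 - f2|: returns an estimate of P(same person)."""
    z = bias + sum(w * abs(a - b) for w, a, b in zip(weights, f1, f2))
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical parameters: negative weights, so that larger differences
# push the score towards "different".
weights = [-4.0, -4.0, -4.0]
bias = 2.0

anchor = [0.6, 0.0, 0.8]
close = [0.5, 0.1, 0.8]
far = [0.0, 0.9, 0.1]

p_close = same_probability(anchor, close, weights, bias)
p_far = same_probability(anchor, far, weights, bias)
print(p_close > 0.5, p_far > 0.5)   # True False
```

Because both images pass through the same feature extractor, only this small top layer needs to be trained for the verification task.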

3. Experimental Evaluation

3.1 Datasets

Social Face Classification (SFC): 4.4 M faces, 4030 identities

LFW: 13,233 faces, 5749 identities (restricted, unrestricted, unsupervised splits)

YouTube Faces (YTF): 3425 videos, 1595 identities

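Result on LFW : The DeepFace paper reports 97.35% verification accuracy for its ensemble model.

Result on YTF : The DeepFace paper reports 91.4% accuracy.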

DeepFace is distinguished by its explicit 3-D alignment step before the CNN. Because every face arrives in a frontal, normalized pose, facial features sit at stable locations, which is what makes the location-specific (locally connected) layers effective.

Related source code on GitHub (the dlib-based face_recognition library): https://github.com/ageitgey/face_recognition#face-recognition
