Efficient Training for Very Large‑Scale Face Recognition and the FFC Framework
This article reviews the challenges of ultra‑large‑scale face recognition, presents existing solutions such as metric learning, PFC and VFC, and details the proposed FFC framework with dual loaders, ID groups, probe and gallery networks, plus experimental results showing its cost‑effective performance.
Background: Image classification is one of the most successful AI applications and is widely used in tasks such as image search, OCR, content moderation, and identity verification. When the number of classes (IDs) grows to tens or hundreds of millions, training under typical deep‑learning frameworks becomes prohibitively expensive.
Challenges:
High cost: larger ID sets require more GPU memory, multi‑machine communication, and storage, dramatically increasing hardware and operational expenses.
Long‑tail distribution: most IDs have very few samples, making it difficult for standard training to converge and causing the tail classes to be under‑represented.
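To make the cost concrete, a quick back-of-envelope calculation shows why the final FC layer alone becomes prohibitive at this scale. The numbers below (512-d features, fp32) are illustrative assumptions, not figures from the article:

```python
def fc_weight_gigabytes(num_ids: int, feat_dim: int = 512, bytes_per_param: int = 4) -> float:
    """Memory (GB) for the final FC weight matrix alone,
    ignoring gradients and optimizer state (which multiply this further)."""
    return num_ids * feat_dim * bytes_per_param / 1024**3

# 100 million IDs with 512-d fp32 features:
print(round(fc_weight_gigabytes(100_000_000), 1))  # ~190.7 GB for weights alone
```

With momentum-based optimizers the gradient and state buffers roughly triple this figure, which is why the full FC layer cannot live on a single GPU.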
The remainder of the article introduces existing large‑scale classification solutions and the low‑cost FFC (Fast Face Classification) framework.
Existing methods:
Metric learning
PFC (Partial FC) framework
VFC (Virtual FC) framework
Proposed method – FFC framework:
In conventional fully‑connected (FC) training, the loss is computed over all class centers, which leads to massive memory consumption. FFC selects a subset of class centers (denoted V_j) for each batch, reducing the effective FC size.
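The idea of computing the loss over a sampled subset V_j rather than all class centers can be sketched as follows. This is a hypothetical illustration, not the article's exact sampling rule: the subset here is built from the batch's positive centers plus random negatives, and the function name and sizes are assumptions:

```python
import torch
import torch.nn.functional as F

def sampled_softmax_loss(features, labels, centers, num_sampled=4096):
    """Cross-entropy over a sampled subset V_j of class centers.

    V_j always contains the positive centers for the current batch,
    padded with randomly drawn negative centers, so the logits matrix
    is [B, |V_j|] instead of [B, num_classes].
    """
    num_classes = centers.size(0)
    pos = labels.unique()
    neg = torch.randint(0, num_classes, (num_sampled,))
    subset = torch.unique(torch.cat([pos, neg]))      # class ids in V_j
    # Remap each global label to its index within the subset.
    remap = {c.item(): i for i, c in enumerate(subset)}
    sub_labels = torch.tensor([remap[l.item()] for l in labels])
    logits = features @ centers[subset].t()           # [B, |V_j|]
    return F.cross_entropy(logits, sub_labels)
```

Memory for the logits now scales with |V_j| rather than with the total number of IDs, which is the core saving.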
The training pipeline includes two loaders: id_loader (sampling by ID) and instance_loader (sampling by instance). This ensures that both frequent and few‑shot classes are seen each epoch.
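The contrast between the two loaders can be sketched with plain Python sampling. This is a simplified sketch, not the article's actual data-pipeline code; the function name and batch construction are assumptions:

```python
import random
from collections import defaultdict

def make_dual_batches(samples, batch_size=4):
    """Draw one batch per sampling strategy from a list of (id, image) pairs.

    id_loader strategy: pick IDs uniformly, then one sample per ID, so
    few-shot (tail) identities appear as often as frequent ones.
    instance_loader strategy: pick samples uniformly, matching the
    natural, head-heavy distribution.
    """
    by_id = defaultdict(list)
    for sid, img in samples:
        by_id[sid].append(img)

    ids = list(by_id)
    id_batch = [(sid, random.choice(by_id[sid]))
                for sid in random.sample(ids, batch_size)]   # ID-balanced
    inst_batch = random.sample(samples, batch_size)          # instance-level
    return id_batch, inst_batch
```

Under a long-tail distribution, the instance-level batch is dominated by head identities, while the ID-balanced batch guarantees tail identities are visited, which is why FFC combines both.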
Before training, a fraction (e.g., 10%) of samples are placed into an ID group. During each iteration, samples pass through a probe net. If a sample's ID already exists in the group (existing ID), its feature is compared with the group feature using cross‑entropy loss; otherwise (fresh ID), the cosine similarity between its feature and the existing group centers is minimized, pushing the new identity away from the ones already stored.
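The existing-ID / fresh-ID branching described above can be sketched for a single probe feature. This is a minimal sketch under stated assumptions: the function name is hypothetical, and the fresh-ID branch here minimizes the largest cosine similarity to any stored center, which is one plausible reading of "the cosine similarity is minimized":

```python
import torch
import torch.nn.functional as F

def probe_loss(feature, id_label, group_centers, group_ids):
    """Loss for one probe feature against the ID group.

    feature:       [D] probe-net embedding
    group_centers: [G, D] stored class centers
    group_ids:     list of the G identity labels in the group
    """
    sims = F.cosine_similarity(feature.unsqueeze(0), group_centers)  # [G]
    if id_label in group_ids:                    # existing ID: classify
        target = group_ids.index(id_label)
        return F.cross_entropy(sims.unsqueeze(0), torch.tensor([target]))
    # fresh ID: push the feature away from every existing center
    return sims.max()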
After the forward pass, the group features are updated: centers for fresh IDs are inserted into the group, while centers for existing IDs are updated as a weighted combination of the old center and the new feature. The gallery net is updated with a moving‑average strategy, similar to MoCo, to stabilize training.
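The MoCo-style moving-average update of the gallery network amounts to a parameter-wise exponential moving average of the probe network. A minimal sketch, assuming a standard momentum value of 0.999 (the article does not state the exact value):

```python
import torch

@torch.no_grad()
def update_gallery(gallery_net, probe_net, momentum=0.999):
    """gallery = m * gallery + (1 - m) * probe, applied parameter-wise.

    Only the probe net receives gradients; the gallery net drifts slowly
    toward it, keeping the stored group features consistent over time.
    """
    for g, p in zip(gallery_net.parameters(), probe_net.parameters()):
        g.mul_(momentum).add_(p, alpha=1 - momentum)
```

A high momentum (close to 1) makes the gallery evolve smoothly, so the class centers it produces stay comparable across iterations.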
Tricks introduced:
ID Group size is a tunable hyper‑parameter (default = 3).
Moving‑average update of the gallery network for stable convergence.
Experimental results:
1. Ablation study on the dual loaders (figure omitted).
2. Comparison with state‑of‑the‑art methods (figure omitted).
3. Memory consumption and sample-throughput comparison (figure omitted).
Overall, the FFC framework achieves comparable or better accuracy with significantly lower GPU memory usage and higher throughput, addressing both cost and long‑tail challenges in ultra‑large‑scale face recognition.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.