JD AI's JDAI-Face: Real-Time Multi-Task Facial Attribute Recognition System

The article introduces JD AI's JDAI-Face system, a deep‑learning based real‑time multi‑task facial attribute recognition platform that detects gender, age, ethnicity, expression and attractiveness, outlines its technical pipeline, showcases retail applications, and cites recent academic publications and expert contributors.

JD Tech

Humans interpret visual information effortlessly, and giving computers a comparable capacity for brain‑like reasoning is a major research focus in artificial intelligence. Facial attributes such as gender, ethnicity, age, expression, and attractiveness carry rich information, and researchers aim to predict them automatically.

Existing facial attribute methods usually target a single task (e.g., age estimation or gender classification). Extending these single‑task models to handle multiple attributes increases algorithmic complexity and latency, making deployment difficult, so designing a multi‑task facial attribute algorithm that predicts several attributes in real time remains a challenge.

JD AI Research Institute has launched a new facial attribute recognition system called JDAI‑Face. It uses deep convolutional neural networks for classification or regression, adopts a multi‑task learning framework in which different attributes assist each other, and predicts facial attributes in real time.

The system has rich applications in the retail sector, such as real‑time user profiling on smart display terminals for precise ad or product recommendation, intelligent cameras that analyze entering customers for target‑group analysis, and creative marketing interactions that leave lasting impressions.

During JD’s 618 launch event, interactive demos showed that the audience’s gender ratio was roughly 6:4, with the majority of attendees aged 18‑30 and a high proportion of attractive participants; these attribute distributions were then used to match pre‑defined business content for accurate advertising and product suggestions.

Main Pipeline of the JD Facial Attribute Recognition System

The system first detects faces in an image, then for each detected face it predicts multiple attributes (gender, ethnicity, age, smile, attractiveness, etc.). The pipeline consists of four stages: face detection, landmark localization, face alignment, and attribute recognition.
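The four stages above can be read as a simple per-face loop. The following is a minimal sketch under stated assumptions, not JD's implementation: the `FaceResult` type, the `analyze_faces` function, and the `fake_*` placeholder stages are all hypothetical names introduced only to show how the stages chain together.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Tuple

Box = Tuple[int, int, int, int]   # (x, y, width, height)
Point = Tuple[float, float]       # (x, y)


@dataclass
class FaceResult:
    box: Box
    landmarks: List[Point]
    attributes: Dict[str, Any]


def analyze_faces(
    image: Any,
    detect: Callable[[Any], List[Box]],
    locate: Callable[[Any, Box], List[Point]],
    align: Callable[[Any, List[Point]], Any],
    recognize: Callable[[Any], Dict[str, Any]],
) -> List[FaceResult]:
    """Run the four stages in order for every detected face."""
    results = []
    for box in detect(image):               # 1. face detection
        landmarks = locate(image, box)      # 2. landmark localization
        aligned = align(image, landmarks)   # 3. face alignment
        attributes = recognize(aligned)     # 4. attribute recognition
        results.append(FaceResult(box, landmarks, attributes))
    return results


# Placeholder stages (hypothetical) so the sketch runs end to end;
# real detector, landmark, alignment, and attribute models would replace them.
def fake_detect(image):
    return [(0, 0, 10, 10), (20, 20, 10, 10)]


def fake_locate(image, box):
    x, y, w, h = box
    return [(x + 0.3 * w, y + 0.4 * h), (x + 0.7 * w, y + 0.4 * h)]


def fake_align(image, landmarks):
    return image


def fake_recognize(aligned):
    return {"gender": "unknown", "age": 0}


faces = analyze_faces("image-placeholder", fake_detect, fake_locate,
                      fake_align, fake_recognize)
```

Passing the stages in as callables keeps each model replaceable, which matches the article's description of the pipeline as four separate steps.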

Face Detection. JD AI’s platform uses a deep‑learning face detector with multiple network branches and multi‑scale processing, integrating information from various layers to achieve high accuracy and fast real‑time detection. An anchor density balancing strategy is employed to improve detection of small faces.
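To illustrate what anchor density balancing can mean, here is a minimal sketch modeled on the anchor densification idea in FaceBoxes [1]. The anchor scales and tiling intervals below are taken from that paper; the article does not say whether JDAI‑Face uses the same settings, and the function names are hypothetical.

```python
# (anchor scale in pixels, tiling interval in pixels), as described for FaceBoxes [1].
# Whether JDAI-Face uses these exact values is an assumption.
ANCHORS = [(32, 32), (64, 32), (128, 32), (256, 64), (512, 128)]


def tiling_density(scale, interval):
    """Density = anchor scale / tiling interval; small anchors end up sparse."""
    return scale / interval


def densify_factor(scale, interval, target_density):
    """How many times per axis an anchor must be repeated to reach the target."""
    return max(1, round(target_density / tiling_density(scale, interval)))


def densified_centers(cx, cy, interval, n):
    """Tile n*n anchor centers uniformly inside one interval-sized cell."""
    step = interval / n
    start = -interval / 2 + step / 2
    return [(cx + start + i * step, cy + start + j * step)
            for j in range(n) for i in range(n)]


target = max(tiling_density(s, i) for s, i in ANCHORS)
factors = {s: densify_factor(s, i, target) for s, i in ANCHORS}
```

With these values the 32‑pixel anchors are repeated 4 times per axis and the 64‑pixel anchors 2 times, so every scale reaches the same tiling density and small faces are covered by as many anchors as large ones.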

Landmark Localization. After detecting a face, the algorithm uses a regression network to locate between 25 and 106 facial landmarks (nose, eyes, mouth, contour, etc.). For large‑pose faces, a 3D fitting method and a specially constructed large‑pose training set improve robustness, and a semi‑automatic annotation tool is used for labeling.

Face Alignment. Using the predicted landmarks and a template, a 2D affine transformation aligns the face so that the eyes become horizontal, preparing the image for attribute recognition.
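As a rough illustration of the alignment step, the sketch below builds a 2D similarity transform (rotation, uniform scale, and translation, a special case of an affine transform) that maps the two eye landmarks onto template eye positions on a horizontal line. The template coordinates and function names are hypothetical, and a production system would typically fit the transform to more than two landmarks.

```python
import math


def eye_alignment_matrix(left_eye, right_eye, target_left, target_right):
    """Return a 2x3 matrix mapping the detected eyes onto the template eyes."""
    dx, dy = right_eye[0] - left_eye[0], right_eye[1] - left_eye[1]
    tx, ty = target_right[0] - target_left[0], target_right[1] - target_left[1]
    scale = math.hypot(tx, ty) / math.hypot(dx, dy)
    angle = math.atan2(ty, tx) - math.atan2(dy, dx)
    a = scale * math.cos(angle)
    b = scale * math.sin(angle)
    # Translation chosen so that left_eye lands exactly on target_left.
    off_x = target_left[0] - (a * left_eye[0] - b * left_eye[1])
    off_y = target_left[1] - (b * left_eye[0] + a * left_eye[1])
    return ((a, -b, off_x), (b, a, off_y))


def apply_transform(matrix, point):
    """Apply the 2x3 matrix to a single (x, y) point."""
    x, y = point
    return (matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
            matrix[1][0] * x + matrix[1][1] * y + matrix[1][2])


# Hypothetical template: eyes on one horizontal line of a 112x112 crop.
TEMPLATE_LEFT, TEMPLATE_RIGHT = (38.0, 52.0), (74.0, 52.0)

detected_left, detected_right = (30.0, 40.0), (70.0, 60.0)  # tilted face
M = eye_alignment_matrix(detected_left, detected_right,
                         TEMPLATE_LEFT, TEMPLATE_RIGHT)
aligned_left = apply_transform(M, detected_left)
aligned_right = apply_transform(M, detected_right)
```

After the transform both eyes share the same y coordinate, which is the "eyes become horizontal" condition; the same matrix would then be used to warp the whole image.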

Attribute Recognition. The system predicts attributes such as age, gender, expression, ethnicity, facial hair, and accessories using deep CNNs for classification or regression within a multi‑task learning framework, allowing different attributes to mutually benefit each other.

The five main attributes displayed are gender, ethnicity, age, smile intensity, and attractiveness. Gender and ethnicity are trained with a Softmax loss, while smile and attractiveness are regressed to a 0‑100 scale. Age estimation uses a Group‑n encoding strategy that converts a one‑hot age label into multiple overlapping groups, turning the problem into several binary classifications.
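One plausible way to train these objectives together is a weighted sum of per-task losses. The sketch below is an assumption, not the documented JDAI‑Face loss: the article specifies softmax loss for gender and ethnicity, but the squared-error regression loss, the binary cross-entropy for age groups, the equal default weights, and all function names are illustrative choices.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax loss for one sample: log-sum-exp of the logits minus the true logit."""
    m = max(logits)
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def squared_error(pred, target, scale=100.0):
    """Regression loss on the 0-100 scale, normalized to 0-1 (assumed choice)."""
    return ((pred - target) / scale) ** 2


def binary_cross_entropy(probs, targets, eps=1e-7):
    """Mean binary cross-entropy over the age-group classifiers."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(probs)


def multitask_loss(pred, target, weights=None):
    """Weighted sum of the five task losses; weights default to 1.0 (assumption)."""
    weights = weights or {}
    terms = {
        "gender": softmax_cross_entropy(pred["gender"], target["gender"]),
        "ethnicity": softmax_cross_entropy(pred["ethnicity"], target["ethnicity"]),
        "smile": squared_error(pred["smile"], target["smile"]),
        "attractiveness": squared_error(pred["attractiveness"],
                                        target["attractiveness"]),
        "age_groups": binary_cross_entropy(pred["age_groups"],
                                           target["age_groups"]),
    }
    total = sum(weights.get(name, 1.0) * value for name, value in terms.items())
    return total, terms


pred = {"gender": [2.0, 0.0], "ethnicity": [0.0, 0.0, 0.0], "smile": 60.0,
        "attractiveness": 80.0, "age_groups": [0.5, 0.5]}
target = {"gender": 0, "ethnicity": 1, "smile": 50.0,
          "attractiveness": 80.0, "age_groups": [1, 0]}
total, terms = multitask_loss(pred, target)
```

Because all heads would share one backbone in a multi-task network, a single summed loss lets gradients from every attribute update the shared features, which is how the attributes can help each other.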

During testing, the predicted groups are decoded to obtain the final age estimate. The overall training and testing pipelines are illustrated below.

Red boxes indicate the training stage where Group‑n encoding transforms one‑hot ages into multi‑label groups for multi‑task binary classification; blue boxes show the testing stage where predicted groups are decoded to recover the age.
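The encoding and decoding steps can be sketched as follows. This is a simplified illustration of the overlapping-group idea, not necessarily the exact scheme in [3]: it assumes integer ages from 0 to `max_age`, groups of `n` consecutive ages shifted by one year, and a decoder that sums the scores of the groups containing each age. The function names are hypothetical.

```python
def group_n_encode(age, n, max_age):
    """Turn an integer age into a multi-label vector over overlapping groups.

    Group g covers ages g-n+1 .. g, so each age belongs to exactly n groups
    (indices age .. age+n-1) and there are max_age + n groups in total.
    """
    if not 0 <= age <= max_age:
        raise ValueError("age out of range")
    return [1 if age <= g < age + n else 0 for g in range(max_age + n)]


def group_n_decode(group_scores, n, max_age):
    """Recover an age by summing the scores of the n groups that contain it."""
    age_scores = [sum(group_scores[a:a + n]) for a in range(max_age + 1)]
    return max(range(max_age + 1), key=age_scores.__getitem__)


# Training side: the label for age 2 with n=3 and ages 0..5.
label = group_n_encode(2, n=3, max_age=5)   # [0, 0, 1, 1, 1, 0, 0, 0]

# Testing side: noisy per-group probabilities from the binary classifiers.
noisy = [0.1, 0.2, 0.9, 0.8, 0.7, 0.3, 0.1, 0.0]
estimate = group_n_decode(noisy, n=3, max_age=5)   # 2
```

Each binary classifier only has to decide whether the face falls into one small age range, and because neighboring ages share most of their groups, an error in one classifier shifts the decoded age only slightly.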

With rapid advances in computer vision and AI worldwide, Chinese enterprises and academia are playing increasingly prominent roles at top conferences. JD continues to demonstrate its commitment to technology‑driven business by showcasing these achievements.

At the recent IEEE CVPR conference, four papers from JD AI’s platform and research team were accepted, representing cutting‑edge progress in multiple computer‑vision topics and setting the direction for future developments.

These frontier research results are being integrated into JD’s business scenarios, enabling deep technology‑business fusion and delivering higher‑quality experiences to users and partners.

“JD hopes that technology is not a cold machine but a warm creation that lets users enjoy a better life.”

References

[1] Zhang, S., Zhu, X., Lei, Z., Shi, H., Wang, X., & Li, S. Z. (2017). FaceBoxes: a CPU real‑time face detector with high accuracy. arXiv preprint arXiv:1708.05234.

[2] Zhu, X., Lei, Z., Liu, X., Shi, H., & Li, S. Z. (2016). Face alignment across large poses: A 3D solution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 146‑155).

[3] Tan, Z., Wan, J., Lei, Z., Zhi, R., Guo, G., & Li, S. Z. (2017). Efficient Group‑n Encoding and Decoding for Facial Age Estimation. IEEE Transactions on Pattern Analysis and Machine Intelligence.

Expert Introduction

Shi Hailin, senior researcher at the Computer Vision & Multimedia Lab of JD’s AI Platform & Research Department, received a PhD from the Institute of Automation, Chinese Academy of Sciences. Shi’s research focuses on pattern recognition and computer vision, especially deep‑learning‑based face recognition and pedestrian re‑identification, and has produced multiple top‑conference papers and competition awards.

Wan Jun, assistant researcher at the Institute of Automation, Chinese Academy of Sciences, received a PhD in Signal and Information Processing from Beijing Jiaotong University. Wan’s research covers pattern recognition, computer vision, and machine learning, with current work on facial attribute analysis and on gesture and behavior recognition, and includes 37 papers in international journals and conferences.
