Kuaishou Y‑Tech’s Real‑Time, High‑Precision Facial & Body Keypoint Detection Explained
Y‑Tech’s in‑house keypoint detection system powers Kuaishou’s beauty and effect filters across live streaming, video creation, and editing. It combines lightweight deep‑learning models, extensive multi‑scenario data collection, and specialized occlusion handling to track facial and body landmarks robustly and in real time on a wide range of mobile devices.
Overview
Keypoint detection locates specific human body points (eyes, mouth, shoulders, etc.) and underpins advanced facial and body analysis tasks such as recognition, 3D reconstruction, expression detection, beautification, and body shaping. Accuracy and stability of keypoint detection directly affect downstream task performance.
Business Applications
At Kuaishou, Y‑Tech’s robust, precise, and efficient keypoint detection supports various effect filters (beauty, makeup, shaping, body, and novelty accessories) across live streaming, video capture, and editing, powering products like Kuaishou, Yitiao, Kuaishou Film, and Live Companion.
Effect filters provide one‑click makeup, reduce preparation effort, and increase user engagement, thereby enriching platform content.
The system also outputs occlusion information; a lightweight branch predicts whether keypoints are occluded, allowing the pipeline to suppress makeup on occluded regions without the heavy cost of full segmentation.
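As a rough illustration of how per‑keypoint occlusion output could gate makeup rendering, the sketch below turns occlusion probabilities into a per‑region opacity. The function name, the region‑to‑keypoint mapping, and the 0.5 threshold are all hypothetical; the article does not describe Y‑Tech's actual blending logic.

```python
from typing import Dict, List, Sequence


def region_opacity(
    occlusion_prob: Sequence[float],
    regions: Dict[str, List[int]],
    threshold: float = 0.5,  # assumed cut-off, not from the article
) -> Dict[str, float]:
    """Return a makeup opacity in [0, 1] for each facial region.

    A region's opacity is the fraction of its keypoints whose predicted
    occlusion probability is below the threshold, so a fully covered
    region gets no makeup at all.
    """
    opacity = {}
    for name, indices in regions.items():
        visible = sum(1 for i in indices if occlusion_prob[i] < threshold)
        opacity[name] = visible / len(indices)
    return opacity


# Hypothetical output of the occlusion branch for six keypoints.
occ = [0.1, 0.2, 0.9, 0.8, 0.05, 0.1]
# Hypothetical mapping from makeup regions to keypoint indices.
regions = {"lips": [0, 1], "left_cheek": [2, 3], "brow": [4, 5]}

result = region_opacity(occ, regions)
print(result)
```

Here a hand covering the left cheek would switch off blush in that region while lipstick and brow effects stay fully visible, without running a segmentation model.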
Y‑Tech extended keypoint models to pets, creating cat‑face and dog‑face detectors that enable playful pet‑related effects.
Data and Model
A deep‑learning model computes keypoints for every frame. The model must be extremely lightweight so that it runs on a wide range of mobile devices with minimal performance impact.
Data collection covers diverse demographics, ages, expressions, angles, lighting, and occlusions, combining web‑sourced images with fine‑grained labeling and supplemental in‑house captures.
Annotation separates facial and body keypoints; facial annotation employs Bézier curves for dense point generation, while body keypoints incorporate public datasets such as COCO, AI Challenger, MPII, and SURREAL.
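To make the Bézier‑based annotation idea concrete, the sketch below samples dense points along a single cubic Bézier segment defined by four control points an annotator might place along a contour. The choice of a cubic curve, the function names, and the uniform parameter spacing are assumptions for illustration; the article does not specify the curve degree or sampling scheme.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate a cubic Bezier curve at parameter t in [0, 1]."""
    u = 1.0 - t
    # Bernstein basis weights for the four control points.
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    x = b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0]
    y = b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1]
    return (x, y)


def sample_curve(p0: Point, p1: Point, p2: Point, p3: Point, n: int) -> List[Point]:
    """Return n points along the curve, including both endpoints."""
    if n < 2:
        raise ValueError("n must be at least 2")
    return [cubic_bezier(p0, p1, p2, p3, i / (n - 1)) for i in range(n)]


# Hypothetical control points for an upper-lip contour, in pixels.
dense = sample_curve((0.0, 0.0), (1.0, 2.0), (3.0, 2.0), (4.0, 0.0), n=5)
print(dense)
```

Uniform steps in the curve parameter do not give evenly spaced points along the contour; a production annotation tool would likely resample by arc length.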
The network design draws on ResNet, MobileNet, and DenseNet. It comprises separate networks for face detection and single‑person facial keypoints, plus single‑person and multi‑person body keypoint networks. Face detection runs asynchronously to reduce overhead.
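One way to picture asynchronous face detection is a keypoint loop that never waits for the detector once a face box is known: detection runs on a background worker, and each frame uses the most recent box available. The class, its callbacks, and the synchronous bootstrap on the first frame are hypothetical design choices for this sketch, not Y‑Tech's actual pipeline.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional, Tuple

Box = Tuple[int, int, int, int]  # x, y, width, height
Keypoints = List[Tuple[float, float]]


class AsyncFaceTracker:
    """Runs keypoint prediction per frame while detection runs in the background."""

    def __init__(
        self,
        detect: Callable[[object], Optional[Box]],
        predict_keypoints: Callable[[object, Box], Keypoints],
    ) -> None:
        self._detect = detect
        self._predict = predict_keypoints
        self._pool = ThreadPoolExecutor(max_workers=1)
        self._pending: Optional[Future] = None
        self._box: Optional[Box] = None

    def process(self, frame: object) -> Optional[Keypoints]:
        if self._box is None and self._pending is None:
            # Nothing to track yet: detect synchronously to bootstrap.
            self._box = self._detect(frame)
        else:
            # Pick up a finished background detection, if any.
            if self._pending is not None and self._pending.done():
                self._box = self._pending.result()
                self._pending = None
            # Keep at most one detection in flight.
            if self._pending is None:
                self._pending = self._pool.submit(self._detect, frame)
        if self._box is None:
            return None  # no face found
        return self._predict(frame, self._box)

    def close(self) -> None:
        self._pool.shutdown(wait=True)


# Stub models standing in for the real networks.
def fake_detect(frame: object) -> Optional[Box]:
    return (0, 0, 100, 100)


def fake_keypoints(frame: object, box: Box) -> Keypoints:
    x, y, w, h = box
    return [(x + w / 2, y + h / 2)]  # a single point at the box centre


tracker = AsyncFaceTracker(fake_detect, fake_keypoints)
results = [tracker.process(frame) for frame in range(3)]
tracker.close()
print(results)
```

The trade‑off is that the box may lag the current frame by a few frames, which is usually acceptable because faces move little between consecutive frames.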
Targeted optimizations address failure cases such as false (non‑face) detections, large pose angles, occlusions, and varied user postures, improving stability and robustness.
Future Outlook
Future work aims to further shrink runtime and resource usage while delivering denser, more stable keypoints to enhance the expressiveness and variety of effect filters.
Kuaishou Large Model
Official Kuaishou Account