Kuaishou Y‑Tech’s Real‑Time, High‑Precision Facial & Body Keypoint Detection Explained

Y‑Tech’s in‑house keypoint detection system powers the beauty and effect filters used across Kuaishou’s live streaming, video creation, and editing tools. It combines lightweight deep‑learning models, extensive multi‑scenario data collection, and dedicated occlusion handling to deliver real‑time, robust facial and body landmark tracking on a wide range of mobile devices.

Kuaishou Large Model

Overview

Keypoint detection locates predefined points on the human face and body (eyes, mouth, shoulders, and so on) and underpins higher‑level facial and body analysis tasks such as recognition, 3D reconstruction, expression detection, beautification, and body shaping. The accuracy and stability of keypoint detection directly affect how well these downstream tasks perform.
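Keypoint models commonly either regress coordinates directly or output one heatmap per keypoint; the article does not say which approach Y‑Tech uses. As an illustration of the heatmap variant only, the minimal sketch below decodes one keypoint from its heatmap by taking the most confident cell and nudging it a quarter cell toward the stronger neighbour. The function name, the `stride` parameter (downsampling factor between image and heatmap), and the quarter‑cell refinement are assumptions of this sketch.

```python
from typing import List, Tuple

Heatmap = List[List[float]]  # rows x cols grid of confidence scores


def _sign(v: float) -> int:
    return (v > 0) - (v < 0)


def decode_heatmap(heatmap: Heatmap, stride: int) -> Tuple[float, float, float]:
    """Return (x, y, score) in input-image pixels for one keypoint's heatmap.

    Assumes heatmap cell (col, row) maps to pixel (col * stride, row * stride).
    """
    rows, cols = len(heatmap), len(heatmap[0])
    # 1. Arg-max: the most confident cell is the coarse keypoint location.
    best_r, best_c = max(
        ((r, c) for r in range(rows) for c in range(cols)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    x, y = float(best_c), float(best_r)
    # 2. Sub-cell refinement: shift a quarter cell toward the stronger neighbour.
    if 0 < best_c < cols - 1:
        x += 0.25 * _sign(heatmap[best_r][best_c + 1] - heatmap[best_r][best_c - 1])
    if 0 < best_r < rows - 1:
        y += 0.25 * _sign(heatmap[best_r + 1][best_c] - heatmap[best_r - 1][best_c])
    return x * stride, y * stride, heatmap[best_r][best_c]
```

A model with K keypoints would produce K such heatmaps per frame and call the decoder once per channel; the returned score can double as a per‑keypoint confidence.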

Business Applications

At Kuaishou, Y‑Tech’s robust, precise, and efficient keypoint detection supports various effect filters (beauty, makeup, shaping, body, and novelty accessories) across live streaming, video capture, and editing, powering products like Kuaishou, Yitiao, Kuaishou Film, and Live Companion.

Effect filters provide one‑click makeup, reduce preparation effort, and increase user engagement, thereby enriching platform content.

The system also outputs occlusion information; a lightweight branch predicts whether keypoints are occluded, allowing the pipeline to suppress makeup on occluded regions without the heavy cost of full segmentation.
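The article does not describe how the occlusion output feeds the makeup renderer. A minimal sketch, assuming the branch emits one occlusion probability per keypoint and each makeup region is defined as a list of keypoint indices (both hypothetical), could scale or switch off each region by the fraction of its keypoints predicted visible. The function name, thresholds, and region names below are illustrative, not Y‑Tech values.

```python
from typing import Dict, List


def makeup_opacity(
    occlusion_prob: List[float],
    regions: Dict[str, List[int]],
    occluded_above: float = 0.5,
    min_visible: float = 0.6,
) -> Dict[str, float]:
    """Map each makeup region to an opacity in [0, 1].

    occlusion_prob[i] is the (hypothetical) branch output for keypoint i.
    A region is drawn with opacity equal to its visible fraction, or
    suppressed entirely when too few of its keypoints are visible.
    """
    opacity = {}
    for name, indices in regions.items():
        visible = sum(1 for i in indices if occlusion_prob[i] <= occluded_above)
        fraction = visible / len(indices)
        opacity[name] = fraction if fraction >= min_visible else 0.0
    return opacity
```

For example, a hand held over the mouth would mark the lip keypoints as occluded, so lipstick is faded or dropped instead of being painted onto the fingers, and no per‑pixel segmentation is needed.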

Y‑Tech extended keypoint models to pets, creating cat‑face and dog‑face detectors that enable playful pet‑related effects.

Data and Model

A trained deep‑learning model computes the keypoints for every frame. Because it runs per frame, the model must be extremely lightweight so that it works on a wide range of mobile devices with minimal performance impact.
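The article gives no layer‑level details, but MobileNet, one of the architectures it cites as an influence, is built around depthwise separable convolutions, a standard way to cut the cost of a convolutional layer. The arithmetic below compares weight counts for a standard k×k convolution and its depthwise separable counterpart; the helper names and the 128‑channel example are arbitrary choices for illustration.

```python
def standard_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k convolution (bias ignored)."""
    return k * k * c_in * c_out


def separable_conv_params(k: int, c_in: int, c_out: int) -> int:
    """Weights in a k x k depthwise conv followed by a 1 x 1 pointwise conv."""
    depthwise = k * k * c_in  # one k x k filter per input channel
    pointwise = c_in * c_out  # 1 x 1 conv mixes information across channels
    return depthwise + pointwise


def macs(params_per_position: int, h_out: int, w_out: int) -> int:
    """Multiply-accumulates when every weight is applied at each output position."""
    return params_per_position * h_out * w_out


if __name__ == "__main__":
    k, c = 3, 128
    std = standard_conv_params(k, c, c)
    sep = separable_conv_params(k, c, c)
    print(std, sep, round(sep / std, 3))  # 147456 17536 0.119
```

The separable layer needs roughly 1/C_out + 1/k² of the weights and multiply‑accumulates of the standard one, about 12% for a 3×3, 128‑channel layer, which is the kind of saving that makes per‑frame inference feasible on low‑end phones.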

Data collection covers diverse demographics, ages, expressions, angles, lighting, and occlusions, combining web‑sourced images with fine‑grained labeling and supplemental in‑house captures.

Facial and body keypoints are annotated separately. Facial annotation uses Bézier curves to generate dense points, while body keypoint data is supplemented with public datasets such as COCO, AI Challenger, MPII, and SURREAL.
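The article states only that facial annotation uses Bézier curves to generate dense points. One plausible reading, assumed here, is that an annotator fits a cubic Bézier segment along a contour such as an eyelid, and the tool then samples a fixed number of points spaced evenly by arc length, so that point i refers to the same relative position on every face. `cubic_bezier`, `dense_points`, and their parameters are hypothetical; a full contour would chain several segments.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def cubic_bezier(p0: Point, p1: Point, p2: Point, p3: Point, t: float) -> Point:
    """Evaluate B(t) = (1-t)^3 p0 + 3(1-t)^2 t p1 + 3(1-t) t^2 p2 + t^3 p3."""
    u = 1.0 - t
    b0, b1, b2, b3 = u ** 3, 3 * u * u * t, 3 * u * t * t, t ** 3
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])


def dense_points(ctrl: List[Point], n: int, samples: int = 256) -> List[Point]:
    """Return n points spaced evenly by arc length along one cubic segment."""
    assert len(ctrl) == 4 and n >= 2
    # Approximate the curve by a fine polyline and its cumulative length.
    poly = [cubic_bezier(*ctrl, i / samples) for i in range(samples + 1)]
    cum = [0.0]
    for a, b in zip(poly, poly[1:]):
        cum.append(cum[-1] + math.dist(a, b))
    total = cum[-1]
    out: List[Point] = []
    j = 0
    for k in range(n):
        target = total * k / (n - 1)
        # Advance to the polyline segment containing the target length.
        while j < samples - 1 and cum[j + 1] < target:
            j += 1
        seg = cum[j + 1] - cum[j]
        f = (target - cum[j]) / seg if seg > 0 else 0.0
        (ax, ay), (bx, by) = poly[j], poly[j + 1]
        out.append((ax + f * (bx - ax), ay + f * (by - ay)))
    return out
```

Sampling by arc length rather than at uniform t matters because uniform t bunches points where control points cluster, which would give the same landmark index different meanings on different faces.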

The network architectures draw on ResNet, MobileNet, and DenseNet. For faces, there are separate face detection and single‑person facial keypoint branches; for bodies, there are separate single‑person and multi‑person keypoint networks. Face detection runs asynchronously to reduce overhead.
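The article says face detection runs asynchronously to reduce overhead but gives no details. A common way to get that saving, assumed here, is to stop running the detector on every frame: the keypoint network derives the next frame's face crop from its own output, and the detector runs only when tracking is lost or after a fixed number of frames. The sketch shows that scheduling synchronously for clarity; `detect_face`, `predict_keypoints`, `redetect_every`, and the 20% crop margin are hypothetical, and a production pipeline could additionally move the detector call onto a background thread so it never blocks rendering.

```python
from typing import Callable, List, Optional, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
Points = List[Tuple[float, float]]


def box_from_points(pts: Points, margin: float = 0.2) -> Box:
    """Bounding box of the keypoints, padded by `margin` of its width/height."""
    xs = [p[0] for p in pts]
    ys = [p[1] for p in pts]
    w, h = max(xs) - min(xs), max(ys) - min(ys)
    return (min(xs) - margin * w, min(ys) - margin * h,
            max(xs) + margin * w, max(ys) + margin * h)


class FaceTracker:
    """Run the face detector only when needed; track the crop via keypoints otherwise."""

    def __init__(
        self,
        detect_face: Callable[[object], Optional[Box]],
        predict_keypoints: Callable[[object, Box], Optional[Points]],
        redetect_every: int = 30,
    ) -> None:
        self.detect_face = detect_face              # hypothetical detector
        self.predict_keypoints = predict_keypoints  # hypothetical keypoint model
        self.redetect_every = redetect_every
        self.roi: Optional[Box] = None
        self.frames_since_detect = 0
        self.detector_calls = 0

    def process(self, frame: object) -> Optional[Points]:
        # Detect when no face is tracked, or periodically to correct drift.
        if self.roi is None or self.frames_since_detect >= self.redetect_every:
            self.detector_calls += 1
            self.frames_since_detect = 0
            box = self.detect_face(frame)
            if box is not None:
                self.roi = box
        if self.roi is None:
            return None                  # no face yet; try again next frame
        self.frames_since_detect += 1
        pts = self.predict_keypoints(frame, self.roi)
        if pts is None:
            self.roi = None              # tracking lost; detect on next frame
            return None
        self.roi = box_from_points(pts)  # crop for the next frame
        return pts
```

With `redetect_every=30` at 30 fps, the detector runs about once per second while keypoints are still produced on every frame.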

Further optimizations target false detections on non‑face regions, large pose angles, occlusion, and varied user postures, greatly improving stability and robustness.
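The article does not say which techniques produce the stability gains. A widely used, lightweight option for suppressing frame‑to‑frame landmark jitter is the One Euro filter (Casiez et al., CHI 2012), named here as a swapped‑in example rather than Y‑Tech's method. It smooths heavily when a point moves slowly, where jitter is most visible, and lightly when it moves fast, where lag is most visible. The default parameter values below are placeholders.

```python
import math
from typing import Optional


class OneEuroFilter:
    """One Euro filter for a single scalar signal.

    Low speed  -> low cutoff  -> strong smoothing (less jitter).
    High speed -> high cutoff -> light smoothing (less lag).
    """

    def __init__(self, min_cutoff: float = 1.0, beta: float = 0.0,
                 d_cutoff: float = 1.0) -> None:
        self.min_cutoff = min_cutoff  # Hz; baseline smoothing
        self.beta = beta              # how fast the cutoff grows with speed
        self.d_cutoff = d_cutoff      # Hz; cutoff for the speed estimate
        self.x_prev: Optional[float] = None
        self.dx_prev = 0.0

    @staticmethod
    def _alpha(cutoff: float, dt: float) -> float:
        # Smoothing factor of a first-order low-pass filter.
        tau = 1.0 / (2.0 * math.pi * cutoff)
        return 1.0 / (1.0 + tau / dt)

    def __call__(self, x: float, dt: float) -> float:
        if self.x_prev is None:  # first sample passes through unchanged
            self.x_prev = x
            return x
        # Estimate and smooth the signal's speed.
        dx = (x - self.x_prev) / dt
        a_d = self._alpha(self.d_cutoff, dt)
        dx_hat = a_d * dx + (1.0 - a_d) * self.dx_prev
        # Adapt the cutoff to the speed, then low-pass the signal.
        cutoff = self.min_cutoff + self.beta * abs(dx_hat)
        a = self._alpha(cutoff, dt)
        x_hat = a * x + (1.0 - a) * self.x_prev
        self.x_prev, self.dx_prev = x_hat, dx_hat
        return x_hat
```

For 2D keypoints, one filter instance would be kept per coordinate of each keypoint and fed the frame interval as `dt`; `min_cutoff` and `beta` trade residual jitter against lag and are usually tuned empirically.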

Future Outlook

Future work aims to further shrink runtime and resource usage while delivering denser, more stable keypoints to enhance the expressiveness and variety of effect filters.

Written by Kuaishou Large Model (Official Kuaishou Account)

0 followers
Reader feedback

How this landed with the community

login Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.