
Turning Sketches into Live AR Characters: Kuaishou’s All‑Things‑AR Technical Journey

This article details how Kuaishou transformed a user‑drawn sketch concept into the All‑Things‑AR feature, covering background inspiration, the end‑to‑end pipeline, data collection, mobile‑friendly segmentation model design, model optimizations, engineering integration, SLAM‑based camera localization, and final production results.

Kuaishou Tech

Background

In August 2020, a Japanese app called “RakugakiAR” let users turn 2D doodles into movable 3D AR characters and quickly became a viral hit. Because it requires users to draw a doodle first, the barrier to interaction is high. Kuaishou therefore explored a more accessible approach in which users simply point the camera at any real‑world object to generate an animated avatar, and named the concept “All‑Things‑AR”.

Overall Process

The All‑Things‑AR workflow (see Fig. 6) consists of generic object detection, object‑level segmentation, texture extraction, SDK‑based AR rendering, and final compositing. Accurate segmentation and animation rendering are the most critical steps.
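As a rough illustration of the texture‑extraction step, the sketch below crops an image to the bounding box of its segmentation mask and writes the mask into an alpha channel. It is a minimal pure‑Python sketch, not Kuaishou's implementation: the function name `extract_texture` and the nested‑list image layout are assumptions made for this example.

```python
def extract_texture(image, mask):
    """Crop `image` to the mask's bounding box and attach an alpha channel.

    image: H x W rows of (r, g, b) tuples (hypothetical layout).
    mask:  H x W rows of 0/1 values, 1 meaning "object pixel".
    Returns (rgba_rows, (x0, y0, x1, y1)), or ([], None) for an empty mask.
    """
    # Rows and columns that contain at least one object pixel.
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:
        return [], None

    x0, x1, y0, y1 = min(xs), max(xs), ys[0], ys[-1]

    # Object pixels become opaque (alpha 255), everything else transparent.
    rgba = [
        [image[y][x] + ((255,) if mask[y][x] else (0,))
         for x in range(x0, x1 + 1)]
        for y in range(y0, y1 + 1)
    ]
    return rgba, (x0, y0, x1, y1)
```

The resulting RGBA crop is the kind of texture a rendering engine could map onto an avatar body; a real pipeline would do this on the GPU rather than with Python lists.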

Segmentation Model

To support arbitrary objects, a universal segmentation model was required. The training data had to cover a wide variety of objects, and the model had to run in real time on low‑end mobile devices. The data pipeline involved:

Collecting open‑source and internal image datasets without segmentation labels.

Using a high‑performance server‑side model to generate pseudo‑labels for new object categories.

Iteratively refining the model by analyzing failure cases, manually annotating a small subset, and re‑training.

Data quality issues such as holes, coarse boundaries, and background confusion were mitigated through boundary smoothing, hole filling, and data augmentation (random occlusion, lighting changes).
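One of the mitigations above, hole filling, can be sketched as a flood fill: background pixels that cannot be reached from the image border are enclosed by the object and are treated as holes. This is a minimal illustration on a binary mask stored as nested lists; the source does not say which hole‑filling method was used, and the function name `fill_holes` is an assumption.

```python
from collections import deque


def fill_holes(mask):
    """Fill enclosed background regions in a non-empty binary mask (rows of 0/1)."""
    h, w = len(mask), len(mask[0])
    reached = [[False] * w for _ in range(h)]
    queue = deque()

    # Seed the flood fill with every background pixel on the border.
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and mask[y][x] == 0:
                reached[y][x] = True
                queue.append((y, x))

    # 4-connected breadth-first search through the background.
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < h and 0 <= nx < w:
                if not reached[ny][nx] and mask[ny][nx] == 0:
                    reached[ny][nx] = True
                    queue.append((ny, nx))

    # Foreground stays foreground; unreached background is a hole.
    return [
        [1 if mask[y][x] or not reached[y][x] else 0 for x in range(w)]
        for y in range(h)
    ]
```

Applied to pseudo‑labels before training, a cleanup pass like this keeps the model from learning the holes as if they were part of the ground truth.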

Model Optimization

Key optimizations for mobile deployment included:

Splitting the task into two specialized models—one for faces and one for generic objects—to reduce complexity.

Employing multi‑task learning with auxiliary branches for boundary detection and classification, and a cascade architecture to filter out background noise.

Applying OHEM loss for hard‑example mining and contrastive learning to improve mask stability.
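To make the hard‑example mining idea concrete, the sketch below shows one common top‑k formulation of OHEM for binary segmentation: compute a per‑pixel cross‑entropy loss, keep only the hardest fraction of pixels, and average over those. The source does not state which OHEM variant or hyperparameters were used, so `keep_ratio` and `min_kept` are illustrative assumptions.

```python
import math


def ohem_bce_loss(probs, targets, keep_ratio=0.25, min_kept=1):
    """Binary cross-entropy averaged over the hardest pixels only.

    probs:   predicted foreground probabilities, one per pixel.
    targets: ground-truth labels (0 or 1), one per pixel.
    keep_ratio, min_kept: assumed hyperparameters for this sketch.
    """
    eps = 1e-7
    losses = []
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        losses.append(-(t * math.log(p) + (1 - t) * math.log(1.0 - p)))
    if not losses:
        return 0.0

    # Keep the k pixels with the largest loss (the "hard examples").
    k = min(len(losses), max(min_kept, int(len(losses) * keep_ratio)))
    hardest = sorted(losses, reverse=True)[:k]
    return sum(hardest) / k
```

Because easy pixels are dropped from the average, the gradient concentrates on ambiguous regions such as object boundaries and cluttered backgrounds. In a training framework the same selection would be done on tensors rather than Python lists.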

These changes reduced both false positives and false negatives, and produced segmentation quality that surpassed competing solutions (see Fig. 9).

Engineering Integration

The pipeline was divided into two engineering components:

Ykit AI Engine: Handles object detection, invokes the appropriate segmentation model, and returns both bounding boxes and masks.

FaceMagic Effect Engine: Receives AI output, runs the AR effect SDK (C++), and uses Lua scripts for interactive logic, allowing rapid feature iteration without SDK redeployment.

Communication between the engines occurs via triggerDetectSegData and refDetectSegInfo, passing texture IDs and dimensions to the rendering engine SKwai.

Effect Development

Designers created three themed avatars—Band, Ramadan, and Olympic—each with distinct 3D models, colors, and particle effects. The system adapts the avatar’s proportions to the detected object’s aspect ratio, ensuring visual coherence (see Figs. 10‑16). Additional visual refinements include color grading, shadow mapping, and a custom particle system.
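The proportion adaptation can be pictured as a non‑uniform scale applied to the avatar body. The sketch below is one plausible way to do it, not the method described in the source: it matches the body's aspect ratio to the detected bounding box while keeping the body's area constant, and clamps extreme ratios so that very thin or very wide objects still produce a usable character. All names and the clamp limits are assumptions.

```python
def body_scale_for_box(box_w, box_h, base_w=1.0, base_h=1.0,
                       min_ratio=0.5, max_ratio=2.0):
    """Return (sx, sy) scale factors for the avatar body.

    box_w, box_h:   size of the detected object's bounding box.
    base_w, base_h: size of the unscaled avatar body (hypothetical).
    min_ratio, max_ratio: assumed clamp on the width/height ratio.
    """
    if box_w <= 0 or box_h <= 0:
        raise ValueError("bounding box must have positive size")

    # Target width/height ratio, clamped to a range the rig can handle.
    ratio = min(max(box_w / box_h, min_ratio), max_ratio)

    # Stretch relative to the base body; sx * sy == 1 preserves area.
    rel = ratio / (base_w / base_h)
    sx = rel ** 0.5
    sy = 1.0 / sx
    return sx, sy
```

For a box twice as wide as it is tall, this yields a body stretched by about 1.41 horizontally and squashed by the same factor vertically, so the scaled body has the same 2:1 ratio as the object.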

Camera Localization

To anchor virtual avatars in the real world, Kuaishou built a SLAM system that estimates camera pose and reconstructs a sparse 3D map. For simple AR scenarios, it introduced a plane‑assumption SLAM variant that uses the device’s accelerometer to infer the ground‑plane normal. This dramatically reduces computational load, reaching 15 fps on low‑end devices while still delivering centimeter‑level accuracy.
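The geometric core of the plane assumption can be sketched in a few lines: a stationary accelerometer reading gives the direction of "up", which serves as the ground‑plane normal, and a virtual object is placed where a camera ray meets that plane. The sketch below makes several simplifying assumptions that are not from the source: the device and camera frames coincide, the sensor follows the Android‑style convention of reporting roughly +9.81 m/s² along the axis pointing away from the ground, and the camera height above the ground is a known nominal value.

```python
import math


def ground_normal_from_accel(accel):
    """Unit 'up' vector in the device frame from a stationary accelerometer reading."""
    ax, ay, az = accel
    norm = math.sqrt(ax * ax + ay * ay + az * az)
    if norm < 1e-6:
        raise ValueError("accelerometer reading is too small to normalize")
    return (ax / norm, ay / norm, az / norm)


def intersect_ground(ray_dir, normal, camera_height):
    """Intersect a ray from the camera origin with the ground plane.

    The plane is the set of points p with dot(normal, p) == -camera_height,
    i.e. `camera_height` metres below the camera along the 'up' normal.
    Returns the 3D hit point, or None if the ray does not point downward.
    """
    denom = sum(n * d for n, d in zip(normal, ray_dir))
    if denom >= -1e-6:  # parallel to the ground or pointing at the sky
        return None
    t = -camera_height / denom
    return tuple(t * d for d in ray_dir)
```

With the plane fixed this way, anchoring an avatar needs no map reconstruction; only the camera motion relative to the plane has to be tracked. A production system would also filter the accelerometer signal, since raw readings include device motion as well as gravity.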

Conclusion

Through coordinated advances in data preparation, lightweight segmentation, modular engineering, and efficient SLAM, Kuaishou delivered the All‑Things‑AR experience across multiple campaigns, including the Olympic special effects, demonstrating a scalable workflow for future interactive AR products.

