
How Kuaishou’s ‘All‑Things AR’ Turns Real Objects into Interactive 3D Characters

‘All‑Things AR’ (万物AR) is a Kuaishou Y‑tech solution that lets users capture any real‑world object with a phone. The system automatically segments the object with a custom AI model and renders an animated 3D avatar on top of it through a lightweight SLAM‑based pipeline, enabling low‑cost, high‑quality AR experiences.

Kuaishou Large Model

Background

In August 2020, a Japanese app called RakugakiAR let users turn 2D doodles into animated 3D AR characters. The app quickly went viral, highlighting the appeal of bringing real‑world sketches to life as moving avatars.

Concept of “All‑Things AR”

Inspired by RakugakiAR, Kuaishou’s Y‑tech team created “All‑Things AR”, a feature that lets users point a phone at any object, automatically segment the object, and render a lively 3D avatar without requiring the user to draw anything first.

Overall Pipeline

The workflow runs in six stages: generic object detection on the camera frame; object‑level segmentation (the “万物分割”, i.e. “segment anything”, step); passing the resulting mask texture to the effect SDK; rendering the AR character; compositing it with the camera background; and finally delivering the interactive AR experience.
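A minimal sketch of this flow in NumPy may help fix the data shapes involved. All function names and the stub detection/segmentation logic below are hypothetical, standing in for the real models; only the detect → segment → render → composite structure comes from the article.

```python
import numpy as np

def detect_objects(frame):
    """Stub detector: returns one (x, y, w, h) box covering the frame centre."""
    h, w = frame.shape[:2]
    return [(w // 4, h // 4, w // 2, h // 2)]

def segment_object(frame, box):
    """Stub segmenter: a binary mask (0/255) the size of the frame."""
    x, y, bw, bh = box
    mask = np.zeros(frame.shape[:2], dtype=np.uint8)
    mask[y:y + bh, x:x + bw] = 255
    return mask

def composite(background, avatar_rgba):
    """Alpha-blend the rendered avatar (RGBA) over the camera frame (RGB)."""
    alpha = avatar_rgba[..., 3:4].astype(np.float32) / 255.0
    out = (background.astype(np.float32) * (1.0 - alpha)
           + avatar_rgba[..., :3].astype(np.float32) * alpha)
    return out.astype(np.uint8)
```

In the real pipeline the mask texture produced at the segmentation step is what gets handed to the effect SDK; the renderer then draws the avatar and the composite step blends it back over the live camera image.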

Model Training and Data Preparation

To support segmentation of any object, the team collected large‑scale open‑source datasets and internal data, used a server‑side SOTA model to generate pseudo‑labels for new categories, and performed iterative manual annotation on hard cases.

Data augmentation (boundary smoothing, hole filling, random occlusion, lighting changes) was applied to improve robustness.
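Two of the listed augmentations, random occlusion and lighting changes, can be sketched in a few lines of NumPy. This is an illustrative sketch, not Kuaishou's training code; function names and parameter ranges are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def random_occlusion(image, mask, max_frac=0.3):
    """Zero out a random rectangle in both image and mask to simulate
    partial occlusion of the object during training."""
    h, w = image.shape[:2]
    oh = int(h * rng.uniform(0.1, max_frac))
    ow = int(w * rng.uniform(0.1, max_frac))
    y = int(rng.integers(0, h - oh))
    x = int(rng.integers(0, w - ow))
    image, mask = image.copy(), mask.copy()
    image[y:y + oh, x:x + ow] = 0
    mask[y:y + oh, x:x + ow] = 0
    return image, mask

def lighting_jitter(image, lo=0.6, hi=1.4):
    """Scale brightness by a random factor to simulate lighting changes."""
    factor = rng.uniform(lo, hi)
    return np.clip(image.astype(np.float32) * factor, 0, 255).astype(np.uint8)
```

Boundary smoothing and hole filling would similarly operate on the mask channel (e.g. via blurring and morphological closing), keeping image and label geometrically consistent.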

Segmentation Model Optimization

Two specialized models were trained, one for faces and one for generic objects, so that each model handles a narrower distribution. A cascade architecture with multi‑task learning (boundary detection and classification heads) was introduced to suppress background false positives.

Loss functions were enhanced with OHEM (online hard example mining) and a contrastive‑learning term to stabilize predictions across frames.
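The core idea of OHEM for dense segmentation is to back-propagate only through the hardest pixels. A minimal NumPy sketch of a pixel-wise binary version (the function name, `keep_frac` parameter, and binary formulation are assumptions, not the article's actual loss):

```python
import numpy as np

def ohem_pixel_loss(probs, labels, keep_frac=0.25, eps=1e-7):
    """Online hard example mining over per-pixel binary cross-entropy.

    probs:  (H, W) predicted foreground probability in [0, 1]
    labels: (H, W) ground-truth labels in {0, 1}
    Only the hardest keep_frac of pixels contribute to the loss.
    """
    p = np.clip(probs, eps, 1.0 - eps)
    ce = -(labels * np.log(p) + (1 - labels) * np.log(1 - p))  # per-pixel CE
    flat = np.sort(ce.ravel())[::-1]                           # hardest first
    k = max(1, int(len(flat) * keep_frac))
    return float(flat[:k].mean())                              # mean of top-k losses
```

Because easy, well-classified pixels are discarded, the gradient concentrates on ambiguous regions such as object boundaries and cluttered backgrounds.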

Engineering Integration

The AI pipeline (Ykit) detects objects each frame and, when segmentation is triggered, returns both bounding boxes and masks. Separate handling for faces and generic objects ensures optimal performance on mobile devices.

The effect engine (FaceMagic) receives the mask texture via SDK calls, then uses a custom rendering engine (SKwai) to display the AR avatar.
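The control flow between the AI pipeline and the effect engine can be sketched as follows. The class and method names here are invented for illustration and do not reflect the real Ykit or FaceMagic APIs; only the "detect every frame, segment on trigger, hand the mask to the SDK" logic comes from the article.

```python
class ArPipeline:
    """Hypothetical sketch of the per-frame flow between the AI pipeline
    and the effect engine; all names are illustrative."""

    def __init__(self, detector, segmenter, effect_sdk):
        self.detector = detector
        self.segmenter = segmenter
        self.effect_sdk = effect_sdk

    def on_frame(self, frame, segmentation_triggered):
        boxes = self.detector.detect(frame)            # detection runs every frame
        if segmentation_triggered and boxes:
            mask = self.segmenter.segment(frame, boxes[0])
            self.effect_sdk.set_mask_texture(mask)     # mask handed to effect engine
        return boxes
```

Keeping per-frame detection cheap and running the heavier segmentation only on demand is what makes this tractable on mobile devices.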

AR Effect Development

Designers created three themed avatars (band, Ramadan, Olympics) with 3D models, skeletal rigs, and particle effects. Adaptive scaling ensures the avatar fits objects of varying shapes.

Visual polish includes color correction, fake shadows, and particle systems for dynamic effects.
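Adaptive scaling of the avatar to the detected object reduces to picking one uniform scale factor that fits the avatar inside the object's bounding box. A minimal sketch (the function name and `fill` margin parameter are assumptions):

```python
def fit_avatar_scale(avatar_w, avatar_h, box_w, box_h, fill=0.9):
    """Uniform scale so the avatar fits inside the object's bounding box
    while preserving its aspect ratio; `fill` leaves a small margin."""
    return fill * min(box_w / avatar_w, box_h / avatar_h)
```

Using the minimum of the two per-axis ratios guarantees neither dimension overflows the box, whatever the object's shape.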

Camera Localization (SLAM)

A lightweight SLAM system estimates the phone’s pose and reconstructs a sparse 3D map, enabling stable placement of virtual objects. For simple AR scenes, a plane‑assumption SLAM variant runs at >15 fps on low‑end devices.
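Under the plane assumption, the camera's motion relative to the scene plane is captured by a homography between views, which is far cheaper to estimate than a full sparse map. Below is a minimal NumPy sketch of the standard DLT homography estimation, not Kuaishou's actual implementation:

```python
import numpy as np

def estimate_homography(src, dst):
    """Direct Linear Transform: estimate the 3x3 homography H mapping
    src -> dst from >= 4 planar point correspondences (Nx2 arrays)."""
    A = []
    for (x, y), (u, v) in zip(src, dst):
        A.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        A.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The homography is the null vector of A, i.e. the right singular
    # vector associated with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(A, dtype=float))
    H = vt[-1].reshape(3, 3)
    return H / H[2, 2]
```

Given the camera intrinsics, the relative rotation and (scaled) translation can then be recovered by decomposing H, which is enough to anchor a virtual object to the plane.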

Conclusion

Through coordinated advances in AI segmentation, real‑time rendering, and efficient SLAM, the All‑Things AR feature was delivered at scale on Kuaishou, driving user engagement and opening new business opportunities.

Tags: Computer Vision · mobile AI · AR · SLAM · real-time segmentation
Written by Kuaishou Large Model (Official Kuaishou Account)