How Kuaishou’s Y‑Tech Advances Monocular Depth Estimation for Mobile AR
This article reviews Kuaishou Y‑Tech’s ECCV 2020 paper on monocular depth estimation, covering its network built from Global Context and Spatial Attention blocks, the new HC‑Depth dataset, its specialized loss functions and edge‑aware training, and its strong results on NYUv2, TUM, and real‑world mobile AR applications.
Overview
Kuaishou Y‑Tech presents a research paper (ECCV 2020) that proposes a high‑quality monocular depth estimation method, enabling 3D scene understanding on mobile devices. The method powers new experiences such as 3D Photo and mixed reality without requiring special hardware.
Challenges in Monocular Depth Estimation
Estimating depth from a single image faces difficulties such as poor lighting, moving subjects, sky regions, false edges, and camera motion. Existing methods treat depth prediction as pixel‑wise classification or regression, ignoring global structural relationships, which leads to layout errors and blurred edges.
Network Architecture
The proposed model follows an encoder‑decoder (U‑shape) design with skip connections. It introduces two novel modules:
Global Context Block (GCB): recalibrates channel features by embedding global semantic context.
Spatial Attention Block (SAB): a spatial attention mechanism that guides feature selection at multiple scales.
Low‑resolution SAB features provide global layout cues, while high‑resolution SAB features emphasize fine details. The fused multi‑scale features are up‑sampled to produce the final depth map.
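The fusion step described above can be sketched in a few lines. This is an illustrative NumPy toy, not the paper's implementation: the shapes, the nearest‑neighbor upsampling, and the additive fusion are all assumptions standing in for the network's learned up‑sampling and fusion layers.

```python
import numpy as np

def upsample_nearest(feat, factor):
    """Nearest-neighbor upsampling of a (C, H, W) feature map."""
    return feat.repeat(factor, axis=1).repeat(factor, axis=2)

def fuse_multiscale(low_res, high_res):
    """Fuse a coarse (global-layout) map with a fine (detail) map.

    low_res:  (C, H/2, W/2) features carrying global layout cues
    high_res: (C, H, W) features carrying fine details
    """
    up = upsample_nearest(low_res, 2)
    return up + high_res  # element-wise fusion before the final depth head

# toy shapes
low = np.ones((8, 16, 16))
high = np.ones((8, 32, 32))
fused = fuse_multiscale(low, high)
print(fused.shape)  # (8, 32, 32)
```

In the actual network, learned convolutions would replace both the upsampling and the plain addition; the sketch only shows how coarse layout cues and fine detail end up in one map.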
Spatial Attention Block Details
SAB uses a 1×1 convolution to squeeze concatenated features, aggregates spatial context, and generates an attention map that encodes depth information for every pixel. The attention map is multiplied element‑wise with low‑level features before fusion, allowing the network to re‑calibrate GCB‑enhanced semantic features with spatially aware weights.
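The SAB mechanics can be illustrated with a minimal NumPy sketch. The weight matrix, the single‑channel attention map, and the sigmoid gating are assumptions chosen to match the description above (squeeze via 1×1 convolution, then element‑wise re‑weighting of low‑level features); the real block is a learned layer.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def spatial_attention_block(low_feat, high_feat, w):
    """Sketch of an SAB-style attention step (shapes/weights are illustrative).

    low_feat, high_feat: (C, H, W) feature maps to be fused
    w: (1, 2C) weights of a 1x1 convolution that squeezes the
       concatenated features into a single-channel attention map
    """
    cat = np.concatenate([low_feat, high_feat], axis=0)         # (2C, H, W)
    c2, h, wd = cat.shape
    squeezed = (w @ cat.reshape(c2, h * wd)).reshape(1, h, wd)  # 1x1 conv as matmul
    attn = sigmoid(squeezed)                                    # per-pixel weight in (0, 1)
    return attn * low_feat                                      # re-calibrate low-level features

rng = np.random.default_rng(0)
low = rng.standard_normal((4, 8, 8))
high = rng.standard_normal((4, 8, 8))
w = rng.standard_normal((1, 8)) * 0.1
out = spatial_attention_block(low, high, w)
print(out.shape)  # (4, 8, 8)
```

Because the attention values lie in (0, 1), the block can only attenuate features, never amplify them, which is the usual behavior of sigmoid‑gated spatial attention.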
Training Losses
The loss function combines four components:
BerHu loss (reverse Huber loss)
Scale‑invariant gradient loss
Normal loss
Global Focal Relative Loss (GFRL) – a novel relative loss that incorporates focal loss weighting to emphasize hard pixel pairs.
GFRL samples one pixel from each 16×16 block and compares it with all other pixels in the same image. The weighting factor reduces the influence of easy pairs and focuses training on incorrectly ordered depth relationships.
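A focal‑weighted relative loss along these lines might look as follows. The exact GFRL formula is not reproduced here; this sketch only follows the description above (one sample per 16×16 block, pairwise ordinal comparisons, a focal factor that down‑weights easy pairs), and `gamma`, `tau`, and the sigmoid‑based ordering probability are all assumed.

```python
import numpy as np

def gfrl_sketch(pred, gt, block=16, gamma=2.0, tau=0.02, rng=None):
    """Sketch of a focal-weighted relative (ordinal) depth loss.

    Samples one pixel per `block` x `block` region, compares all sampled
    pairs, and scales each pair's ranking loss by a focal factor
    (1 - p)**gamma so that correctly ordered (easy) pairs contribute little.
    `tau` is an assumed threshold for treating two depths as equal.
    """
    rng = rng or np.random.default_rng(0)
    h, w = pred.shape
    ys = rng.integers(0, block, h // block) + np.arange(0, h - block + 1, block)
    xs = rng.integers(0, block, w // block) + np.arange(0, w - block + 1, block)
    yy, xx = np.meshgrid(ys, xs, indexing="ij")
    p = pred[yy, xx].ravel()
    g = gt[yy, xx].ravel()

    total, n = 0.0, 0
    for i in range(len(p)):
        for j in range(i + 1, len(p)):
            if abs(g[i] - g[j]) < tau:
                continue  # near-equal pairs are skipped in this sketch
            sign = 1.0 if g[i] > g[j] else -1.0
            prob = 1.0 / (1.0 + np.exp(-sign * (p[i] - p[j])))  # P(correct order)
            total += -((1.0 - prob) ** gamma) * np.log(prob + 1e-12)
            n += 1
    return total / max(n, 1)

gt = np.linspace(0.0, 1.0, 32 * 32).reshape(32, 32)
print(gfrl_sketch(gt, gt) < gfrl_sketch(-gt, gt))  # True: correct ordering is penalized less
```

A prediction with every pairwise depth order inverted incurs a larger loss than a correctly ordered one, while the focal factor keeps already‑correct pairs from dominating the gradient.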
Edge‑Aware Consistency
To improve depth discontinuities, the method applies an edge‑aware strategy: Canny edges are extracted from the predicted depth map, dilated to form boundary regions, and a higher training weight is assigned to these regions, encouraging sharper depth boundaries.
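The weighting scheme can be sketched as follows. Note the stand‑ins: a simple gradient threshold replaces Canny, a 3×3 neighborhood loop replaces morphological dilation, and `grad_thresh`, `dilate`, and `edge_weight` are illustrative values, not the paper's settings.

```python
import numpy as np

def edge_weight_map(depth, grad_thresh=0.1, dilate=1, edge_weight=4.0):
    """Build a per-pixel loss weight map from depth discontinuities.

    A gradient-threshold edge detector stands in for Canny here, and the
    neighbor-OR loop stands in for a 3x3 morphological dilation.
    """
    gy = np.abs(np.diff(depth, axis=0, prepend=depth[:1]))
    gx = np.abs(np.diff(depth, axis=1, prepend=depth[:, :1]))
    edges = (gx + gy) > grad_thresh
    for _ in range(dilate):  # grow edges into boundary regions
        grown = edges.copy()
        grown[1:] |= edges[:-1]
        grown[:-1] |= edges[1:]
        grown[:, 1:] |= edges[:, :-1]
        grown[:, :-1] |= edges[:, 1:]
        edges = grown
    return np.where(edges, edge_weight, 1.0)

def edge_aware_l1(pred, gt):
    # edges come from the *predicted* depth, as described above
    w = edge_weight_map(pred)
    return float(np.mean(w * np.abs(pred - gt)))

depth = np.zeros((8, 8))
depth[:, 4:] = 1.0               # a vertical depth discontinuity
weights = edge_weight_map(depth)
print(weights[0, 4], weights[0, 0])  # boundary column up-weighted, flat region stays at 1
```

Pixels near the discontinuity (columns 3–5 after dilation) receive the larger weight, so errors there dominate the loss and push the network toward sharper boundaries.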
Multi‑Dataset Incremental Training
The authors train on multiple datasets using an incremental mixing strategy. After converging on a dataset with a similar distribution, harder datasets are added one by one, with a balanced sampler ensuring equitable batch composition. This accelerates convergence and improves generalization.
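One simple way to realize such a balanced sampler is round‑robin draws over the currently active datasets. The dataset names, pool sizes, and the round‑robin scheme below are illustrative; the paper's actual sampler may differ.

```python
import itertools
import random

def balanced_batches(datasets, batch_size, n_batches, seed=0):
    """Yield batches with an (approximately) equal share of each active dataset.

    datasets: dict mapping a dataset name to a list of samples.
    Cycles through dataset names so each contributes evenly to every batch.
    """
    rng = random.Random(seed)
    names = itertools.cycle(sorted(datasets))
    for _ in range(n_batches):
        yield [(name, rng.choice(datasets[name]))
               for name, _ in zip(names, range(batch_size))]

# start on an "easy" dataset, then add a harder one incrementally
pools = {"nyu_v2": list(range(100))}
warmup = list(balanced_batches(pools, batch_size=8, n_batches=2))
pools["hc_depth"] = list(range(50))  # hypothetical harder pool added after convergence
mixed = list(balanced_batches(pools, batch_size=8, n_batches=2))
counts = {n: sum(1 for b in mixed for (name, _) in b if name == n) for n in pools}
print(counts)  # each active dataset contributes an equal share
```

Adding datasets to `pools` one at a time mirrors the incremental mixing: early batches come entirely from the first dataset, and later batches split evenly across all active ones.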
Results and Comparisons
On the NYUv2 benchmark, the proposed method outperforms state‑of‑the‑art approaches in both quantitative metrics and visual quality. Similar superiority is observed on the TUM dataset (unseen scenes) and on the newly collected HC‑Depth dataset, which contains six challenging scene categories.
Real‑World Applications at Kuaishou
The depth estimation technology powers several mobile features:
Mixed Reality (MR): Combines monocular depth with SLAM/VIO to enable real‑time occlusion, virtual lighting, and physical collisions on phones.
3D Photo: Generates immersive 3D effects from a single image using dense reconstruction, portrait segmentation, and background inpainting.
Depth‑of‑Field Blur: Uses depth maps and portrait segmentation to simulate large‑aperture bokeh on mobile cameras.
All models run on‑device via the Y‑Tech YCNN inference engine, ensuring broad device compatibility.
Y‑Tech Team Introduction
Y‑Tech is Kuaishou’s AI research group focusing on computer vision, graphics, machine learning, and AR/VR. The team operates in Beijing, Shenzhen, Hangzhou, Seattle, and Palo Alto, and welcomes collaborations via [email protected].