
ARShoe: Real-Time Augmented Reality Shoe Try-On System on Smartphones

The paper presents ARShoe, the first practical real‑time augmented reality shoe try‑on system for smartphones, detailing its multi‑branch neural network, foot pose estimation, rendering pipeline, a newly built foot dataset, and extensive experiments demonstrating high accuracy and over 30 FPS performance on multiple devices.

JD Retail Technology

A paper from JD Retail's Shared Technology Department was recently accepted at the 29th ACM International Conference on Multimedia. It introduces ARShoe, which the authors describe as the first academic study of augmented reality shoe try‑on technology.

ARShoe is a real‑time AR shoe try‑on system designed for smartphones. Its workflow includes a multi‑branch encoder‑decoder network that jointly learns foot keypoint heatmaps, part affinity fields (PAFs), and leg/foot segmentation, while keeping computational cost low.
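To make the keypoint-heatmap idea concrete, here is a minimal sketch of how a heatmap training target can be built and how a keypoint is read back from it. It assumes a Gaussian target and a plain argmax decode, which are common choices for heatmap-based keypoint detectors; the article does not state which variants ARShoe uses, and the function names and `sigma` value are hypothetical.

```python
import math

def gaussian_heatmap(width, height, cx, cy, sigma=1.5):
    """Build a height x width heatmap with a Gaussian peak at (cx, cy).

    Assumption: a Gaussian target; sigma is an illustrative value.
    """
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]

def decode_peak(heatmap):
    """Return (x, y, score) of the strongest response in a heatmap."""
    best = (0, 0, float("-inf"))
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best[2]:
                best = (x, y, value)
    return best

# One heatmap per keypoint; here a single keypoint at column 5, row 7.
target = gaussian_heatmap(width=16, height=12, cx=5, cy=7)
x, y, score = decode_peak(target)
```

In a network of this kind, one such map would be predicted per keypoint, and the decoded peaks would then be grouped into feet using the PAFs.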

The system detects eight foot keypoints, groups them via PAFs, and uses the camera intrinsics with a PnP algorithm to recover a six‑degree‑of‑freedom foot pose, enabling correct placement of a 3D shoe model.
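PnP works backwards from a projection model: given 3D points on the shoe model and their detected 2D keypoints, it searches for the rotation and translation that make the projected points land on the detections. The sketch below shows only that forward model and the reprojection error a PnP solver minimizes. It assumes a pinhole camera with zero skew and no lens distortion; the function names and the numbers in the example are illustrative, not taken from the paper. In practice a library solver (for example OpenCV's `solvePnP`) would typically do the estimation; the article does not say which solver ARShoe uses.

```python
import math

def project_point(K, R, t, X):
    """Project a 3D model point X to pixel coordinates (u, v).

    K: 3x3 intrinsic matrix (zero skew assumed), R: 3x3 rotation,
    t: translation 3-vector. R and t together are the 6-DoF pose.
    """
    # Camera-frame coordinates: Xc = R * X + t
    Xc = [sum(R[i][j] * X[j] for j in range(3)) + t[i] for i in range(3)]
    if Xc[2] <= 0:
        raise ValueError("point is behind the camera")
    # Perspective division, then focal length and principal point
    u = K[0][0] * Xc[0] / Xc[2] + K[0][2]
    v = K[1][1] * Xc[1] / Xc[2] + K[1][2]
    return u, v

def reprojection_error(K, R, t, model_points, image_points):
    """Mean pixel distance between projected model points and detections."""
    total = 0.0
    for X, (u_obs, v_obs) in zip(model_points, image_points):
        u, v = project_point(K, R, t, X)
        total += math.hypot(u - u_obs, v - v_obs)
    return total / len(model_points)

# Illustrative values: 500 px focal length, principal point (320, 240),
# identity rotation, model placed 2 units in front of the camera.
K = [[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]]
R = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
t = [0.0, 0.0, 2.0]
u, v = project_point(K, R, t, [0.1, -0.05, 0.0])
```

A pose with low reprojection error across all eight keypoints is what lets the rendered shoe line up with the foot in the image.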

Segmentation results are used to locate the shoe contour and render the 3D model with realistic occlusion and scaling. A novel rendering‑stabilization module leverages matching points between consecutive frames to smooth the shoe’s motion and eliminate jitter.
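The article does not describe the stabilization module in detail, so the following is only a simplified stand-in for the general idea: use points matched between consecutive frames to predict where the shoe should move, then blend that prediction with the new detection to damp jitter. The mean-shift motion model, the blending weight `alpha`, and all names are assumptions made for illustration.

```python
def mean_shift(matches):
    """Average 2D displacement over matches of the form ((x0, y0), (x1, y1))."""
    n = len(matches)
    dx = sum(p1[0] - p0[0] for p0, p1 in matches) / n
    dy = sum(p1[1] - p0[1] for p0, p1 in matches) / n
    return dx, dy

def stabilize(prev_pos, detected_pos, matches, alpha=0.5):
    """Blend the motion-propagated position with the fresh detection.

    alpha = 1.0 trusts the detection fully; smaller values smooth more.
    """
    if not matches:
        return detected_pos  # nothing to propagate from, use detection as is
    dx, dy = mean_shift(matches)
    predicted = (prev_pos[0] + dx, prev_pos[1] + dy)
    return (
        alpha * detected_pos[0] + (1 - alpha) * predicted[0],
        alpha * detected_pos[1] + (1 - alpha) * predicted[1],
    )

# Two matched points both moved by (+2, +1) between frames.
matches = [((10, 10), (12, 11)), ((50, 40), (52, 41))]
smoothed = stabilize(prev_pos=(100, 200), detected_pos=(106, 199), matches=matches)
```

Here the detection jumped by (+6, -1) while the matched points only moved by (+2, +1), so the blended position lands between the two estimates.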

To support research, the authors constructed a large‑scale foot dataset containing annotations for keypoints, limb connections, and segmentation masks, as well as pose transformation matrices for 3D shoe overlay.
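As a rough picture of what one annotated sample might hold, the sketch below groups the four annotation types mentioned above into a single record. The field names, the file-path representation, and the 4x4 homogeneous form of the pose matrix are all hypothetical; the article does not describe the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List, Tuple

NUM_KEYPOINTS = 8  # the article states eight foot keypoints

@dataclass
class FootAnnotation:
    """Hypothetical record for one annotated foot (not the real schema)."""
    image_path: str
    keypoints: List[Tuple[float, float]]   # (x, y) pixel position per keypoint
    connections: List[Tuple[int, int]]     # pairs of keypoint indices (limbs)
    mask_path: str                         # leg/foot segmentation mask
    pose: List[List[float]]                # assumed 4x4 homogeneous transform

    def validate(self) -> None:
        """Raise ValueError if the record is structurally inconsistent."""
        if len(self.keypoints) != NUM_KEYPOINTS:
            raise ValueError("expected %d keypoints" % NUM_KEYPOINTS)
        for a, b in self.connections:
            if not (0 <= a < NUM_KEYPOINTS and 0 <= b < NUM_KEYPOINTS):
                raise ValueError("connection index out of range")
        if len(self.pose) != 4 or any(len(row) != 4 for row in self.pose):
            raise ValueError("pose must be a 4x4 matrix")

identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
sample = FootAnnotation(
    image_path="images/0001.jpg",                     # placeholder path
    keypoints=[(float(i), float(i)) for i in range(NUM_KEYPOINTS)],
    connections=[(i, i + 1) for i in range(NUM_KEYPOINTS - 1)],
    mask_path="masks/0001.png",                       # placeholder path
    pose=identity,
)
sample.validate()
```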

Experimental results show that ARShoe has the lowest FLOPs (0.984 G) and parameter count (1.292 M) among the compared methods while running at over 30 FPS on four common smartphones. Foot keypoint detection runs above 80 FPS with a mean average precision only 0.041 below the best baseline, and segmentation reaches an mIoU of 0.901 and an AP of 0.927, which demonstrates a strong balance between speed and accuracy.
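For readers unfamiliar with the segmentation metric, mean intersection over union (mIoU) averages, over classes, the overlap between predicted and ground-truth regions divided by their union. The sketch below uses the standard per-class definition on a tiny label map; the class ids (0 background, 1 foot, 2 leg) are hypothetical, and the paper's exact evaluation protocol may differ.

```python
def iou(pred, truth, label):
    """Intersection over union for one class; None if the class is absent."""
    inter = union = 0
    for p_row, t_row in zip(pred, truth):
        for p, t in zip(p_row, t_row):
            in_p, in_t = p == label, t == label
            inter += in_p and in_t   # pixel is in both masks
            union += in_p or in_t    # pixel is in at least one mask
    return inter / union if union else None

def mean_iou(pred, truth, labels):
    """Average IoU over the classes that appear in pred or truth."""
    scores = [s for s in (iou(pred, truth, l) for l in labels) if s is not None]
    if not scores:
        raise ValueError("none of the labels occur in the inputs")
    return sum(scores) / len(scores)

# Hypothetical class ids: 0 = background, 1 = foot, 2 = leg.
pred = [[0, 1, 1],
        [0, 2, 2]]
truth = [[0, 1, 2],
         [0, 2, 2]]
score = mean_iou(pred, truth, labels=[0, 1, 2])
```

In this toy case the per-class IoUs are 1, 1/2 and 2/3, so a single mislabeled pixel out of six pulls the mean down to 13/18.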

Visualizations confirm that ARShoe handles diverse foot poses and produces realistic virtual try‑on effects in real‑world scenes.

The authors believe that the ARShoe system and the foot benchmark dataset will further advance the field of virtual try‑on technologies.

Tags: mobile, real-time, computer vision, AR, augmented reality, shoe try-on