Training-Free Universal Virtual Try-On: OmniVTON’s Multi-Person Breakthrough
OmniVTON introduces a training‑free universal virtual try‑on framework that decouples garment texture from human pose. It achieves high‑fidelity results in both in‑shop and in‑the‑wild scenarios and uniquely supports multi‑person virtual dressing, as demonstrated by extensive quantitative and qualitative experiments.
Paper Information
Title: OmniVTON: Training-Free Universal Virtual Try-On
Authors: Zhaotong Yang, Yuhui Li, Shengfeng He, Xinzhe Li, Yangyang Xu, Junyu Dong, Yong Du
Institutions: Ocean University of China, Singapore Management University, Harbin Institute of Technology (Shenzhen)
Paper URL: https://arxiv.org/pdf/2507.15037v1
Project Home: https://github.com/Jerome-Young/OmniVTON
Conference: ICCV 2025
Background and Motivation
Virtual try‑on (VTON) aims to dress a target person with a garment image, enabling realistic online shopping experiences. Existing VTON methods are either supervised in‑shop approaches that require paired model‑garment data and struggle to generalize, or unsupervised in‑the‑wild approaches that are more adaptable but still lack universality. Both categories need scene‑specific training, which hinders large‑scale deployment. OmniVTON addresses this limitation by providing a training‑free, universal VTON framework that works for both in‑shop and in‑the‑wild scenarios.
Core Method
OmniVTON adopts a two‑step, training‑free pipeline that leverages a pre‑trained diffusion model while fully decoupling garment processing from pose processing.
Step 1: Structured Garment Morphing (SGM) – Preserving Texture Details
Generate pseudo‑person image: For a garment‑only image, a semantic‑guided model synthesizes a pseudo model wearing the garment.
Multi‑part semantic correspondence: Using skeletal keypoints and semantic segmentation, fine‑grained region‑level correspondences (e.g., torso, left upper arm) are established between the pseudo model and the target person.
Local dynamic transformation: Each garment region undergoes an independent geometric transformation to align precisely with the corresponding body part, producing a structurally coherent warped garment.
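The per‑region warping idea behind SGM can be sketched as follows: each semantic region gets its own geometric transform estimated from matched keypoints, instead of one global warp. This is a minimal NumPy sketch assuming affine transforms and least‑squares fitting; the region names, keypoints, and helper functions are illustrative, not the paper's released code.

```python
# Sketch of region-wise warping for Structured Garment Morphing (SGM).
# Assumption: each garment region is aligned with an affine transform
# fitted to keypoint correspondences (torso, left upper arm, ...).
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine mapping src keypoints to dst keypoints."""
    src = np.asarray(src_pts, float)
    dst = np.asarray(dst_pts, float)
    A = np.hstack([src, np.ones((len(src), 1))])   # (N, 3) homogeneous coords
    # Solve A @ M.T ~= dst for the 2x3 affine matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T                                     # (2, 3)

def warp_points(points, M):
    """Apply a 2x3 affine transform to an (N, 2) array of points."""
    pts = np.asarray(points, float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M.T

# One independent transform per semantic region (illustrative keypoints).
regions = {
    "torso": ([(0, 0), (10, 0), (0, 20)], [(2, 1), (13, 1), (2, 22)]),
}
transforms = {name: fit_affine(src, dst) for name, (src, dst) in regions.items()}
```

In the full method each transform would be applied to the pixels of its garment region and the warped regions composited into one structurally coherent garment.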
Step 2: Pose Injection and Boundary Stitching – Ensuring Pose Consistency
Spectral Pose Injection (SPI): Traditional DDIM inversion preserves pose but introduces texture noise. SPI analyses the noise spectrum, retains low‑frequency components that encode pose contours, and replaces high‑frequency components with random noise, preserving pose while allowing new texture synthesis.
Continuous Boundary Stitching (CBS): To remove discontinuities at garment boundaries after SGM, CBS performs bidirectional semantic feature interaction in attention layers during image restoration, eliminating hard edges and producing visually realistic results.
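The SPI step above can be sketched in the frequency domain: keep the low frequencies of the DDIM‑inverted noise (which carry pose contours) and swap in fresh Gaussian noise at high frequencies (which frees texture synthesis). This is a minimal NumPy sketch; the function name, the circular low‑pass mask, and the cutoff ratio are assumptions for illustration.

```python
# Sketch of Spectral Pose Injection (SPI): low-pass the inverted noise,
# high-pass fresh random noise, and mix the two spectra.
import numpy as np

def spectral_pose_injection(inverted_noise, cutoff_ratio=0.1, seed=0):
    """Preserve low-frequency pose structure, randomize high-frequency texture."""
    h, w = inverted_noise.shape[-2:]
    # Centered 2D spectrum of the DDIM-inverted noise.
    spec = np.fft.fftshift(np.fft.fft2(inverted_noise), axes=(-2, -1))
    # Fresh Gaussian noise supplies the high-frequency content.
    rng = np.random.default_rng(seed)
    rand = rng.standard_normal(inverted_noise.shape)
    rand_spec = np.fft.fftshift(np.fft.fft2(rand), axes=(-2, -1))
    # Circular low-pass mask around the spectrum center.
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_pass = radius <= cutoff_ratio * min(h, w)
    mixed = np.where(low_pass, spec, rand_spec)
    return np.fft.ifft2(np.fft.ifftshift(mixed, axes=(-2, -1))).real

noise = np.random.default_rng(1).standard_normal((64, 64))
mixed = spectral_pose_injection(noise)
```

Because the DC and other low‑frequency bins come straight from the inverted noise, coarse pose structure survives while fine‑grained texture statistics are resampled.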
By separating garment texture handling (SGM) from pose handling (SPI), OmniVTON avoids the bias of diffusion models when conditioned on multiple factors simultaneously.
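The bidirectional boundary interaction in CBS can be sketched as two cross‑attention passes: boundary features on the garment side attend to the body side and vice versa, so both sides converge on consistent content at the seam. The token shapes, the plain softmax attention, and the 50/50 residual mix below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the Continuous Boundary Stitching (CBS) idea: features from
# the two sides of a garment boundary cross-attend in both directions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_stitch(feat_a, feat_b):
    """Cross-attend A->B and B->A; return refined boundary features."""
    d = feat_a.shape[-1]
    attn_ab = softmax(feat_a @ feat_b.T / np.sqrt(d))   # A queries, B keys
    attn_ba = softmax(feat_b @ feat_a.T / np.sqrt(d))   # B queries, A keys
    # Each side blends in what the other side sees at the seam (assumed 50/50 mix).
    new_a = 0.5 * (feat_a + attn_ab @ feat_b)
    new_b = 0.5 * (feat_b + attn_ba @ feat_a)
    return new_a, new_b
```

In the actual method this interaction happens inside the attention layers of the diffusion model during restoration, rather than as a standalone post‑process.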
Experiments and Analysis
OmniVTON was evaluated on VITON‑HD, DressCode, and StreetTryOn datasets. Quantitatively, it outperformed state‑of‑the‑art methods on FID, SSIM, and LPIPS in both paired and unpaired test settings. On DressCode, it achieved the best scores across all garment types.
In the challenging StreetTryOn benchmark, OmniVTON attained top performance in all four cross‑scene settings (shop‑to‑street, model‑to‑model, model‑to‑street, street‑to‑street), demonstrating strong generalization.
Qualitative comparisons show that OmniVTON generates highly realistic, detail‑rich results with accurate pose for tops, bottoms, and dresses.
Ablation studies confirm the effectiveness of the three core modules (SGM, SPI, CBS); removing any module leads to noticeable quality degradation.
Contributions and Impact
First training‑free universal VTON framework that unifies in‑shop and in‑the‑wild scenarios.
Introduced Structured Garment Morphing (SGM), Spectral Pose Injection (SPI), and Continuous Boundary Stitching (CBS) to decouple texture and pose, achieving high fidelity and consistency.
Enabled multi‑person virtual try‑on, expanding applications to family dressing, team uniform design, and related use cases.
Open‑sourced implementation to support further research.
Limitations include difficulty with extremely crowded scenes or very small target body regions, but the method represents a significant step toward robust, universal virtual try‑on technology.
