Training-Free Universal Virtual Try-On: OmniVTON’s Multi-Person Breakthrough
OmniVTON introduces a training‑free universal virtual try‑on framework that decouples garment texture from human pose. It achieves high‑fidelity results in both in‑shop and in‑the‑wild scenarios and uniquely supports multi‑person virtual dressing, as demonstrated by extensive quantitative and qualitative experiments.
Paper Information
Title: OmniVTON: Training-Free Universal Virtual Try-On
Authors: Zhaotong Yang, Yuhui Li, Shengfeng He, Xinzhe Li, Yangyang Xu, Junyu Dong, Yong Du
Institutions: Ocean University of China, Singapore Management University, Harbin Institute of Technology (Shenzhen)
Paper URL: https://arxiv.org/pdf/2507.15037v1
Project Home: https://github.com/Jerome-Young/OmniVTON
Conference: ICCV 2025
Background and Motivation
Virtual try‑on (VTON) aims to dress a target person with a garment image, enabling realistic online shopping experiences. Existing VTON methods are either supervised in‑shop approaches that require paired model‑garment data and struggle to generalize, or unsupervised in‑the‑wild approaches that are more adaptable but still lack universality. Both categories need scene‑specific training, which hinders large‑scale deployment. OmniVTON addresses this limitation by providing a training‑free, universal VTON framework that works for both in‑shop and in‑the‑wild scenarios.
Core Method
OmniVTON adopts a two‑step, training‑free pipeline that leverages a pre‑trained diffusion model while fully decoupling garment processing from pose processing.
Step 1: Structured Garment Morphing (SGM) – Preserving Texture Details
Generate pseudo‑person image: For a garment‑only image, a semantic‑guided model synthesizes a pseudo model wearing the garment.
Multi‑part semantic correspondence: Using skeletal keypoints and semantic segmentation, fine‑grained region‑level correspondences (e.g., torso, left upper arm) are established between the pseudo model and the target person.
Local dynamic transformation: Each garment region undergoes an independent geometric transformation to align precisely with the corresponding body part, producing a structurally coherent warped garment.
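The per‑region warping idea behind SGM can be sketched as follows: each semantic region gets its own geometric transform estimated from matched keypoints, instead of one global warp. This is a minimal NumPy sketch assuming affine transforms and least‑squares fitting; the region names, keypoints, and helper functions are illustrative, not the paper's released code.

```python
# Sketch of region-wise warping for Structured Garment Morphing (SGM).
# Assumption: each garment region is aligned with an affine transform
# fitted to keypoint correspondences (torso, left upper arm, ...).
import numpy as np

def fit_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine mapping src keypoints to dst keypoints."""
    src = np.asarray(src_pts, float)
    dst = np.asarray(dst_pts, float)
    A = np.hstack([src, np.ones((len(src), 1))])   # (N, 3) homogeneous coords
    # Solve A @ M.T ~= dst for the 2x3 affine matrix M.
    M, *_ = np.linalg.lstsq(A, dst, rcond=None)
    return M.T                                     # (2, 3)

def warp_points(points, M):
    """Apply a 2x3 affine transform to an (N, 2) array of points."""
    pts = np.asarray(points, float)
    return np.hstack([pts, np.ones((len(pts), 1))]) @ M.T

# One independent transform per semantic region (illustrative keypoints).
regions = {
    "torso": ([(0, 0), (10, 0), (0, 20)], [(2, 1), (13, 1), (2, 22)]),
}
transforms = {name: fit_affine(src, dst) for name, (src, dst) in regions.items()}
```

In the full method each transform would be applied to the pixels of its garment region and the warped regions composited into one structurally coherent garment.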
Step 2: Pose Injection and Boundary Stitching – Ensuring Pose Consistency
Spectral Pose Injection (SPI): Traditional DDIM inversion preserves pose but introduces texture noise. SPI analyses the noise spectrum, retains low‑frequency components that encode pose contours, and replaces high‑frequency components with random noise, preserving pose while allowing new texture synthesis.
Continuous Boundary Stitching (CBS): To remove discontinuities at garment boundaries after SGM, CBS performs bidirectional semantic feature interaction in attention layers during image restoration, eliminating hard edges and producing visually realistic results.
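The SPI step above can be sketched in the frequency domain: keep the low frequencies of the DDIM‑inverted noise (which carry pose contours) and swap in fresh Gaussian noise at high frequencies (which frees texture synthesis). This is a minimal NumPy sketch; the function name, the circular low‑pass mask, and the cutoff ratio are assumptions for illustration.

```python
# Sketch of Spectral Pose Injection (SPI): low-pass the inverted noise,
# high-pass fresh random noise, and mix the two spectra.
import numpy as np

def spectral_pose_injection(inverted_noise, cutoff_ratio=0.1, seed=0):
    """Preserve low-frequency pose structure, randomize high-frequency texture."""
    h, w = inverted_noise.shape[-2:]
    # Centered 2D spectrum of the DDIM-inverted noise.
    spec = np.fft.fftshift(np.fft.fft2(inverted_noise), axes=(-2, -1))
    # Fresh Gaussian noise supplies the high-frequency content.
    rng = np.random.default_rng(seed)
    rand = rng.standard_normal(inverted_noise.shape)
    rand_spec = np.fft.fftshift(np.fft.fft2(rand), axes=(-2, -1))
    # Circular low-pass mask around the spectrum center.
    yy, xx = np.mgrid[:h, :w]
    radius = np.hypot(yy - h / 2, xx - w / 2)
    low_pass = radius <= cutoff_ratio * min(h, w)
    mixed = np.where(low_pass, spec, rand_spec)
    return np.fft.ifft2(np.fft.ifftshift(mixed, axes=(-2, -1))).real

noise = np.random.default_rng(1).standard_normal((64, 64))
mixed = spectral_pose_injection(noise)
```

Because the DC and other low‑frequency bins come straight from the inverted noise, coarse pose structure survives while fine‑grained texture statistics are resampled.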
By separating garment texture handling (SGM) from pose handling (SPI), OmniVTON avoids the bias of diffusion models when conditioned on multiple factors simultaneously.
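The bidirectional boundary interaction in CBS can be sketched as two cross‑attention passes: boundary features on the garment side attend to the body side and vice versa, so both sides converge on consistent content at the seam. The token shapes, the plain softmax attention, and the 50/50 residual mix below are illustrative assumptions, not the paper's exact formulation.

```python
# Sketch of the Continuous Boundary Stitching (CBS) idea: features from
# the two sides of a garment boundary cross-attend in both directions.
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def bidirectional_stitch(feat_a, feat_b):
    """Cross-attend A->B and B->A; return refined boundary features."""
    d = feat_a.shape[-1]
    attn_ab = softmax(feat_a @ feat_b.T / np.sqrt(d))   # A queries, B keys
    attn_ba = softmax(feat_b @ feat_a.T / np.sqrt(d))   # B queries, A keys
    # Each side blends in what the other side sees at the seam (assumed 50/50 mix).
    new_a = 0.5 * (feat_a + attn_ab @ feat_b)
    new_b = 0.5 * (feat_b + attn_ba @ feat_a)
    return new_a, new_b
```

In the actual method this interaction happens inside the attention layers of the diffusion model during restoration, rather than as a standalone post‑process.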
Experiments and Analysis
OmniVTON was evaluated on VITON‑HD, DressCode, and StreetTryOn datasets. Quantitatively, it outperformed state‑of‑the‑art methods on FID, SSIM, and LPIPS in both paired and unpaired test settings. On DressCode, it achieved the best scores across all garment types.
In the challenging StreetTryOn benchmark, OmniVTON attained top performance in all four cross‑scene settings (shop‑to‑street, model‑to‑model, model‑to‑street, street‑to‑street), demonstrating strong generalization.
Qualitative comparisons show that OmniVTON generates highly realistic, detail‑rich results with accurate pose for tops, bottoms, and dresses.
Ablation studies confirm the effectiveness of the three core modules (SGM, SPI, CBS); removing any module leads to noticeable quality degradation.
Contributions and Impact
First training‑free universal VTON framework that unifies in‑shop and in‑the‑wild scenarios.
Introduced Structured Garment Morphing (SGM), Spectral Pose Injection (SPI), and Continuous Boundary Stitching (CBS) to decouple texture and pose, achieving high fidelity and consistency.
Enabled multi‑person virtual try‑on, expanding applications to family dressing, team uniform design, and related use cases.
Open‑sourced implementation to support further research.
Limitations include difficulty with extremely crowded scenes or very small target body regions, but the method represents a significant step toward robust, universal virtual try‑on technology.
