How SwiftTailor Accelerates Realistic 3D Garment Generation

SwiftTailor introduces a two‑stage, geometry‑centric framework that unifies pattern inference and mesh synthesis, cutting inference time from tens of seconds to a few seconds while achieving state‑of‑the‑art accuracy and visual realism on the Multimodal GarmentCodeData benchmark for digital fashion.

Data Party THU

Problem Statement

Generating realistic 3D garments efficiently remains a major challenge in computer vision and digital fashion. Conventional pipelines use large vision‑language models to produce serialized 2D sewing patterns, which are then converted into simulatable 3D meshes by frameworks such as GarmentCode. Although these pipelines achieve high visual quality, inference typically takes 30 seconds to 1 minute per garment because of costly pattern generation and physics‑based simulation.

SwiftTailor Overview

SwiftTailor is a two‑stage framework that unifies pattern inference and geometry synthesis using a compact geometric image representation. The design eliminates the need for separate simulation steps and reduces inference time to a few seconds while preserving state‑of‑the‑art accuracy.

Stage 1 – PatternMaker

PatternMaker is a lightweight vision‑language model that accepts multiple input modalities (e.g., text description, reference images, or sketches) and predicts a set of 2D sewing patterns. The model is trained on the Multimodal GarmentCodeData dataset, which provides paired multimodal inputs and ground‑truth patterns. Because the network is deliberately small (e.g., ViT‑B/16 backbone with 12M parameters), inference per garment takes under 1 second.
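The article does not publish PatternMaker's exact output format, but a serialized sewing pattern is essentially a set of 2D panels plus seam correspondences. The schema below is a minimal illustrative sketch of that kind of structured prediction target; all panel and seam field names are assumptions, not the paper's.

```python
from dataclasses import dataclass, field

@dataclass
class Panel:
    """One 2D sewing-pattern piece (illustrative schema)."""
    name: str
    outline: list  # boundary vertices of the panel in 2D, e.g. in cm

@dataclass
class SewingPattern:
    """A garment as panels plus seam correspondences (illustrative schema)."""
    panels: list = field(default_factory=list)
    seams: list = field(default_factory=list)  # pairs of (panel, edge) to stitch

    def add_seam(self, a, b):
        self.seams.append((a, b))

# Usage: a toy two-panel pattern of the kind a model could serialize.
front = Panel("front", [(0, 0), (40, 0), (40, 60), (0, 60)])
back = Panel("back", [(0, 0), (40, 0), (40, 60), (0, 60)])
pattern = SewingPattern(panels=[front, back])
pattern.add_seam(("front", "left"), ("back", "right"))
pattern.add_seam(("front", "right"), ("back", "left"))
```

A structured target like this is what makes a compact model viable: the network predicts a short sequence of panel and seam tokens rather than dense geometry.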

Stage 2 – GarmentSewer

GarmentSewer is a dense‑prediction Transformer that converts the predicted patterns into a unified garment geometry image. This image encodes the 3D surface of every garment piece in a shared UV space, enabling parallel processing of all panels. The Transformer uses a standard encoder‑decoder architecture with multi‑scale attention and outputs a per‑pixel 3D coordinate map.
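The geometry-image idea can be sketched in a few lines: each panel occupies a tile of a shared UV atlas, and every valid pixel stores a 3D surface coordinate. The tile layout, resolution, and parametric surfaces below are illustrative assumptions, not the paper's actual representation.

```python
import numpy as np

H = W = 64                                    # atlas resolution (assumed)
geo = np.zeros((H, W, 3), dtype=np.float32)   # per-pixel (x, y, z) coordinates
mask = np.zeros((H, W), dtype=bool)           # which pixels belong to a panel

def write_panel(geo, mask, u0, v0, size, surface_fn):
    """Rasterize one panel into its UV tile via a parametric surface function."""
    for i in range(size):
        for j in range(size):
            s, t = i / (size - 1), j / (size - 1)  # panel-local parameters in [0, 1]
            geo[v0 + i, u0 + j] = surface_fn(s, t)
            mask[v0 + i, u0 + j] = True

# Two flat panels placed side by side in the atlas (a toy garment).
write_panel(geo, mask, 0, 0, 32, lambda s, t: (t, s, 0.0))   # "front" panel
write_panel(geo, mask, 32, 0, 32, lambda s, t: (t, s, 0.1))  # "back", offset in z
```

Because every panel lives in the same fixed-size image, a dense-prediction network can regress all panels' surfaces in one forward pass instead of processing pieces sequentially.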

Mesh Reconstruction

The geometry image is transformed back to a 3D mesh through an inverse‑mapping pipeline:

1. Remeshing the UV map into a regular triangulation.

2. Dynamic stitching that merges adjacent panels along seam lines without invoking physics‑based simulation.

This process produces a watertight garment mesh in under 0.5 seconds, dramatically faster than traditional simulation‑driven pipelines.
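The two steps above can be sketched as follows, assuming the geometry image from Stage 2 is an (H, W, 3) coordinate map with a validity mask. The regular grid triangulation and the nearest-point vertex weld below are illustrative stand-ins for the paper's remeshing and dynamic stitching; the actual method is not specified in this summary.

```python
import numpy as np

def geometry_image_to_mesh(geo, mask, merge_tol=1e-3):
    """Convert a geometry image into (vertices, triangle faces)."""
    H, W, _ = geo.shape
    index = -np.ones((H, W), dtype=int)   # pixel -> vertex id, -1 = invalid
    verts = []
    for v in range(H):
        for u in range(W):
            if mask[v, u]:
                index[v, u] = len(verts)
                verts.append(geo[v, u])
    # Step 1: regular triangulation — split each fully valid pixel quad into 2 tris.
    faces = []
    for v in range(H - 1):
        for u in range(W - 1):
            q = [index[v, u], index[v, u + 1], index[v + 1, u + 1], index[v + 1, u]]
            if all(i >= 0 for i in q):
                faces.append((q[0], q[1], q[2]))
                faces.append((q[0], q[2], q[3]))
    verts = np.asarray(verts)
    # Step 2: "stitch" seams by welding vertices whose 3D positions coincide
    # (within merge_tol), with no physics simulation involved.
    keys = np.round(verts / merge_tol).astype(np.int64)
    _, first, inverse = np.unique(keys, axis=0, return_index=True, return_inverse=True)
    welded = verts[first]
    faces = [tuple(int(inverse[i]) for i in f) for f in faces]
    return welded, faces

# Usage: a 4x4 flat geometry image reconstructs into a small planar mesh.
grid = np.mgrid[0:4, 0:4].astype(np.float32)
geo = np.stack([grid[1], grid[0], np.zeros((4, 4), np.float32)], axis=-1)
mask = np.ones((4, 4), dtype=bool)
verts, faces = geometry_image_to_mesh(geo, mask)
```

Welding by coordinate proximity is what makes panels that meet along a seam share vertices, which is how a closed mesh can emerge without simulating cloth contact.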

Experimental Evaluation

Experiments on the Multimodal GarmentCodeData benchmark compare SwiftTailor against prior methods that rely on large vision‑language models and GarmentCode simulation. Key results:

- Average inference time reduced from ~45 seconds to ~3 seconds per garment.

- Quantitative accuracy (e.g., mean vertex error) improves by 12% relative to baselines.

- Visual fidelity, measured by FID on rendered images, matches or exceeds state‑of‑the‑art scores.

The ablation study confirms that the geometric image representation and the dense‑prediction transformer are the primary contributors to speed‑up and accuracy gains.

Key Contributions

- A compact geometric image format that bridges pattern prediction and mesh synthesis.

- Two lightweight modules, PatternMaker and GarmentSewer, tailored for fast, multimodal garment generation.

- An efficient inverse‑mapping reconstruction that removes the need for costly physical simulation.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Computer Vision · AI · fashion technology · 3D garment generation · SwiftTailor
Written by Data Party THU

Official platform of Tsinghua Big Data Research Center, sharing the team's latest research, teaching updates, and big data news.