One-Click Removal & Seamless Integration: CycleFlow + Diffusion Prior Power OmniPaint
OmniPaint is a unified diffusion-based framework for physically consistent object removal and insertion. It combines a pre-trained FLUX.1 diffusion prior, a progressive training pipeline that ends with CycleFlow, and a novel reference-free metric, CFD, for evaluating removal quality.
Problem
Diffusion‑based generative models have difficulty producing realistic object removal and insertion because physical effects such as shadows and reflections interact in complex ways, and paired training data are scarce.
OmniPaint Framework
OmniPaint treats removal and insertion as interdependent processes within a single diffusion framework built on the FLUX.1-dev multimodal diffusion Transformer. The model receives an image and a binary mask that defines the edit region. For removal, the mask guides denoising to suppress semantic traces of the object while keeping boundaries smooth. For insertion, the masked image and a reference-object image are encoded, concatenated as conditional tokens, and denoised to generate a context-aware insertion.
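As a rough illustration of the conditioning described above, the sketch below concatenates condition tokens along the sequence axis. The function name `build_condition_sequence`, the toy two-dimensional tokens, and the exact token order are assumptions made for this example; the article does not specify them.

```python
from typing import List, Optional

Token = List[float]  # toy stand-in for one encoded latent patch


def build_condition_sequence(
    noisy_tokens: List[Token],
    masked_image_tokens: List[Token],
    reference_tokens: Optional[List[Token]] = None,
) -> List[Token]:
    """Join the noisy latent tokens and the condition tokens into one sequence.

    Removal passes only the masked image; insertion also passes the encoded
    reference object. The ordering used here is an assumption.
    """
    sequence = list(noisy_tokens) + list(masked_image_tokens)
    if reference_tokens is not None:
        sequence += list(reference_tokens)
    return sequence


noisy = [[0.1, 0.2] for _ in range(4)]
masked = [[0.0, 1.0] for _ in range(4)]
reference = [[0.5, 0.5] for _ in range(3)]

removal_seq = build_condition_sequence(noisy, masked)
insertion_seq = build_condition_sequence(noisy, masked, reference)
print(len(removal_seq), len(insertion_seq))
```

In a Transformer backbone, attention over the joined sequence is what lets the denoised tokens read from the conditions; the sketch only shows how the sequence is assembled.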
Key Technical Contributions
CycleFlow – an incremental training workflow that enables large‑scale unpaired post‑training, reducing reliance on paired data.
Context‑aware Feature Deviation (CFD) – a reference‑free evaluation metric that penalises hallucinations and measures contextual consistency.
Mask augmentation strategies that improve robustness to noisy or imprecise masks.
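The article does not say which mask augmentations are used. The sketch below shows two common choices, random dilation and replacing the mask with its bounding box, purely as assumptions; `dilate`, `to_bounding_box`, `augment_mask`, and their parameters are hypothetical names.

```python
import random
from typing import List

Mask = List[List[int]]  # binary mask, 1 marks the edit region


def dilate(mask: Mask, radius: int) -> Mask:
    """Grow the mask by `radius` pixels using a square neighbourhood."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for ny in range(max(0, y - radius), min(h, y + radius + 1)):
                    for nx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[ny][nx] = 1
    return out


def to_bounding_box(mask: Mask) -> Mask:
    """Replace the mask with the filled bounding box of its foreground."""
    ys = [y for y, row in enumerate(mask) if any(row)]
    xs = [x for row in mask for x, v in enumerate(row) if v]
    if not ys:  # empty mask: nothing to box
        return [row[:] for row in mask]
    y0, y1, x0, x1 = min(ys), max(ys), min(xs), max(xs)
    return [
        [1 if y0 <= y <= y1 and x0 <= x <= x1 else 0 for x in range(len(mask[0]))]
        for y in range(len(mask))
    ]


def augment_mask(mask: Mask, rng: random.Random,
                 max_radius: int = 2, box_prob: float = 0.3) -> Mask:
    """Randomly coarsen a precise mask to imitate imprecise user input."""
    if rng.random() < box_prob:
        return to_bounding_box(mask)
    return dilate(mask, rng.randint(0, max_radius))


mask = [[0] * 5 for _ in range(5)]
mask[1][1] = 1
mask[3][3] = 1
augmented = augment_mask(mask, random.Random(0))
```

Both operations only ever enlarge the mask, so the original object stays fully covered after augmentation.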
Methodology
Training Pipeline
Inpainting pre-training stage: Fine-tune the FLUX prior on LAION images with randomly sampled masks, using a conditional flow matching (CFM) loss, to learn basic inpainting.
Paired warm-up: Train on 3,000 paired samples that contain realistic shadows, reflections and identity-preserving insertions.
CycleFlow unpaired refinement: Leverage large segmentation datasets (COCO-Stuff, HQSeg), which provide object masks but no paired before/after images. Two mappings (removal ↔ insertion) predict target samples, and a cycle-consistency loss enforces that re-inserting a removed object recovers the original latent representation.
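The two losses named in these stages can be sketched on toy vectors. This is a minimal illustration, not the authors' implementation: it assumes the common rectified-flow convention x_t = (1 - t)·x0 + t·noise with velocity target noise - x0, and it uses simple closures as stand-ins for the removal and insertion mappings. All function names are hypothetical.

```python
from typing import Callable, List

Vec = List[float]


def mse(a: Vec, b: Vec) -> float:
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def interpolate(x0: Vec, noise: Vec, t: float) -> Vec:
    """Point on the straight path between data (t=0) and noise (t=1)."""
    return [(1 - t) * a + t * n for a, n in zip(x0, noise)]


def target_velocity(x0: Vec, noise: Vec) -> Vec:
    """Time derivative of the straight path above (assumed convention)."""
    return [n - a for a, n in zip(x0, noise)]


def cfm_loss(predict: Callable[[Vec, float], Vec],
             x0: Vec, noise: Vec, t: float) -> float:
    """Flow matching loss for one sample: regress the predicted velocity."""
    x_t = interpolate(x0, noise, t)
    return mse(predict(x_t, t), target_velocity(x0, noise))


def cycle_loss(remove: Callable[[Vec], Vec],
               insert: Callable[[Vec, Vec], Vec],
               latent_with_object: Vec, object_ref: Vec) -> float:
    """Remove the object, re-insert it, and compare with the original latent."""
    background = remove(latent_with_object)
    recovered = insert(background, object_ref)
    return mse(recovered, latent_with_object)


x0 = [2.0, 1.0, 3.0]
noise = [0.5, -1.0, 0.25]
obj = [1.0, 0.0, 2.0]

oracle = lambda x_t, t: target_velocity(x0, noise)   # perfect predictor
zero = lambda x_t, t: [0.0] * len(x_t)               # untrained predictor
toy_remove = lambda z: [a - b for a, b in zip(z, obj)]
toy_insert = lambda bg, ref: [a + b for a, b in zip(bg, ref)]

print(cfm_loss(oracle, x0, noise, 0.3), cfm_loss(zero, x0, noise, 0.3))
print(cycle_loss(toy_remove, toy_insert, x0, obj))
```

When the two mappings are exact inverses, the cycle loss is zero. This lets unpaired data supervise training: no ground-truth "object removed" image is needed, only the original sample.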
Unprompted Adaptive Control
To avoid ambiguous text prompts, two learnable task-specific vectors, one for removal and one for insertion, replace the textual embeddings. They are trained with LoRA modules while the FLUX backbone stays frozen, so the model can switch between removal and insertion at inference time by selecting the corresponding vector.
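A minimal sketch of the idea, under stated assumptions: a task name selects a learned vector that stands in for the text embedding, and a linear layer adds a low-rank LoRA update on top of a frozen weight. `TASK_VECTORS`, `select_task_embedding`, and `LoRALinear` are hypothetical names, and the initial values are toy placeholders rather than trained parameters.

```python
from typing import Dict, List

Vec = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vec) -> Vec:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


# One vector per task; in training these would be learned (toy values here).
TASK_VECTORS: Dict[str, Vec] = {
    "removal": [1.0, 0.0, 0.0, 0.0],
    "insertion": [0.0, 1.0, 0.0, 0.0],
}


def select_task_embedding(task: str) -> Vec:
    """Return the vector used in place of a text-prompt embedding."""
    return TASK_VECTORS[task]


class LoRALinear:
    """Frozen weight W plus a trainable low-rank update: W x + (alpha / r) B A x."""

    def __init__(self, weight: Matrix, rank: int, alpha: float = 1.0) -> None:
        out_dim, in_dim = len(weight), len(weight[0])
        self.weight = weight                                # frozen backbone weight
        self.A = [[0.01] * in_dim for _ in range(rank)]     # usually random init
        self.B = [[0.0] * rank for _ in range(out_dim)]     # zero init: no change at start
        self.scale = alpha / rank

    def __call__(self, x: Vec) -> Vec:
        base = matvec(self.weight, x)
        delta = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


layer = LoRALinear([[1.0, 0.0], [0.0, 1.0]], rank=1)
print(layer([2.0, 3.0]), select_task_embedding("removal"))
```

Because `B` starts at zero, the adapted layer initially reproduces the frozen backbone exactly, and only the small `A` and `B` matrices need gradients.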
CFD Metric Details
CFD combines a hallucination penalty and a context-consistency term. The hallucination penalty uses SAM-ViT-H to detect nested or overlapping masks inside the edited region and DINOv2 visual features to weight each mask. The context-consistency term measures the feature deviation between the inpainted region and its surrounding background. Lower CFD scores indicate better removal quality.
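The structure of such a score can be sketched on toy data. The following is an assumption-laden illustration, not the paper's formula: precomputed feature vectors stand in for DINOv2 features, a list of `(area, feature)` pairs stands in for the segments SAM would find inside the edited region, the two terms are simply added, and each segment is weighted by its area fraction times its feature deviation from the background.

```python
import math
from typing import List, Tuple

Vec = List[float]


def cosine_distance(a: Vec, b: Vec) -> float:
    """1 - cosine similarity; treated as 1.0 if either vector is all zeros."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 1.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def context_deviation(inpainted_feat: Vec, background_feat: Vec) -> float:
    """How far the inpainted region's features drift from the background's."""
    return cosine_distance(inpainted_feat, background_feat)


def hallucination_penalty(segments: List[Tuple[float, Vec]],
                          region_area: float, background_feat: Vec) -> float:
    """Penalise segments found inside the edited region (assumed weighting)."""
    return sum(
        (area / region_area) * cosine_distance(feat, background_feat)
        for area, feat in segments
    )


def cfd_score(inpainted_feat: Vec, background_feat: Vec,
              segments: List[Tuple[float, Vec]], region_area: float) -> float:
    """Lower is better: consistent context and no hallucinated objects."""
    return (context_deviation(inpainted_feat, background_feat)
            + hallucination_penalty(segments, region_area, background_feat))


background = [1.0, 0.0, 0.0]
clean = cfd_score([1.0, 0.0, 0.0], background, [], region_area=100.0)
hallucinated = cfd_score([0.8, 0.6, 0.0], background,
                         [(40.0, [0.0, 1.0, 0.0])], region_area=100.0)
print(clean, hallucinated)
```

The toy result with a spurious segment scores higher than the clean fill, which matches the stated reading that lower CFD means better removal.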
Experiments
Evaluation Setup
Object removal is benchmarked against MAT, LaMa, SDInpaint, FLUX‑Inpainting, CLIPAway, PowerPaint and FreeCompose on a 300‑image real‑world test set and the RORD dataset (1,000 high‑resolution pairs). Metrics include PSNR, SSIM, FID, CMMD, LPIPS, ReMOVE and CFD. Object insertion is compared with Paint‑by‑Example, ObjectStitch, FreeCompose, AnyDoor and IMPRINT on 565 samples, evaluating identity preservation (CLIP‑I, DINOv2, CUTE, DreamSim) and overall quality (MUSIQ, MANIQA).
Results
OmniPaint consistently achieves the lowest FID, CMMD, LPIPS and CFD scores while maintaining high PSNR, SSIM and ReMOVE, demonstrating superior removal fidelity and minimal hallucinations. For insertion, it leads on all identity metrics and outperforms baselines on MUSIQ and MANIQA, confirming seamless geometric and lighting integration.
Ablation Studies
Varying the cycle-loss weight shows that moderate values balance the synthesis of physical effects against realism, while an excessive weight introduces unnatural artifacts. An analysis of the number of neural function evaluations (NFE) at sampling time shows diminishing returns beyond NFE = 28, which is adopted as the default.
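To make the NFE figure concrete: with a fixed-step Euler sampler, each step calls the network once, so NFE equals the number of steps. The sketch below is a toy under stated assumptions (the straight-path convention x_t = (1 - t)·x0 + t·noise and an analytic velocity field in place of the trained model); `euler_sample` and `make_toy_velocity` are hypothetical names. The toy field is integrated exactly by Euler steps, so it illustrates the cost accounting only, not the quality trend reported in the ablation.

```python
from typing import Callable, List

Vec = List[float]


def euler_sample(velocity: Callable[[Vec, float], Vec],
                 noise: Vec, nfe: int = 28) -> Vec:
    """Integrate dx/dt = v(x, t) from t = 1 (noise) down to t = 0 (data)."""
    x = list(noise)
    dt = 1.0 / nfe
    for k in range(nfe):
        t = 1.0 - k * dt                 # current time, always > 0
        v = velocity(x, t)               # one function evaluation per step
        x = [xi - dt * vi for xi, vi in zip(x, v)]
    return x


def make_toy_velocity(data: Vec, counter: List[int]) -> Callable[[Vec, float], Vec]:
    """Analytic field for a single data point: v = (x - data) / t."""
    def velocity(x: Vec, t: float) -> Vec:
        counter[0] += 1
        return [(xi - di) / t for xi, di in zip(x, data)]
    return velocity


data = [2.0, -1.0, 0.5]
calls = [0]
sample = euler_sample(make_toy_velocity(data, calls), [0.3, 0.7, -1.2], nfe=28)
print(calls[0], sample)
```

With a real model, halving NFE roughly halves sampling time, which is why the point of diminishing returns is a practical default.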
Conclusion
OmniPaint unifies object‑oriented image editing by combining a diffusion prior, progressive CycleFlow training, and the CFD metric, achieving precise foreground removal and seamless object insertion with preserved geometry and physical effects. Extensive experiments validate its state‑of‑the‑art performance.
Paper: https://arxiv.org/pdf/2503.08677
GitHub repository: https://github.com/yeates/OmniPaint
AIWalker
Focused on computer vision, image processing, color science, and AI algorithms; sharing hardcore tech, engineering practice, and deep insights as a diligent AI technology practitioner.
