How SuperEdit Boosts Instruction-Based Image Editing with Rectified Supervision

SuperEdit introduces rectified instruction generation and contrastive supervision to fix noisy training signals in instruction‑based image editing, achieving up to 9.19% performance gains without extra parameters or pre‑training, as demonstrated on the Real‑Edit benchmark.

AI Frontier Lectures

May 19, 2025

Overview

This article presents SuperEdit, a method that improves instruction‑based image editing by correcting mismatched supervision signals and adding contrastive supervision, with gains of up to 9.19% reported on the Real‑Edit benchmark.

Problem Statement

Noisy supervision: In existing instruction‑based editing datasets, the edit instruction often does not match the actual difference between the original and edited images, so the model is trained on misaligned signals.

Complex scene editing: Models struggle with multi‑object, quantity, position, and relational edits.

Dependency on extra modules: Prior work relies on large vision‑language models (VLMs) or additional pre‑training, which increases computational cost.

Proposed Solution

Rectified Instructions: Use a VLM (e.g., GPT‑4o) to analyze original‑edited image differences and generate more accurate edit instructions guided by diffusion‑model priors.

Contrastive Supervision: Construct positive (correct) and negative (incorrect) instruction pairs and train the editor with a triplet loss to distinguish them.

Methodology

The pipeline builds on the InstructPix2Pix framework. During training, each sample provides the original image, the edited image, a rectified instruction, and a negative instruction. The model predicts noise for both instructions and optimizes a combined loss:

Loss = DiffusionLoss + λ * TripletLoss

The triplet loss pushes the noise predicted from the correct instruction closer to the true noise than the noise predicted from the incorrect instruction.
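The combined loss can be illustrated with a minimal, framework-free sketch. It assumes noise tensors are flattened into plain lists of floats, that the diffusion term is the mean squared error of the prediction made with the correct instruction, and that the triplet term uses a hinge with a margin. The function names, the `margin` value, and the default `lam` are illustrative assumptions, not values taken from the paper.

```python
def mse(a, b):
    """Mean squared error between two equal-length lists of floats."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def combined_loss(true_noise, pred_pos, pred_neg, lam=1.0, margin=0.1):
    """Return (total, diffusion, triplet) for one training sample.

    true_noise: the noise that was actually added to the edited image
    pred_pos:   noise predicted with the rectified (correct) instruction
    pred_neg:   noise predicted with the negative (incorrect) instruction
    lam, margin: hypothetical hyperparameters chosen for illustration
    """
    d_pos = mse(pred_pos, true_noise)  # distance of the "positive" prediction
    d_neg = mse(pred_neg, true_noise)  # distance of the "negative" prediction
    diffusion = d_pos                  # assumed: standard denoising loss on the correct instruction
    # Hinge: zero once the positive prediction is closer than the negative by at least `margin`.
    triplet = max(d_pos - d_neg + margin, 0.0)
    return diffusion + lam * triplet, diffusion, triplet


if __name__ == "__main__":
    true_noise = [0.0, 1.0, -1.0, 0.5]
    good = [0.1, 0.9, -1.0, 0.5]   # close to the true noise
    bad = [1.0, 0.0, 0.0, -0.5]    # far from the true noise
    print(combined_loss(true_noise, good, bad))
    print(combined_loss(true_noise, bad, good))
```

When the correct instruction already yields the closer prediction by more than the margin, the triplet term vanishes and only the diffusion term remains. When the order is reversed, the triplet term adds a penalty proportional to the gap.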

Instruction Rectification

Given an image pair, a VLM describes the differences attribute by attribute (layout, shape, color, detail), organized according to the stages of diffusion‑model generation, and then condenses these descriptions into a concise instruction that fits within the 77‑token limit of the CLIP text encoder.
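The final condensing step can be sketched as follows. In the method described above a VLM performs this synthesis, so the rule-based joiner here is only a stand-in to show the token budget in action. The `approx_token_count` helper is a crude whitespace-based assumption and is not the CLIP BPE tokenizer; `synthesize_instruction` and the clause format are likewise hypothetical.

```python
ATTRIBUTES = ("layout", "shape", "color", "detail")  # attribute order taken from the text
MAX_TOKENS = 77  # CLIP text-encoder limit cited in the text


def approx_token_count(text):
    """Crude stand-in for a tokenizer: whitespace words plus start/end tokens.

    This is NOT the CLIP BPE tokenizer; real counts will usually be higher.
    """
    return len(text.split()) + 2


def synthesize_instruction(changes, max_tokens=MAX_TOKENS):
    """Join per-attribute change descriptions into one instruction.

    changes: dict mapping an attribute name to a short description (hypothetical format).
    Clauses are added in ATTRIBUTES order; joining stops before the budget is exceeded.
    """
    clauses = [changes[a].strip().rstrip(".") for a in ATTRIBUTES if changes.get(a)]
    kept = []
    for clause in clauses:
        candidate = "; ".join(kept + [clause]) + "."
        if approx_token_count(candidate) > max_tokens:
            break  # drop this clause and every later one
        kept.append(clause)
    return "; ".join(kept) + "." if kept else ""


if __name__ == "__main__":
    changes = {
        "layout": "Move the dog to the left of the bench",
        "color": "Make the bench red.",
    }
    print(synthesize_instruction(changes))
```

In practice the budget check would use the same tokenizer as the editing model's text encoder, since instructions that exceed the limit are truncated before they reach the model.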

Contrastive Instruction Construction

For each rectified instruction, a single attribute (e.g., quantity or spatial relation) is altered to create a negative instruction, ensuring the textual embedding remains similar while the resulting edit differs.
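A toy version of this single-attribute perturbation is sketched below. It swaps one quantity or position word using small hand-written lookup tables, which is an assumption made purely for illustration; the tables, the `make_negative` function, and the choice of a rule-based swap are hypothetical and are not the procedure the paper uses to generate negative instructions.

```python
import re

# Hypothetical swap tables: each maps a word to a plausible "wrong" alternative.
SWAPS = {
    "quantity": {"one": "two", "two": "three", "three": "two"},
    "position": {"left": "right", "right": "left", "above": "below", "below": "above"},
}


def make_negative(instruction, attribute):
    """Return the instruction with the first word of `attribute` swapped.

    Only one word changes, so the text stays close to the original while
    describing a different edit. Returns None if no swappable word is found.
    The replacement is always lowercase.
    """
    table = SWAPS[attribute]
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(word) for word in table) + r")\b",
        re.IGNORECASE,
    )
    match = pattern.search(instruction)
    if match is None:
        return None
    replacement = table[match.group(1).lower()]
    return instruction[:match.start()] + replacement + instruction[match.end():]


if __name__ == "__main__":
    positive = "Add two birds to the left of the tree"
    print(make_negative(positive, "quantity"))
    print(make_negative(positive, "position"))
```

Each (positive, negative) pair produced this way differs in exactly one word, which matches the goal stated above: near-identical text that describes a different edit.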

Experiments

Dataset

40,000 training pairs were assembled from InstructPix2Pix, MagicBrush, and Seed‑Data‑Edit, balancing diverse edit types.

Evaluation

The Real‑Edit benchmark was used with both automatic GPT‑4o scoring and human evaluation, measuring Following (instruction adherence), Preserving (fidelity of non‑edited regions), and Quality (overall aesthetics).

Results

SuperEdit outperforms SmartEdit (which uses a 13B VLM) by 9.19% on Following, 7% on Preserving, and 11% on Quality.

Human evaluators confirm a 10.8% overall score improvement.

Ablation studies show rectified instructions alone boost performance, and adding contrastive supervision yields further gains.

Scaling the training data from 5k to 40k samples improves all metrics at every step, suggesting that performance has not yet saturated within this range.

Conclusion

By focusing on higher‑quality supervision rather than architectural changes, SuperEdit demonstrates that rectified instructions and contrastive learning can substantially enhance instruction‑based image editing, even with reduced data and model size.

References

[1] SuperEdit: Rectifying and Facilitating Supervision for Instruction‑Based Image Editing
