
ICRDrag: The First In‑Context Region Drag Model for Precise, Controllable Image Editing

ICRDrag, presented at ECCV 2026, introduces an in‑context region‑dragging framework that uses mask‑based attention and bidirectional source‑target constraints to achieve precise, natural image edits while overcoming the deformation and boundary issues of earlier point‑ and region‑drag methods.

Machine Heart

Jul 4, 2026

ICRDrag (In‑Context Region‑based Drag) is an in‑context region‑dragging model. Users mark a source region and a target region with masks, and the model moves, scales, or deforms the source region to fit the target while preserving surrounding details. In the demo, a blue mask marks the region to move and a red mask marks the target location; dragging the source to the target moves the object, keeps related parts (e.g., mouth and chin) consistent, and avoids unnecessary changes elsewhere. The online demo supports up to five source‑target pairs and lets users add anchor masks to lock areas that should not change.

Technical contributions

In‑context learning framework: Built on DiT (Diffusion Transformer), the model receives the original image, source mask, and target mask in a single forward pass and directly outputs the edited image.

Image‑mask attention consistency: The attention map of the generated image must align with the spatial distribution of the target mask, ensuring strict adherence to the defined region.
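One way to read this constraint is as a penalty on attention mass that lands outside the target mask. The sketch below is a minimal illustration under that assumption; the function name `mask_attention_loss` and the exact loss form are hypothetical and are not taken from the paper.

```python
def mask_attention_loss(attn, target_mask):
    """Fraction of attention mass that falls outside the target mask.

    attn: 2D list of non-negative attention weights over image positions.
    target_mask: 2D list of 0/1 values with the same shape as attn.
    Returns 0.0 when all attention lies inside the mask, 1.0 when none does.
    NOTE: illustrative loss only; the paper's actual formulation may differ.
    """
    total = sum(sum(row) for row in attn)
    if total == 0:
        raise ValueError("attention map has no mass")
    inside = sum(
        a * m
        for attn_row, mask_row in zip(attn, target_mask)
        for a, m in zip(attn_row, mask_row)
    )
    return 1.0 - inside / total


if __name__ == "__main__":
    uniform = [[1.0, 1.0], [1.0, 1.0]]
    corner_mask = [[1, 0], [0, 0]]
    print(mask_attention_loss(uniform, corner_mask))
```

A loss of this shape goes to zero only when the generated region's attention is concentrated inside the target mask, which matches the intent described above.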

Source‑target bidirectional attention: The target region attends to the corresponding source region and vice‑versa, establishing a clear correspondence between pre‑ and post‑edit objects.
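The correspondence can be pictured as a symmetric boolean matrix over tokens. The following sketch assumes, purely for illustration, that all tokens sit in one flattened sequence and that each token carries the index of the source or target mask it belongs to; `correspondence_mask` and its inputs are hypothetical names, not the authors' API.

```python
def correspondence_mask(source_ids, target_ids):
    """Mark query/key token pairs that link a target region to its source.

    source_ids[i] is the pair index of token i if it lies in a source mask,
    otherwise None; target_ids[i] is the same for target masks.
    Entry [q][k] is True when q and k belong to the same source-target pair,
    in either direction (target -> source or source -> target).
    NOTE: assumed token layout, for illustration only.
    """
    n = len(source_ids)
    if len(target_ids) != n:
        raise ValueError("source_ids and target_ids must have equal length")
    allow = [[False] * n for _ in range(n)]
    for q in range(n):
        for k in range(n):
            tgt_to_src = target_ids[q] is not None and target_ids[q] == source_ids[k]
            src_to_tgt = source_ids[q] is not None and source_ids[q] == target_ids[k]
            allow[q][k] = tgt_to_src or src_to_tgt
    return allow


if __name__ == "__main__":
    # Tokens 0 and 3 are sources of pairs 0 and 1; tokens 1 and 2 are their targets.
    for row in correspondence_mask([0, None, None, 1], [None, 0, 1, None]):
        print(row)
```

Because the matrix is symmetric, each target token is tied to its own source and not to the source of another pair, which matters when several source-target pairs are edited at once.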

Separate LoRA modules for image and mask: Independent LoRA adapters are trained for each modality because images contain rich texture while masks encode only shape.
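As a rough illustration of per-modality adapters, the sketch below applies the standard LoRA update (frozen weight plus a low-rank product) and picks the adapter by modality. The class `LoRALinear`, the `"image"`/`"mask"` keys, and the `scale` parameter are assumptions made for this example; the source does not describe how the adapters are wired into the model.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


class LoRALinear:
    """Frozen linear layer with one low-rank adapter per modality (sketch)."""

    def __init__(self, weight, adapters, scale=1.0):
        self.weight = weight      # frozen base weight, shape (out, in)
        self.adapters = adapters  # e.g. {"image": (A, B), "mask": (A, B)}
        self.scale = scale        # hypothetical LoRA scaling factor

    def forward(self, x, modality):
        # A has shape (rank, in); B has shape (out, rank).
        A, B = self.adapters[modality]
        base = matvec(self.weight, x)
        delta = matvec(B, matvec(A, x))
        return [b + self.scale * d for b, d in zip(base, delta)]


if __name__ == "__main__":
    layer = LoRALinear(
        weight=[[1.0, 0.0], [0.0, 1.0]],
        adapters={
            "image": ([[1.0, 0.0]], [[1.0], [0.0]]),
            "mask": ([[0.0, 1.0]], [[0.0], [1.0]]),
        },
    )
    print(layer.forward([1.0, 2.0], "image"))
    print(layer.forward([1.0, 2.0], "mask"))
```

Keeping the base weight shared while the low-rank terms differ lets image tokens and mask tokens be transformed differently without duplicating the backbone.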

Two‑stage progressive training: Stage 1 uses complete semantic masks to teach basic region‑transformation logic; Stage 2 introduces randomly expanded, coarse masks to simulate hand‑drawn selections, which makes the model considerably more tolerant of imperfect user input.
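The Stage 2 idea of randomly expanded masks can be approximated with a square dilation of random radius. This is a minimal sketch under that assumption; `dilate`, `coarsen_mask`, and `max_radius` are hypothetical, and the authors' actual augmentation may use a different procedure.

```python
import random


def dilate(mask, radius):
    """Expand every set pixel of a 0/1 mask into a (2*radius+1) square."""
    h, w = len(mask), len(mask[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if mask[y][x]:
                for yy in range(max(0, y - radius), min(h, y + radius + 1)):
                    for xx in range(max(0, x - radius), min(w, x + radius + 1)):
                        out[yy][xx] = 1
    return out


def coarsen_mask(mask, max_radius=3, rng=None):
    """Dilate a precise mask by a random radius in [1, max_radius].

    Assumed stand-in for 'randomly expanded, coarse masks'; pass a seeded
    random.Random as rng for reproducible augmentation.
    """
    rng = rng or random.Random()
    return dilate(mask, rng.randint(1, max_radius))


if __name__ == "__main__":
    precise = [[0] * 5 for _ in range(5)]
    precise[2][2] = 1
    for row in coarsen_mask(precise, max_radius=2, rng=random.Random(0)):
        print(row)
```

The coarse mask always contains the precise one, so the model still sees the whole object but can no longer rely on the mask boundary matching the object boundary.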

Dataset and evaluation

To train ICRDrag, the authors constructed the PRD (Paired Region Dataset) from the million‑scale video collection OpenVid, yielding 287,000 training samples, each consisting of an original image, a source mask, a target image, and a target mask. For evaluation, PRDBench provides 1,000 manually verified high‑quality samples annotated with both masks and keypoints, so point‑drag and region‑drag models can be compared on the same data.

Resources

Paper: https://arxiv.org/pdf/2606.25907

GitHub: https://github.com/bcmi/ICRDrag-Region-Drag-Editing

Demo: https://drag.ustcnewly.com/


