MagicColor: First Multi‑Instance AI Sketch‑Coloring System for Professional‑Grade Comics

MagicColor introduces a novel multi‑instance sketch‑coloring framework that uses a two‑stage self‑play training strategy, instance guidance, and edge‑aware pixel‑level color matching to automatically produce high‑quality, consistent colors for multiple line‑art instances, outperforming prior GAN and diffusion‑based methods.


Problem Definition

Multi‑instance sketch colorization with instance‑level control: given a line‑art image L, a set of reference images R_i and corresponding binary masks M_i, generate a colored output I such that each masked region matches the color palette of its paired reference.
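
A loose formalization in our notation (the paper's exact objective may differ):

```latex
I = f_\theta\!\left(L,\; \{(R_i, M_i)\}_{i=1}^{N}\right),
\qquad \text{colors of } I \odot M_i \ \text{match the palette of } R_i \quad \forall i
```

where $\odot$ denotes element-wise masking and $f_\theta$ is the colorization model.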

Method Overview

MagicColor builds on a pretrained Stable Diffusion 1.5 UNet and introduces three mechanisms that enable a single forward pass to color multiple instances.

Two-stage self-play training: Stage 1 learns single-reference coloring; Stage 2 adds multiple references using SAM-extracted instances and random augmentations, mitigating the scarcity of paired multi-instance data (a data-construction sketch follows this list).

Instance Guidance Module: For each mask, ROI features are extracted from DINOv2 feature maps, 10% of spatial tokens are randomly dropped and replaced with the global DINO embedding, and the result is reshaped into a latent control tensor injected into the diffusion process.

Edge-aware color matching loss: Combines a VGG-based perceptual loss, a re-weighted edge loss that emphasizes high-frequency boundaries, and cosine-similarity pixel-wise color matching between reference and source features.
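
To make the stage-2 self-play data construction concrete, here is a minimal Python sketch under stated assumptions: `sam_segment` and `augment` are hypothetical callables standing in for a SAM wrapper and the paper's augmentation pipeline, not the authors' code.

```python
import random

def build_stage2_sample(color_frame, sketch, sam_segment, augment):
    """Assemble one multi-reference training sample from a colored frame.

    color_frame: ground-truth colored image, HWC array (training target)
    sketch:      its line-art counterpart (model input)
    sam_segment: callable returning a list of binary instance masks
                 (assumed SAM wrapper)
    augment:     callable applying random flips / brightness jitter to a crop
    """
    masks = sam_segment(color_frame)           # instance masks via SAM
    pairs = []
    for m in masks:
        crop = color_frame * m[..., None]      # isolate one instance
        pairs.append((augment(crop), m))       # augmented crop becomes its reference
    random.shuffle(pairs)                      # order-invariance; ref-mask pairing kept
    refs = [r for r, _ in pairs]
    ordered_masks = [m for _, m in pairs]
    return {"sketch": sketch, "refs": refs,
            "masks": ordered_masks, "target": color_frame}
```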

Architecture Details

Backbone: Stable Diffusion 1.5 UNet. A Reference Net encodes each reference image with DINOv2 followed by a small feed‑forward projector. Sketch Guider and Instance Guider are implemented as ControlNet adapters initialized from public weights. The Instance Encoder shares the DINOv2 backbone.
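
A schematic of how these components could be wired together; the class, the projector dimensions, and the UNet call signature below are illustrative assumptions, not the released implementation.

```python
import torch.nn as nn

class MagicColorWiring(nn.Module):
    """Illustrative component wiring only; interfaces are assumed."""
    def __init__(self, unet, dino, sketch_guider, instance_guider):
        super().__init__()
        self.unet = unet                    # pretrained SD 1.5 UNet
        self.dino = dino                    # shared DINOv2 backbone
        self.projector = nn.Sequential(     # small feed-forward projector (dims assumed)
            nn.Linear(1024, 768), nn.GELU(), nn.Linear(768, 768))
        self.sketch_guider = sketch_guider      # ControlNet adapter, public init
        self.instance_guider = instance_guider  # ControlNet adapter, public init

    def forward(self, noisy_latents, t, sketch, ref_images, inst_controls):
        ref_tokens = self.projector(self.dino(ref_images))   # Reference Net path
        control = self.sketch_guider(sketch) + self.instance_guider(inst_controls)
        # Assumed UNet interface: reference tokens enter via cross-attention,
        # spatial control is added to the residual stream.
        return self.unet(noisy_latents, t, ref_tokens, control)
```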

Instance Control Module

For each instance mask M_i, compute its bounding box, sample a dense grid of DINOv2 features inside the box (ROI-Align-like), randomly drop 10% of tokens, replace them with the global DINO embedding, and reshape the result to a latent tensor C_i (shape C×H×W). C_i is concatenated with the diffusion timestep embedding and added to the UNet conditioning, providing instance-level guidance.
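
A rough PyTorch sketch of that token construction; the feature shapes, the 16×16 grid size, and the `feat_map`/`global_emb` interfaces are assumptions for illustration.

```python
import torch
from torchvision.ops import roi_align

def instance_control_tensor(feat_map, global_emb, mask, grid=16, drop_p=0.1):
    """Build the latent control tensor C_i for one instance.

    feat_map:   dense DINOv2 patch features, shape (1, C, Hf, Wf)
    global_emb: global DINOv2 (CLS) embedding, shape (C,)
    mask:       binary instance mask, shape (H, W)
    Returns a (C, grid, grid) tensor to inject as conditioning.
    """
    ys, xs = torch.nonzero(mask, as_tuple=True)
    scale = feat_map.shape[-1] / mask.shape[-1]        # image -> feature coordinates
    box = torch.tensor([[0.0,
                         float(xs.min()) * scale, float(ys.min()) * scale,
                         float(xs.max()) * scale, float(ys.max()) * scale]])
    roi = roi_align(feat_map, box, output_size=(grid, grid))  # ROI-Align over the bbox
    tokens = roi.flatten(2).squeeze(0).T                      # (grid*grid, C)
    drop = torch.rand(tokens.shape[0]) < drop_p               # drop ~10% of tokens
    tokens[drop] = global_emb                                 # replace with global embedding
    return tokens.T.reshape(-1, grid, grid)                   # C_i, shape (C, grid, grid)
```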

Edge Loss and Color Matching

Standard diffusion training uses pixel‑wise MSE. MagicColor adds:

Perceptual loss on VGG‑19 conv layers.

Edge loss that weights pixels by an edge map derived from a Sobel filter on the ground‑truth image, encouraging accurate reconstruction of object boundaries.

Color matching loss: For each pixel, compute cosine similarity between dense DINO features of the reference and source images; a nearest-neighbor search yields a correspondence map that enforces color consistency.
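
A minimal sketch of the edge-weighted reconstruction term and the cosine-similarity correspondence; the weighting constant and feature shapes here are illustrative, and the paper's exact combination of terms may differ.

```python
import torch
import torch.nn.functional as F

SOBEL_X = torch.tensor([[-1., 0., 1.],
                        [-2., 0., 2.],
                        [-1., 0., 1.]]).view(1, 1, 3, 3)

def edge_weighted_mse(pred, target):
    """Pixel loss re-weighted by a Sobel edge map of the ground truth.

    pred, target: (B, 3, H, W) images in [0, 1]
    """
    gray = target.mean(dim=1, keepdim=True)
    gx = F.conv2d(gray, SOBEL_X, padding=1)
    gy = F.conv2d(gray, SOBEL_X.transpose(-1, -2), padding=1)
    weight = 1.0 + torch.sqrt(gx ** 2 + gy ** 2)     # boundary pixels count more
    return (weight * (pred - target) ** 2).mean()

def color_correspondence(src_feat, ref_feat):
    """Per-pixel nearest neighbor in the reference via cosine similarity.

    src_feat: (N, C) dense DINO features of the source
    ref_feat: (M, C) dense DINO features of the reference
    Returns, for each source token, the index of its best reference match.
    """
    sim = F.normalize(src_feat, dim=-1) @ F.normalize(ref_feat, dim=-1).T
    return sim.argmax(dim=-1)                        # correspondence map, shape (N,)
```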

Training Procedure

Datasets: anime video frames from Sakuga and ATD-12K plus manually curated image pairs. 1,000 pairs are held out for testing; the remainder form the training set. Images are resized to 512 px (aspect ratio preserved). Training runs for 100k optimization steps with batch size 1 and learning rate 2e-5 on two NVIDIA A800 80 GB GPUs for roughly 7 days. Data augmentation includes random horizontal flips and brightness jitter.
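
For quick reference, the reported hyperparameters collected into a single config sketch (field names are ours, not the paper's):

```python
train_config = dict(
    backbone="stable-diffusion-v1-5",
    resolution=512,                 # shorter side; aspect ratio preserved
    optimization_steps=100_000,
    batch_size=1,
    learning_rate=2e-5,
    hardware="2x NVIDIA A800 80GB, ~7 days",
    test_holdout_pairs=1_000,
    augmentations=("random_horizontal_flip", "brightness_jitter"),
)
```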

Experimental Results

Qualitative Comparison

Compared with the GAN-based RSIC and SGA and the diffusion-based AnimeDiffusion, ColorizeDiffusion, and MangaNinja, MagicColor better preserves instance-level color correspondence, especially in small details, thanks to its explicit instance guidance and edge-aware loss.

Quantitative Evaluation

On the held‑out test set MagicColor achieves lower FID, higher PSNR and SSIM, and lower LPIPS than all baselines, indicating superior visual similarity and structural fidelity (see Table 1 in the paper).

Ablation Study

Edge loss removed → degraded edge preservation and visible color bleed at object boundaries.

Color matching removed → washed-out colors and loss of the reference palette.

Instance guider removed → failure to transfer instance-level colors and increased noise.

User Study

Sixteen participants used a demo UI to color sketches. Across four criteria (image quality, usability, reference similarity, result relevance) MagicColor received the highest average scores; average task time exceeded 20 minutes per sketch.

Limitations

Extreme occlusions can hinder accurate color transfer.

Heavy overlap among objects reduces the model's semantic awareness of individual instances.

Memory and compute scale roughly linearly with the number of reference instances.

Conclusion

MagicColor shows that a diffusion‑based framework equipped with a two‑stage self‑play curriculum, instance guidance, and edge‑aware color matching can achieve high‑fidelity multi‑instance sketch colorization with fine‑grained control. The code and model weights will be released to foster further research.

References

[1] MagicColor: Multi-Instance Sketch Colorization, arXiv:2503.16948. https://arxiv.org/pdf/2503.16948
