How Alibaba’s DAMO Lab Revolutionizes Image Cutout with AI‑Powered Matting
Alibaba's DAMO Academy presents an AI‑driven image cutout system that combines filtering, classification, detection, and advanced segmentation to automate high‑precision matting, improve design efficiency, and unlock new commercial opportunities across e‑commerce and media industries.
Since its founding, Alibaba’s DAMO Academy has pursued advanced AI research. One result is a dedicated image cutout (matting) solution built for “Luban”, a product of the Alibaba Intelligent Design Lab.
Why develop a cutout algorithm?
Designers can spend more than two hours producing a precise cutout of a single portrait, and this manual work limits throughput. Automating it with AI speeds up the production of banners, posters, and venue images, keeps the visual style consistent, and helps boost conversion rates.
Industry demand and challenges
Image matting is valuable across e‑commerce, entertainment, education, and other verticals. Existing methods struggle with fine details such as hair and are not robust across general scenes, which calls for a more general, higher‑precision approach.
System architecture
The solution consists of four modules: filtering, classification, detection, and segmentation.
Filtering: Removes low‑quality images (dark, overexposed, blurry) using classification models and basic image algorithms.
Classification: Tailors models to product categories (e.g., cosmetics, 3C, toys) and scene types (human, animal) to improve segmentation.
Detection: Crops redundant elements such as logos or text before segmentation for higher accuracy.
Segmentation: Produces a coarse mask first, then refines it into a fine mask, balancing hair‑level precision with speed.
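The filtering step can be illustrated with a minimal sketch. The article says only that low‑quality images are removed with classification models and basic image algorithms, so everything below is an assumption: the function names, the mean‑brightness and Laplacian‑variance heuristics, and the threshold values are hypothetical and would need tuning on real data.

```python
import numpy as np


def laplacian_variance(gray):
    """Variance of a 4-neighbour Laplacian; low values suggest a blurry image."""
    g = np.asarray(gray, dtype=np.float64)
    lap = (g[:-2, 1:-1] + g[2:, 1:-1] + g[1:-1, :-2] + g[1:-1, 2:]
           - 4.0 * g[1:-1, 1:-1])
    return float(lap.var())


def quality_issue(gray, dark_thr=40.0, bright_thr=215.0, blur_thr=50.0):
    """Return 'dark', 'overexposed', 'blurry', or None for a grayscale image in [0, 255].

    All thresholds are illustrative assumptions, not values from the article.
    """
    mean = float(np.mean(gray))
    if mean < dark_thr:
        return "dark"
    if mean > bright_thr:
        return "overexposed"
    if laplacian_variance(gray) < blur_thr:
        return "blurry"
    return None


# Usage: a sharp checkerboard passes, a uniformly dark frame is rejected.
checker = (np.indices((32, 32)).sum(axis=0) % 2) * 255
print(quality_issue(checker))
print(quality_issue(np.full((32, 32), 10)))
```

Images that pass such a check would then go on to classification, detection, and segmentation. A production filter would more likely combine heuristics like these with a learned quality classifier, as the article suggests.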
Precision improvements
Segmentation remains the weakest link. The team strengthened it with a dual‑decoder network that predicts foreground and background probabilities, together with a regression‑based loss for semi‑transparent regions.
The transparency (alpha) of each pixel is computed from two maps, the foreground probability and the background probability. The formula and the two illustrative images from the original are not reproduced here.
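One form consistent with the description in this article, in which a per‑pixel mixing weight blends the two probability maps, is shown below. The notation is an assumption rather than the authors' own: \alpha_p is the alpha of pixel p, \bar{F}_p and \bar{B}_p are the predicted foreground and background probabilities, and \beta_p is the mixing weight.

```latex
\alpha_p = \beta_p \, \bar{F}_p + (1 - \beta_p)\,(1 - \bar{B}_p),
\qquad 0 \le \beta_p \le 1
```

Under this form, \alpha_p always lies between \bar{F}_p and 1 - \bar{B}_p, and the two coincide whenever \bar{F}_p + \bar{B}_p = 1, which is the case for pixels that both decoders agree are fully opaque or fully transparent.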
Network design
The segmentation network uses an encoder‑decoder backbone with two decoders that output foreground and background probabilities. For fully opaque or transparent pixels, the network predicts the exact alpha value; for semi‑transparent pixels it predicts upper and lower bounds, guided by a weighted cross‑entropy loss.
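The article does not give the weights used in the cross‑entropy loss, so the following is only a sketch of the general mechanism, with hypothetical names and weights. It shows that weighting one class more heavily pushes the loss‑minimising prediction above the true foreground fraction, which is one plausible way for a decoder to end up predicting a bound instead of an exact value.

```python
import numpy as np


def weighted_bce(prob, target, w_pos=1.0, w_neg=1.0, eps=1e-7):
    """Mean binary cross-entropy with separate weights for the two classes."""
    p = np.clip(np.asarray(prob, dtype=np.float64), eps, 1.0 - eps)
    t = np.asarray(target, dtype=np.float64)
    loss = -(w_pos * t * np.log(p) + w_neg * (1.0 - t) * np.log(1.0 - p))
    return float(loss.mean())


def best_constant_prediction(target, w_pos, w_neg):
    """Grid-search the single probability that minimises weighted_bce on target."""
    t = np.asarray(target, dtype=np.float64)
    grid = np.linspace(0.01, 0.99, 99)
    losses = [weighted_bce(np.full(t.shape, g), t, w_pos, w_neg) for g in grid]
    return float(grid[int(np.argmin(losses))])


# Ambiguous pixels: half are labelled foreground, half background.
labels = np.array([1.0, 0.0] * 50)
balanced = best_constant_prediction(labels, w_pos=1.0, w_neg=1.0)
fg_heavy = best_constant_prediction(labels, w_pos=2.0, w_neg=1.0)
print(balanced, fg_heavy)
```

With equal weights the best constant prediction sits at the true fraction of foreground labels. Doubling the foreground weight moves it to w_pos * q / (w_pos * q + w_neg * (1 - q)), which is 2/3 for q = 0.5, so the prediction errs on the high side for ambiguous pixels.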
The fusion network, a stack of several consecutive convolutional layers, predicts a per‑pixel mixing weight for combining the two decoder outputs. Its training concentrates on semi‑transparent regions, because that is where its gradients are non‑zero.
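A small numerical sketch can make the remark about gradients concrete. It assumes the blend alpha = beta * F + (1 - beta) * (1 - B), where F and B are the foreground and background probabilities and beta is the mixing weight. This form and the function names are assumptions for illustration, not taken from the article.

```python
import numpy as np


def fuse_alpha(fg_prob, bg_prob, beta):
    """Blend the two decoder outputs into alpha using a per-pixel weight beta."""
    fg = np.asarray(fg_prob, dtype=np.float64)
    bg = np.asarray(bg_prob, dtype=np.float64)
    b = np.asarray(beta, dtype=np.float64)
    return b * fg + (1.0 - b) * (1.0 - bg)


def beta_gradient(fg_prob, bg_prob):
    """d(alpha)/d(beta) = F + B - 1, which is zero where F and 1 - B agree."""
    fg = np.asarray(fg_prob, dtype=np.float64)
    bg = np.asarray(bg_prob, dtype=np.float64)
    return fg + bg - 1.0


# Three pixels: fully opaque, fully transparent, and semi-transparent.
fg = np.array([1.0, 0.0, 0.9])
bg = np.array([0.0, 1.0, 0.4])
beta = np.array([0.3, 0.3, 0.3])

print(fuse_alpha(fg, bg, beta))
print(beta_gradient(fg, bg))
```

Under this assumed blend, the first two pixels give the same alpha for any beta, so they contribute no gradient to the fusion network. Only the third pixel, where the two decoders disagree, carries a training signal for the mixing weight.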
Productization
The technology powers multiple Alibaba products, including batch white‑background generation, portrait and animal cutout, and upcoming cartoon, fashion, and panoramic cutout features.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
