Interview with Alibaba Senior Algorithm Expert Ren Haibing on Intelligent Video Matting Technology
Ren Haibing, a senior algorithm expert at Alibaba, explains how deep‑learning AI matting is replacing traditional methods. The approach combines salient object detection, semantic segmentation and instance segmentation in a two‑stage, hard‑then‑soft pipeline that achieves hair‑level detail and temporal consistency. His team's solution scores 84.3% on Cityscapes, powers large‑scale extraction of people from video, and is paired with human review in production to meet growing industry demand.
Alibaba senior algorithm expert Ren Haibing discusses the current state of image and video matting in film and TV production, pointing out the prevalence of low‑quality cut‑out effects that damage viewer experience.
He notes that traditional matting algorithms such as KNN matting, closed‑form matting and Bayesian matting are being replaced by deep‑learning‑based AI matting, which provides higher precision and can be accelerated on GPUs.
Ren explains that modern AI matting pipelines often combine multiple techniques, including salient object detection, semantic segmentation and instance segmentation, to achieve better results.
For video matting, the main challenge is temporal consistency; frame‑by‑frame results may look good individually but exhibit jitter when played together. Alibaba’s solution adds video‑level object segmentation to ensure smooth, stable edges across frames.
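The jitter problem can be made concrete with a small sketch. The article does not describe how Alibaba's video‑level segmentation works, so the code below swaps in a much simpler stand‑in: it measures frame‑to‑frame change in the alpha matte and damps it with an exponential moving average. The function names `temporal_jitter` and `ema_smooth`, the `momentum` value, and the synthetic data are all assumptions made for illustration.

```python
import numpy as np

def temporal_jitter(alphas):
    """Mean absolute change in alpha between consecutive frames."""
    alphas = np.asarray(alphas, dtype=np.float64)
    return float(np.mean(np.abs(np.diff(alphas, axis=0))))

def ema_smooth(alphas, momentum=0.6):
    """Blend each frame's matte with the smoothed previous frame.

    This is a generic smoothing baseline, not the method from the interview.
    """
    alphas = np.asarray(alphas, dtype=np.float64)
    smoothed = np.empty_like(alphas)
    smoothed[0] = alphas[0]
    for t in range(1, len(alphas)):
        smoothed[t] = momentum * smoothed[t - 1] + (1.0 - momentum) * alphas[t]
    return smoothed

# Synthetic clip: a static subject whose per-frame matte flickers with noise.
rng = np.random.default_rng(0)
true_alpha = np.zeros((8, 8))
true_alpha[2:6, 2:6] = 1.0
frames = np.clip(true_alpha + rng.normal(0.0, 0.1, size=(20, 8, 8)), 0.0, 1.0)
smoothed = ema_smooth(frames)

print("jitter before:", temporal_jitter(frames))
print("jitter after: ", temporal_jitter(smoothed))
```

A plain moving average like this lags behind fast‑moving subjects, which is one reason a production system would want segmentation that reasons about the video as a whole rather than post‑hoc smoothing.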
The company adopts a two‑stage approach: a hard‑segmentation stage followed by a soft‑segmentation stage that refines edges to achieve “hair‑level” detail. High‑level semantic features ensure overall object integrity, while low‑level features preserve fine details.
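One common way to connect a hard stage to a soft stage is a trimap: pixels the hard mask is sure about are fixed as foreground or background, and only a narrow band around the boundary is left for the soft stage to assign fractional alpha. The sketch below illustrates that idea with NumPy only. It is an assumption about how such a hand‑off could look, not the team's actual pipeline, and the helper names (`dilate`, `erode`, `make_trimap`, `composite`) and the `band` width are hypothetical. The final function applies the standard compositing equation I = αF + (1 − α)B.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation with a 3x3 square, built from shifted copies."""
    out = np.asarray(mask, dtype=bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out

def erode(mask, iterations=1):
    """Binary erosion, computed as the complement of dilating the background.

    Pixels outside the image are treated as foreground, so a mask touching
    the border is not eroded along that border.
    """
    return ~dilate(~np.asarray(mask, dtype=bool), iterations)

def make_trimap(hard_mask, band=1):
    """0 = sure background, 128 = unknown band, 255 = sure foreground."""
    sure_fg = erode(hard_mask, band)
    maybe_fg = dilate(hard_mask, band)
    trimap = np.zeros(np.shape(hard_mask), dtype=np.uint8)
    trimap[maybe_fg] = 128
    trimap[sure_fg] = 255
    return trimap

def composite(alpha, foreground, background):
    """Standard compositing: I = alpha * F + (1 - alpha) * B."""
    a = np.asarray(alpha, dtype=np.float64)[..., None]
    return a * foreground + (1.0 - a) * background

# Toy hard mask: a 4x4 square inside a 10x10 frame.
hard = np.zeros((10, 10), dtype=bool)
hard[3:7, 3:7] = True
trimap = make_trimap(hard, band=1)
print(trimap)
```

Only the pixels marked 128 would need a matting model to estimate fractional alpha, which keeps the expensive soft stage focused on edges such as hair.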
Ren’s team leverages state‑of‑the‑art segmentation networks such as HRNet and DeepLab V3+. Their solution achieved 84.3% accuracy on the Cityscapes test set, demonstrating strong performance on complex scenes.
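The article gives the score without naming the metric. Semantic segmentation benchmarks such as Cityscapes are usually scored by mean intersection‑over‑union (mIoU), so the sketch below assumes that is what the figure refers to. It shows how mIoU is computed from a confusion matrix. The function names, the tiny label arrays, and the use of 255 as an ignore label are illustrative assumptions, not details from the interview.

```python
import numpy as np

def confusion_matrix(pred, target, num_classes, ignore_index=255):
    """Rows are ground-truth classes, columns are predicted classes."""
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    keep = target != ignore_index  # skip pixels with no valid label
    idx = target[keep].astype(np.int64) * num_classes + pred[keep].astype(np.int64)
    counts = np.bincount(idx, minlength=num_classes ** 2)
    return counts.reshape(num_classes, num_classes)

def mean_iou(conf):
    """Average of per-class IoU = TP / (TP + FP + FN), over classes that occur."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    valid = union > 0
    return float(np.mean(tp[valid] / union[valid]))

# Tiny example with three classes and one ignored pixel.
target = np.array([[0, 0, 1], [1, 2, 255]])
pred = np.array([[0, 1, 1], [1, 2, 2]])
conf = confusion_matrix(pred, target, num_classes=3)
miou = mean_iou(conf)
print(conf)
print(miou)
```

In this toy case the per‑class IoUs are 1/2, 2/3 and 1, so the mean is 13/18.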
Alibaba Entertainment has built what Ren describes as the largest video‑person segmentation dataset in the industry, and it uses the technology for customized media scenarios, such as extracting characters from variety shows for secondary content creation.
In production, AI matting is combined with human review: fully automatic results are passed to manual correction when needed, enabling large‑scale efficiency gains.
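The hand‑off between automatic matting and manual correction can be pictured as a simple routing rule. The article does not say how results are selected for review, so the sketch below is purely hypothetical: it flags a matte for a human when too many pixels have an ambiguous alpha value. The function names and all thresholds (`low`, `high`, `max_uncertain`) are assumptions chosen for illustration.

```python
import numpy as np

def uncertain_fraction(alpha, low=0.1, high=0.9):
    """Share of pixels whose alpha is neither clearly background nor foreground."""
    alpha = np.asarray(alpha, dtype=np.float64)
    return float(np.mean((alpha > low) & (alpha < high)))

def route(alpha, max_uncertain=0.2):
    """Return 'manual_review' for murky mattes, 'auto_accept' otherwise."""
    if uncertain_fraction(alpha) > max_uncertain:
        return "manual_review"
    return "auto_accept"

# A crisp matte: hard 0/1 values only.
crisp = np.zeros((8, 8))
crisp[2:6, 2:6] = 1.0

# A murky matte: the model could not decide anywhere.
murky = np.full((8, 8), 0.5)

print(route(crisp))
print(route(murky))
```

A real matte with fine hair legitimately contains fractional alpha, so any such threshold would need tuning against reviewed examples rather than being fixed in advance.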
Ren also remarks that although academic research on video matting has seen few breakthroughs recently, growing industrial demand is expected to drive further advances in video matting algorithms.
Youku Technology