SizeCube: AI‑Driven Arbitrary‑Size Image and Video Outpainting for Advertising
SizeCube uses Stable Diffusion-based models in a multi-stage pipeline (quality filtering, feature mining, latent-space UNet denoising, super-resolution, and temporal 3D U-Net video processing) to automatically outpaint images and videos to any size, giving Alibaba advertisers more creative flexibility, higher click-through rates, and assets that adapt across diverse ad placements.
In the era of digital marketing, adapting visual assets to diverse display formats is challenging. Recent advances in generative AI, especially Stable Diffusion, enable high‑quality outpainting that expands images or videos beyond their original borders.
The SizeCube system addresses the limitations of traditional cropping and template‑based resizing by using diffusion models to extend creative assets to any target size while preserving original content.
Technical pipeline: The workflow includes material quality filtering, feature mining (Canny edges, scene segmentation, subject tags), latent-space UNet denoising for image/video expansion, super-resolution enhancement, and post-processing (edge blending, original-material re-integration).
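The post-processing step (edge blending plus re-integration of the original material) can be illustrated with a small sketch. This is not SizeCube's implementation: the function names `feather_alpha` and `reintegrate`, the linear ramp, and the `feather` width are all assumptions chosen for illustration. The idea is to paste the untouched original back over the generated canvas while fading its border so the seam is not visible.

```python
def feather_alpha(width, height, box, feather):
    """Weights for the ORIGINAL pixels on a height x width canvas.

    box = (left, top, right, bottom), right/bottom exclusive, marks where the
    original sits on the canvas. Weights are 0 outside the box and ramp
    linearly to 1 over `feather` pixels, but only on sides that actually
    border generated content. (Hypothetical helper, not from the article.)
    """
    left, top, right, bottom = box
    alpha = [[0.0] * width for _ in range(height)]
    for y in range(top, bottom):
        for x in range(left, right):
            dists = []
            if left > 0:
                dists.append(x - left)
            if right < width:
                dists.append(right - 1 - x)
            if top > 0:
                dists.append(y - top)
            if bottom < height:
                dists.append(bottom - 1 - y)
            if feather <= 0 or not dists:
                alpha[y][x] = 1.0
            else:
                # The border pixel itself counts as depth 1.
                alpha[y][x] = min(1.0, (min(dists) + 1) / feather)
    return alpha


def reintegrate(generated, original, box, feather=4):
    """Blend `original` back onto `generated` (lists of rows of floats)."""
    height, width = len(generated), len(generated[0])
    left, top, _, _ = box
    alpha = feather_alpha(width, height, box, feather)
    out = [row[:] for row in generated]
    for y in range(height):
        for x in range(width):
            a = alpha[y][x]
            if a > 0.0:
                src = original[y - top][x - left]
                out[y][x] = a * src + (1.0 - a) * generated[y][x]
    return out
```

For example, `reintegrate([[0.0] * 8], [[1.0] * 4], (2, 0, 6, 1), feather=2)` keeps the two centre pixels of the original at full strength and mixes the two border pixels half-and-half with the generated content. A production system would work on multi-channel image arrays and might use a smoother ramp.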
Data engineering: A curated dataset of e-commerce images and videos is annotated with clarity, OCR results, motion intensity, face/body detections, aesthetic scores, and similar attributes. Masked regions are then generated to train the outpainting models.
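One plausible way to generate such training masks is sketched below. The article does not describe the actual sampling scheme, so the rectangle-based approach, the function name `random_outpaint_mask`, and the `min_keep`/`max_keep` ratios are assumptions. The sketch keeps a random rectangle of each sample visible and marks everything around it as the region the model must learn to generate.

```python
import random


def random_outpaint_mask(width, height, rng, min_keep=0.3, max_keep=0.9):
    """Sample a training mask for outpainting.

    Returns (mask, box): mask[y][x] is 1 where the model must generate
    content and 0 inside the kept rectangle `box` = (left, top, right,
    bottom), right/bottom exclusive. The keep ratios are illustrative only.
    """
    keep_w = max(1, int(width * rng.uniform(min_keep, max_keep)))
    keep_h = max(1, int(height * rng.uniform(min_keep, max_keep)))
    # Random placement lets the model see extensions on any side.
    left = rng.randint(0, width - keep_w)
    top = rng.randint(0, height - keep_h)
    mask = [[1] * width for _ in range(height)]
    for y in range(top, top + keep_h):
        for x in range(left, left + keep_w):
            mask[y][x] = 0
    return mask, (left, top, left + keep_w, top + keep_h)


# Usage: a seeded generator makes the sampled masks reproducible.
rng = random.Random(0)
mask, box = random_outpaint_mask(16, 12, rng)
```

Varying both the size and the position of the kept rectangle is what would let a single model cover one-sided, two-sided, and all-around extensions.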
Image outpainting: Building on Stable Diffusion inpainting models, the team introduced a padding-image strategy that provides coherent latent features and reduces unrelated artifacts. Negative prompts further mitigate pseudo-text generated around logos and promotional copy. A human-body detector routes body-containing regions to specialized models, balancing general-scene and human-specific generation.
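The article does not spell out how the padding image is built. As a loudly labeled assumption, the sketch below fills the area to be generated by mirroring the original outward, so the model starts from content-related pixels rather than a blank border, and returns the matching inpainting mask. The names `reflect_index` and `pad_canvas`, the centred placement, and the choice of mirror padding are all hypothetical.

```python
def reflect_index(i, n):
    """Map any integer i into [0, n) by mirroring at the borders."""
    period = 2 * n
    i %= period
    return i if i < n else period - 1 - i


def pad_canvas(image, target_w, target_h):
    """Centre `image` (list of rows) on a target canvas.

    Returns (canvas, mask, box). The border of `canvas` is filled with a
    mirrored copy of the image (an assumed padding strategy); mask is 1
    where new content must be generated and 0 over the original.
    """
    src_h, src_w = len(image), len(image[0])
    if target_w < src_w or target_h < src_h:
        raise ValueError("target must be at least as large as the source")
    left = (target_w - src_w) // 2
    top = (target_h - src_h) // 2
    canvas = [
        [image[reflect_index(y - top, src_h)][reflect_index(x - left, src_w)]
         for x in range(target_w)]
        for y in range(target_h)
    ]
    mask = [
        [0 if (top <= y < top + src_h and left <= x < left + src_w) else 1
         for x in range(target_w)]
        for y in range(target_h)
    ]
    return canvas, mask, (left, top, left + src_w, top + src_h)
```

With a one-row image `[[1, 2, 3]]` and a target width of 7, the canvas becomes `[[2, 1, 1, 2, 3, 3, 2]]` and the mask `[[1, 1, 0, 0, 0, 1, 1]]`. The padded canvas and mask would then be handed to the inpainting model, which overwrites the masked border.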
Video outpainting: A 3D U-Net diffusion model (based on Stable Diffusion) with temporal convolutions and modified attention layers processes video frames. Diverse masking patterns simulate arbitrary-size extensions, and multi-condition inputs (global frame embeddings, Canny edges) improve coherence. For long videos, a coarse-to-fine inference pipeline generates keyframes first and then refines the intermediate frames, which alleviates error accumulation and reduces latency.
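The coarse-to-fine idea for long videos can be sketched as a scheduling problem: pick sparse keyframes for the first pass, then fill each gap conditioned on the keyframes that bound it. The function name `coarse_to_fine_schedule` and the fixed `stride` are assumptions; the article does not state how keyframes are chosen.

```python
def coarse_to_fine_schedule(num_frames, stride):
    """Plan a two-pass outpainting run over a video.

    Returns (keyframes, segments). `keyframes` are the frame indices
    generated in the coarse pass. Each segment is (start, end, between):
    the frames in `between` are generated in the fine pass, conditioned on
    the already finished keyframes `start` and `end`.
    """
    if num_frames <= 0 or stride <= 0:
        raise ValueError("num_frames and stride must be positive")
    keyframes = list(range(0, num_frames, stride))
    # Always anchor the last frame so every gap is bounded on both sides.
    if keyframes[-1] != num_frames - 1:
        keyframes.append(num_frames - 1)
    segments = []
    for start, end in zip(keyframes, keyframes[1:]):
        between = list(range(start + 1, end))
        if between:
            segments.append((start, end, between))
    return keyframes, segments


# Usage: 10 frames with a keyframe every 4 frames.
keyframes, segments = coarse_to_fine_schedule(10, 4)
# keyframes -> [0, 4, 8, 9]
# segments  -> [(0, 4, [1, 2, 3]), (4, 8, [5, 6, 7])]
```

Because each segment depends only on its two bounding keyframes, errors cannot drift across the whole clip, and the segments could in principle be processed in parallel.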
Business impact: SizeCube is deployed in Alibaba’s advertising platforms, allowing merchants to upload an image or video and obtain a one-click, arbitrary-size transformation. The system improves native asset adaptation, click-through rates, and creative flexibility across multiple ad placements.
Conclusion: By integrating AIGC techniques, SizeCube eliminates the trade-offs of cropping and templating, delivering high-fidelity, size-agnostic visual assets. Future work will focus on higher accuracy, efficiency, and broader application scenarios.
Alimama Tech
Official Alimama tech channel, showcasing all of Alimama's technical innovations.