Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling within a VAE space to pre‑train ViT encoders, enabling seamless weight transfer to both image classification, segmentation, and diffusion‑based generation tasks, dramatically speeding up DiT and SiT models while preserving strong visual representations.

PretrainingVAEViT³

0 likes · 13 min read

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

Volcano Engine Developer Services

Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision, covering background challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, while outlining future research directions.

PixelLMStoryDiffusioncomputer vision

0 likes · 22 min read

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation