Amap Tech
Jul 11, 2025 · Artificial Intelligence

Unified Self‑Supervised Pretraining Accelerates Image Generation and Improves Understanding

The USP framework introduces masked latent modeling in the VAE latent space to pre‑train ViT encoders. The pretrained weights transfer directly to image classification, segmentation, and diffusion‑based generation, substantially accelerating DiT and SiT training while preserving strong visual representations.

Diffusion Models · Image Generation · VAE
13 min read
Volcano Engine Developer Services
Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision. It covers the background and key challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, and it closes by outlining future research directions.

PixelLM · StoryDiffusion · computer vision
22 min read