Volcano Engine Developer Services
Sep 11, 2024 · Artificial Intelligence

How Large Language Models are Transforming Computer Vision: From Image Understanding to Video Generation

This article reviews recent advances in applying large language models to computer vision. It covers the background and key challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, and it closes by outlining future research directions.

Computer Vision · PixelLM · StoryDiffusion