Volcano Engine Developer Services
Sep 11, 2024 · Artificial Intelligence
How Large Language Models Are Transforming Computer Vision: From Image Understanding to Video Generation
This article reviews recent advances in applying large language models to computer vision, covering background challenges, unified multimodal modeling, the PixelLM architecture for pixel‑level understanding and generation, and new approaches to image and video creation such as StoryDiffusion, while outlining future research directions.
Computer Vision · PixelLM · StoryDiffusion
