AIWalker
May 11, 2025 · Artificial Intelligence
Unified Multimodal Understanding and Generation: A 30K‑Word Survey of Recent Advances
This comprehensive survey reviews the rapid progress of multimodal understanding and text‑to‑image generation models. It categorises existing unified architectures into diffusion‑based, autoregressive, and hybrid paradigms, analyses their tokenisation strategies, reviews the datasets and benchmarks used to train and evaluate them, and highlights current challenges and future research directions.
Datasets · Diffusion Models · Survey
