Tag

visual-language models

0 views collected around this technical thread.

Ximalaya Technology Team
Ximalaya Technology Team
Oct 10, 2023 · Artificial Intelligence

MiniGPT-5: A Novel Multimodal Generation Model for Coherent Text-Image Synthesis

MiniGPT-5 is a novel multimodal generation model using generative vokens to interleave text and image synthesis, integrating Stable Diffusion and LLMs with a two-stage training that requires no domain-specific annotations, achieving state‑of‑the‑art coherence and quality on benchmarks like CC3M, VIST, and MMDialog.

AI researchMultimodal GenerationStable Diffusion
0 likes · 9 min read
MiniGPT-5: A Novel Multimodal Generation Model for Coherent Text-Image Synthesis