AIWalker
Feb 8, 2025 · Artificial Intelligence
Introducing Ola: A Full‑Modal Language Model from Tsinghua & Tencent that Unifies Image, Video, and Audio Understanding
The article presents Ola, an open‑source full‑modal LLM that uses progressive modality alignment to jointly process text, images, video, and audio. It reports competitive results on image, video, and audio benchmarks, surpassing many specialized models.
Benchmark · Ola · large language model
22 min read
