Weekly Large Model Application
Mar 30, 2026 · Artificial Intelligence
Inside Kimi-Audio: A Unified Large Audio Model Covering ASR, AQA, TTS and More
Kimi-Audio, a general‑purpose audio foundation model from Moonshot AI, integrates ASR, audio QA, automatic audio captioning, emotion classification and end‑to‑end speech dialogue within a single framework. This article details its mixed‑audio input, MiMo‑Transformer core, efficient synthesis pipeline, architectural strengths, limitations, and suitable application scenarios.
ASR · Audio LLM · BigVGAN
