Tagged articles

Speech LLM

3 articles
Weekly Large Model Application
May 29, 2026 · Artificial Intelligence

From Direct Transcription to Reasoning ASR and Parallel Decoding: CoT‑ASR vs Whisfusion

ASR is shifting from direct verbatim transcription to two new paradigms: Chain‑of‑Thought reasoning (CoT‑ASR), which lowers WER and entity error rates, and diffusion‑based parallel decoding (Whisfusion), which cuts latency by more than a factor of eight. The two offer complementary routes to smarter, faster speech recognition.

ASR · Chain-of-Thought · CoT-ASR
12 min read
Weekly Large Model Application
Mar 13, 2026 · Artificial Intelligence

Speech Large Models: Why End-to-End Architecture Beats Traditional ASR‑LLM‑TTS Pipelines

The article defines true speech large models as native end‑to‑end systems that map audio directly to audio. It compares them with traditional cascaded ASR‑LLM‑TTS pipelines across architecture, error control, latency, paralinguistic perception, long‑context handling, and deployment, then surveys the leading open‑source and commercial speech LLMs released in March 2026 and closes with a quick selection guide.

AI · ASR · End-to-End
11 min read