Tagged articles
5 articles
Page 1 of 1
Weekly Large Model Application
Weekly Large Model Application
May 5, 2026 · Artificial Intelligence

What Do End‑to‑End Speech Large Models Actually Learn? A Four‑Step Diagram

The article distinguishes two meanings of “end‑to‑end,” then outlines four sequential stages—defining data and scenario, massive pre‑training on audio‑text pairs, task alignment via instruction or supervised fine‑tuning, and optional preference tuning—to guide engineers in building usable speech assistants.

Speech AIaudio dataend-to-end models
0 likes · 6 min read
What Do End‑to‑End Speech Large Models Actually Learn? A Four‑Step Diagram
Zuoyebang Tech Team
Zuoyebang Tech Team
Jun 10, 2022 · Artificial Intelligence

How End-to-End Phoneme Recognition Boosts English Pronunciation Detection

This article examines the challenges of English pronunciation teaching in China and presents a practical end-to-end phoneme‑level mispronunciation detection system that leverages CTC models, attention‑based text fusion, and data augmentation to dramatically reduce false alarms and improve diagnostic accuracy.

AI educationend-to-end modelslanguage learning
0 likes · 9 min read
How End-to-End Phoneme Recognition Boosts English Pronunciation Detection
DataFunTalk
DataFunTalk
Dec 14, 2021 · Artificial Intelligence

Speech Translation: Enterprise Applications and Research

This article presents an overview of speech translation, discusses its motivations and applications at ByteDance, compares cascade and end‑to‑end modeling approaches, introduces advanced encoder and decoder designs such as LUT, Chimera, and COSTT, outlines progressive multi‑task training and data‑augmentation strategies, and shares experimental results and Q&A.

AIAudio Processingend-to-end models
0 likes · 16 min read
Speech Translation: Enterprise Applications and Research
Beike Product & Technology
Beike Product & Technology
Jul 1, 2021 · Artificial Intelligence

Semantic Data Augmentation and GigaSpeech: Highlights of Two INTERSPEECH 2021 Papers from the Beike Voice Team

The article summarizes two INTERSPEECH 2021 papers from Beike's voice technology team, detailing a grammar‑based semantic data augmentation method that improves end‑to‑end Chinese speech recognition and introducing GigaSpeech, a massive 10,000‑hour multilingual English speech dataset for robust ASR research.

ChineseGigaSpeechInterspeech
0 likes · 7 min read
Semantic Data Augmentation and GigaSpeech: Highlights of Two INTERSPEECH 2021 Papers from the Beike Voice Team