Tagged articles

Conformer

4 articles

Apr 16, 2026 · Artificial Intelligence

Deep Dive into Conformer: The Convolution‑Augmented Transformer for Speech Recognition

The Conformer architecture blends global self‑attention with a depthwise separable convolution module in a Macaron‑style block, addressing the strong local time‑frequency structure and long sequence length of speech signals while keeping computational cost manageable for modern ASR systems.

ASR · Conformer · Convolution

11 min read

DataFunTalk

Sep 21, 2023 · Artificial Intelligence

2023 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC) Overview

The 2023 Chinese Continuous Visual Speech Recognition Challenge (CNVSRC), organized by Tsinghua University and partners, introduces the large-scale CN-CVS dataset, defines single-speaker and multi-speaker lip-reading tasks, and provides baseline Conformer models. The article also covers registration, data access, evaluation metrics, and the competition schedule.

.ai · Conformer · challenge

7 min read

DataFunSummit

May 4, 2023 · Artificial Intelligence

An Overview of NVIDIA NeMo for Speech AI: ASR Training, Chinese Support, and Related Applications

This article provides a comprehensive introduction to NVIDIA's NeMo toolkit for conversational AI, detailing its ASR capabilities, model architectures, training workflow, Chinese language support, deployment options, and additional speech AI features such as VAD and speaker diarization.

ASR · Chinese Speech · Conformer

15 min read

Zuoyebang Tech Team

May 19, 2022 · Artificial Intelligence

How to Achieve High‑Quality TTS with Only Minutes of Data

This article reviews neural speech synthesis, explains why large amounts of high-quality paired data are essential, and presents a range of low-resource solutions, including semi-supervised pre-training, cross-language transfer, speaker embedding, and Conformer-based model upgrades. It shows how the Zuoyebang team built a robust TTS system with as little as seven minutes of recordings per speaker.

Conformer · Fastspeech2 · Speech synthesis

15 min read
