Tag

Speech Recognition


System Architect Go
Nov 28, 2024 · Artificial Intelligence

An Overview of Modern AI Audio Technologies: ASR, TTS, and Voice Cloning

This article explains how modern AI advances have transformed audio processing. It covers digital audio fundamentals, automatic speech recognition (ASR), text-to-speech (TTS), and voice cloning techniques, and provides practical Python code examples using OpenAI Whisper and HuggingFace TTS models.

AI · Audio Processing · Speech Recognition
0 likes · 7 min read
DataFunTalk
Sep 23, 2023 · Artificial Intelligence

Paraformer: An Industrial Non‑Autoregressive End‑to‑End Speech Recognition Model and Its Deployment on ModelScope

This article introduces the Paraformer non‑autoregressive end‑to‑end speech recognition model released by Alibaba DAMO Academy, details its architecture, training strategies, large‑scale performance, and provides step‑by‑step guidance for using and fine‑tuning the model on the ModelScope platform with the FunASR toolkit.

ASR · ModelScope · Paraformer
0 likes · 13 min read
Test Development Learning Exchange
Jul 27, 2023 · Artificial Intelligence

Splitting PDF Files and Recognizing MP3 Audio with Python

This guide explains how to split a PDF into separate files using PyPDF2 and presents two Python approaches for converting MP3 audio to text: one using Google Speech Recognition for higher accuracy and another using PocketSphinx for complete transcription. Both come with ready-to-run code examples.

Automation · PDF · PyPDF2
0 likes · 5 min read
58 Tech
Jul 6, 2023 · Artificial Intelligence

Design and Optimization of a Kaldi‑Based Speech Recognition Backend at 58.com

This article details the evolution from the initial Kaldi‑based speech recognition architecture (version 1.0) to a re‑engineered version 2.0, describing business background, service components, identified shortcomings, and a series of performance, concurrency, GPU, I/O, GC, and dispatch optimizations that dramatically improve resource utilization, latency, and reliability for large‑scale voice processing at 58.com.

AI · GPU · Kaldi
0 likes · 15 min read
58 Tech
Jun 21, 2023 · Artificial Intelligence

GPU Hotword Enhancement for WeNet End-to-End Speech Recognition

This article explains the design, implementation, and experimental evaluation of hot-word augmentation in WeNet's GPU runtime, detailing how character- and word-based language model scoring is extended to boost recognition of rare proper nouns in both streaming and non-streaming ASR services.

ASR · CTC decoder · GPU
0 likes · 12 min read
php中文网 Courses
Jun 17, 2023 · Mobile Development

Implementing Voice Functionality in WeChat Mini Programs

This guide explains how to integrate WeChat Mini Program voice capabilities by importing the recorder and audio APIs, recording audio, uploading for speech recognition, and playing back the result, with example code snippets for each step.

JavaScript · Speech Recognition · Voice API
0 likes · 3 min read
DataFunSummit
Jun 15, 2023 · Artificial Intelligence

Paraformer: An Industrial Non‑Autoregressive End‑to‑End Speech Recognition Model

This article introduces the Paraformer model released by Alibaba DAMO Academy on ModelScope, detailing its non‑autoregressive architecture, training strategies, performance on benchmark datasets, and step‑by‑step guidance for fine‑tuning and deploying the model using FunASR and ModelScope pipelines.

ASR · ModelScope · Paraformer
0 likes · 13 min read
Bilibili Tech
Feb 28, 2023 · Artificial Intelligence

High‑Quality Automatic Speech Recognition (ASR) Solutions at Bilibili: Data, Model, and Deployment Optimizations

Bilibili’s high‑quality ASR system combines large‑scale filtered business data, semi‑supervised Noisy‑Student training, an end‑to‑end CTC model with lattice‑free MMI decoding, and FP16‑optimized FasterTransformer inference on Triton, delivering top‑ranked accuracy, low latency, and scalable deployment for diverse Chinese‑English video content.

ASR · Bilibili · Speech Recognition
0 likes · 18 min read
DataFunTalk
Feb 5, 2023 · Artificial Intelligence

A Six‑Year Retrospective on Deep Learning Algorithms and Their Applications

This article reviews the author’s six‑year hands‑on experience with deep learning, covering breakthroughs in speech recognition, computer vision, language modeling, reinforcement learning, privacy protection, model compression, recommendation systems, and future research directions, while summarizing technical lessons and practical insights.

AI · Computer Vision · Recommendation systems
0 likes · 30 min read
DataFunSummit
Jan 14, 2023 · Artificial Intelligence

Key Transformer Model Papers Across Language, Vision, Speech, and Time‑Series Domains

This article surveys the most influential Transformer-based research papers, from the original "Attention Is All You Need" to recent models such as Autoformer and FEDformer. It covers breakthroughs in natural language processing, computer vision, speech recognition, and long-term time-series forecasting, and provides download links for each paper.

AI · Speech Recognition · Transformer
0 likes · 17 min read
58 Tech
Jan 12, 2023 · Artificial Intelligence

Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results

This article presents a comprehensive overview of the Efficient Conformer model for large‑scale end‑to‑end speech recognition, detailing its architectural improvements such as progressive downsampling and grouped multi‑head self‑attention, the PyTorch implementation in WeNet, streaming inference handling, experimental CER gains on AISHELL‑1 and production data, and future development plans.

ASR · Efficient Conformer · PyTorch
0 likes · 16 min read
DataFunTalk
Dec 7, 2022 · Artificial Intelligence

Vivo's Self‑Developed Streaming Speech‑Recognition Inference Engine and KunlunChip High‑Performance Inference Library

The article details vivo's development of a high-accuracy, high-performance streaming speech-recognition inference engine built on the WeNet framework, its optimization techniques such as dynamic batching and memory pooling, collaborative acceleration with KunlunChip's high-performance inference library, and extensive performance benchmarks demonstrating multi-batch GPU and XPU gains.

AI inference · Kunlun chip · Speech Recognition
0 likes · 10 min read
58 Tech
Sep 29, 2022 · Artificial Intelligence

End-to-End Speech Recognition Optimization and Deployment at 58.com

58.com’s AI Lab presents a comprehensive overview of its end‑to‑end speech recognition system, detailing data collection, semi‑supervised training, Efficient Conformer architecture, model compression, and deployment strategies that together achieve high accuracy across diverse acoustic conditions and large‑scale production workloads.

AI · Deployment · Efficient Conformer
0 likes · 19 min read
DataFunSummit
Sep 5, 2022 · Artificial Intelligence

Comprehensive Evaluation of Long‑Audio Speech‑to‑Text Services from Major Cloud Providers

This article presents a systematic, multi‑dimensional benchmark of six leading cloud speech‑recognition platforms—Alibaba Cloud, Tencent Cloud, iFlytek, Baidu Cloud, Huawei Cloud, and Microsoft Azure—using a 22.6‑hour, 81‑file Mandarin dataset, scoring with the CORR metric and SCTK tool, and discusses each provider's workflow, strengths, pitfalls, and cost.

AI · Cloud Services · SCTK
0 likes · 15 min read
Python Programming Learning Circle
Apr 22, 2022 · Artificial Intelligence

Building a Python Voice Chatbot with Baidu AI Speech Recognition and Qingyunke

This tutorial explains how to create a Python voice chatbot by recording audio, converting speech to text with Baidu AI, sending the text to the Qingyunke chatbot API for a response, and finally synthesizing the reply back into speech using pyttsx3.

Chatbot · Speech Recognition · baidu-ai
0 likes · 8 min read
NetEase LeiHuo Testing Center
Apr 15, 2022 · Artificial Intelligence

Practical AI‑Powered Voice Recognition for Game Dialogue Testing: A Step‑by‑Step Case Study

This article presents a detailed case study of using AI speech‑recognition techniques—including acoustic modeling with VGG, pypinyin conversion, feature extraction, and CTC decoding—to automatically verify game dialogue audio against script text, outlining the workflow, challenges, implementation details, and experimental results.

AI · CTC decoding · Python
0 likes · 10 min read
DataFunSummit
Apr 1, 2022 · Artificial Intelligence

Detecting Invalid Queries in Voice Interaction: Non‑Human Interaction and Ambiguous Intent Recognition

This talk presents a comprehensive study of invalid query detection in voice assistants, covering the definition of effective and ineffective queries, challenges of non‑human interaction and ambiguous intent recognition, data collection, model design, experimental results, user‑feedback loops, and future research directions.

Natural Language Understanding · Speech Recognition · invalid query detection
0 likes · 20 min read
DataFunTalk
Mar 20, 2022 · Artificial Intelligence

Detecting Invalid Queries in Voice Interaction: Non‑Human Interaction and Ambiguous Intent Recognition

This talk presents a comprehensive study of invalid query detection in voice assistants, covering the definition and taxonomy of invalid queries, challenges of non‑human interaction and ambiguous intent recognition, data collection and labeling strategies, feature engineering, deep neural network modeling, experimental results, user‑feedback loops, and current performance limits.

AI · Speech Recognition · dialogue system
0 likes · 17 min read
Baidu Geek Talk
Feb 14, 2022 · Artificial Intelligence

AI Sign Language Digital Human: Technology, Challenges, and Development by Baidu Intelligent Cloud

Baidu’s AI‑driven sign‑language digital human combines ultra‑accurate speech recognition, specialized translation, and precise gesture‑generation models—backed by extensive motion‑capture data and expert validation—to deliver 24‑hour, high‑fidelity signing for millions of hearing‑impaired users, showcasing inclusive AI communication.

AI · Accessibility · Digital Human
0 likes · 12 min read