Tag

ASR

Efficient Ops
Oct 28, 2024 · Artificial Intelligence

How AI Powers Real-Time Business Hot‑Word Monitoring in Remote Banking

ICBC's remote‑banking hotline system uses AI, speech recognition and Python keyword extraction to rank inbound business volumes and surface hot‑word trends, delivering early alerts that help prevent risks, resolve customer issues, and support data‑driven decision making across millions of daily transactions.

AI · ASR · Business Monitoring
4 min read
DataFunTalk
Jun 3, 2024 · Artificial Intelligence

Deploying Speech AI Services Quickly with NVIDIA Riva

This article explains how to use NVIDIA Riva to rapidly deploy speech AI services, covering Riva's overview, Chinese ASR model updates, TTS capabilities, customization options, the Quickstart tool, and a Q&A session that clarifies deployment, model fine‑tuning, and integration with NeMo and Triton.

ASR · GPU Acceleration · NVIDIA Riva
13 min read
DataFunTalk
Feb 13, 2024 · Artificial Intelligence

An Overview of NVIDIA NeMo: Open‑Source Framework for Speech AI, ASR, TTS, NLP and Large Language Model Training

This article introduces NVIDIA’s open‑source NeMo framework, detailing its PyTorch‑based architecture for Speech AI, ASR and TTS training, NLP and LLM support, GPU‑optimized parallelism, pre‑trained model resources, fine‑tuning techniques, and the accompanying NeMo Aligner and Framework tools.

ASR · NVIDIA NeMo · PyTorch
18 min read
DataFunTalk
Jan 26, 2024 · Artificial Intelligence

Efficient Deployment of Speech AI Models on GPUs

This article explains how to efficiently deploy speech AI models—including ASR and TTS—on GPUs using NVIDIA's Triton Inference Server and TensorRT, covering background challenges, GPU‑based solutions, decoding optimizations, Whisper acceleration with TensorRT‑LLM, streaming TTS improvements, voice‑cloning pipelines, future plans, and a Q&A session.

ASR · GPU · Inference
20 min read
Ctrip Technology
Dec 21, 2023 · Backend Development

Load Balancing ASR Services in Ctrip Call Center: Architecture and Implementation with FreeSWITCH and OpenSIPS

This article details the design, evolution, and best‑practice implementation of load‑balancing for ASR (speech‑recognition) services in Ctrip's massive call‑center, covering component architecture, MRCP integration, challenges with traditional balancers, and two practical solutions using FreeSWITCH distributor and OpenSIPS.

ASR · FreeSWITCH · Load Balancing
27 min read
Ximalaya Technology Team
Dec 19, 2023 · Cloud Computing

Text-Based Audio Editing in Cloud Editing: Architecture, Features, and Performance Optimizations

The article discusses the architecture of a cloud-based audio editing tool, focusing on text‑based editing enabled by ASR, a hierarchical DOM (Word, Sentence, Paragraph), the performance challenges posed by massive numbers of character nodes, and optimizations such as viewport‑based rendering and efficient drag‑select that achieve large speed gains for long recordings.

ASR · audio editing · cloud editing
14 min read
Bilibili Tech
Oct 13, 2023 · Artificial Intelligence

Multimodal Video High‑Energy Segment Extraction for Dynamic Video Covers

The authors present a multimodal system that automatically extracts high‑energy video segments for dynamic covers by analyzing subtitles, audio, visual frames, and danmu, employing LLM prompt‑tuning, scene‑cut detection, and aesthetic scoring to reduce manual effort and boost click‑through rates.

ASR · OCR · Video Summarization
14 min read
DataFunTalk
Sep 23, 2023 · Artificial Intelligence

Paraformer: An Industrial Non‑Autoregressive End‑to‑End Speech Recognition Model and Its Deployment on ModelScope

This article introduces the Paraformer non‑autoregressive end‑to‑end speech recognition model released by Alibaba DAMO Academy, details its architecture, training strategies, large‑scale performance, and provides step‑by‑step guidance for using and fine‑tuning the model on the ModelScope platform with the FunASR toolkit.

ASR · ModelScope · Paraformer
13 min read
DataFunTalk
Sep 19, 2023 · Artificial Intelligence

Simultaneous Speech Translation: Technical Background, System Architecture, and Key Challenges

This article reviews the technical background of simultaneous speech translation, compares offline and real‑time scenarios, details ASR and MT technologies, describes the system architecture and design strategies, and discusses the major challenges and solutions for deploying robust, low‑latency translation services.

ASR · Huawei · deep learning
16 min read
58 Tech
Jun 21, 2023 · Artificial Intelligence

GPU Hotword Enhancement for WeNet End-to-End Speech Recognition

This article explains the design, implementation, and experimental evaluation of hot‑word augmentation in WeNet's GPU runtime, detailing how character‑ and word‑based language model scoring is extended to boost recognition of rare proper nouns in both streaming and non‑streaming ASR services.

ASR · CTC decoder · GPU
12 min read
DataFunSummit
Jun 15, 2023 · Artificial Intelligence

Paraformer: An Industrial Non‑Autoregressive End‑to‑End Speech Recognition Model

This article introduces the Paraformer model released by Alibaba DAMO Academy on ModelScope, detailing its non‑autoregressive architecture, training strategies, performance on benchmark datasets, and step‑by‑step guidance for fine‑tuning and deploying the model using FunASR and ModelScope pipelines.

ASR · ModelScope · Paraformer
13 min read
DataFunSummit
Apr 18, 2023 · Artificial Intelligence

Best Practices for Deploying Speech AI on GPUs with Triton and TensorRT

This article presents comprehensive best‑practice guidelines for deploying conversational speech AI—including ASR and TTS pipelines—on GPU servers using NVIDIA Triton Inference Server and TensorRT, covering workflow overview, performance optimizations, streaming inference, and real‑world deployment tips.

ASR · GPU Deployment · Speech AI
14 min read
Bilibili Tech
Feb 28, 2023 · Artificial Intelligence

High‑Quality Automatic Speech Recognition (ASR) Solutions at Bilibili: Data, Model, and Deployment Optimizations

Bilibili’s high‑quality ASR system combines large‑scale filtered business data, semi‑supervised Noisy‑Student training, an end‑to‑end CTC model with lattice‑free MMI decoding, and FP16‑optimized FasterTransformer inference on Triton, delivering top‑ranked accuracy, low latency, and scalable deployment for diverse Chinese‑English video content.

ASR · Bilibili · Speech Recognition
18 min read
58 Tech
Jan 12, 2023 · Artificial Intelligence

Efficient Conformer for End‑to‑End Speech Recognition: Model, Implementation, Streaming Inference, and Experimental Results

This article presents a comprehensive overview of the Efficient Conformer model for large‑scale end‑to‑end speech recognition, detailing its architectural improvements such as progressive downsampling and grouped multi‑head self‑attention, the PyTorch implementation in WeNet, streaming inference handling, experimental CER gains on AISHELL‑1 and production data, and future development plans.

ASR · Efficient Conformer · PyTorch
16 min read
DataFunTalk
Jul 30, 2022 · Artificial Intelligence

Technical Analysis of Huawei’s Offline Speech‑to‑Text and Length‑Constrained Speech Translation Systems in IWSLT 2022

This article reviews the IWSLT 2022 competition tasks, explains Huawei’s cascade offline speech‑to‑text translation pipeline, details four major technical innovations (ensemble‑based ASR denoising, context‑aware re‑ranking, domain‑controlled training, and length‑control strategies), and presents experimental results that demonstrate Huawei’s leading performance across multiple language directions.

ASR · Huawei · IWSLT
18 min read
DataFunTalk
Jul 7, 2022 · Artificial Intelligence

Huawei Translation’s Achievements and Technical Solutions in IWSLT 2022 Speech Translation Tasks

This article reviews Huawei Translation’s top-ranking results in the IWSLT 2022 speech translation competition across speech‑to‑speech, offline speech‑to‑text, and length‑controlled translation tasks, and details their cascade and end‑to‑end technical approaches, including domain‑controlled ASR, context‑aware MT re‑ranking, and VITS‑based TTS.

ASR · Huawei · IWSLT
13 min read
DataFunSummit
Dec 3, 2021 · Artificial Intelligence

Real‑Time Voice Dialogue: Practices, Challenges, and Duplex Conversation

This article presents an in‑depth overview of Alibaba's real‑time voice dialogue system, covering the Hotline XiaoMi robot, the unique challenges of spoken interactions such as colloquialism, multimodality and duplex communication, and the research advances in ASR‑robust SLU, emotion detection, colloquial processing, and duplex conversation modeling.

ASR · SLU · Speech AI
22 min read
Sohu Tech Products
May 12, 2021 · Artificial Intelligence

Zero‑Basis Food Sound Recognition with ASR: Theory, Workflow, and Complete Python Code

This article introduces the fundamentals of automatic speech recognition (ASR) for food‑sound classification, explains key audio representations and modeling approaches, and provides a fully runnable Python implementation using librosa, TensorFlow/Keras, and classic machine‑learning tools to train and predict on the Tianchi competition dataset.

ASR · CNN · Python
11 min read