Tagged articles

speech processing

10 articles

May 5, 2026 · Artificial Intelligence

How Audio Waveforms Are Turned Into Model‑Readable Tokens

The article explains why raw audio cannot be fed directly to language models, outlines the two essential compression steps, compares three common tokenization approaches—neural codecs, self‑supervised clustering, and continuous vectors—and warns of typical pitfalls for newcomers.

Large Language Models · audio tokenization · neural codecs

0 likes · 6 min read

AI Frontier Lectures

Mar 30, 2025 · Artificial Intelligence

Do Large Language Models Mirror Human Brain Language Processing? Google’s Groundbreaking Findings

Google researchers discovered a linear relationship between brain activity recorded during natural conversation and the internal embeddings of a speech‑to‑text large language model, revealing that acoustic and lexical representations from the model can accurately predict neural responses in both language comprehension and production.

AI research · Google · Large Language Models

0 likes · 8 min read

Tencent Tech

Jan 20, 2023 · Artificial Intelligence

Why Tencent’s Yu Dong Became a 2022 ACM Fellow: Pioneering Deep Learning in Speech

Yu Dong, deputy director of Tencent AI Lab, was named a 2022 ACM Fellow for his groundbreaking work in speech processing and deep‑learning applications. He holds over 100 patents and has won multiple best‑paper awards, and his technologies are now embedded in many of Tencent’s products.

ACM Fellow · Artificial Intelligence · Deep Learning

0 likes · 5 min read

DataFunSummit

Oct 20, 2022 · Artificial Intelligence

End-to-End Speech Relation Extraction

This paper presents an end‑to‑end approach for extracting relational triples directly from speech signals, bypassing intermediate transcription, and demonstrates its effectiveness on synthesized speech versions of the CoNLL04 and TACRED datasets, highlighting challenges such as length constraints and cross‑modal alignment.

End-to-End · Multimodal · natural language processing

0 likes · 17 min read

NetEase Smart Enterprise Tech+

Feb 23, 2021 · Artificial Intelligence

How Deep Learning Detects Pornographic and ASMR Audio

This article explains a deep‑learning pipeline that preprocesses audio, extracts FBank features, applies SpecAugment, and uses a CNN‑BiLSTM‑Attention model to automatically identify pornographic and ASMR audio for content moderation.

ASMR detection · Audio Classification · SpecAugment

0 likes · 8 min read

Huawei Cloud Developer Alliance

Nov 19, 2020 · Artificial Intelligence

How to Identify TikTok Background Songs Using Huawei Cloud AI and Python

This article walks through a hands‑on experiment using Huawei Cloud RDS and Python to convert short videos into audio, extract voiceprints, and match them against a database, demonstrating how speech‑signal processing enables accurate identification of background songs in TikTok‑style clips.

AI · Cloud Computing · Python

0 likes · 5 min read

58 Tech

Nov 16, 2020 · Artificial Intelligence

Iterative Optimization of Voice Endpoint Detection for Voice Robots: From Dual‑Threshold to WebRTC VAD and VADNet

This article details the evolution of the voice endpoint detection module, built on voice activity detection (VAD), in 58.com’s voice robot. It compares a dual‑threshold method, Google’s WebRTC VAD, and the deep‑learning‑based VADNet, and presents experimental results on accuracy, recall, F1 score, and online latency.

Real‑time communication · VAD · Voice Activity Detection

0 likes · 14 min read

JD Cloud Developers

Oct 27, 2020 · Artificial Intelligence

How JD AI’s Four Interspeech 2020 Papers Advance Speech Processing

JD AI Research Institute presented four accepted Interspeech 2020 papers—covering sound event localization, speech dereverberation, speaker verification, and an efficient WaveGlow vocoder—demonstrating significant advances in audio AI despite the conference’s shift to an online format due to COVID‑19.

Audio AI · neural vocoder · sound event detection

0 likes · 8 min read

58 Tech

Jul 31, 2020 · Artificial Intelligence

Intelligent Voice Quality Inspection System: Architecture, Core Technologies, and Business Cases

This article presents 58.com’s intelligent voice quality inspection system, detailing its overall architecture, speech separation, speaker role identification, NLP‑based tagging, model choices such as VGG, BERT, ALBERT and SPTM, and real‑world call‑center use cases that improve efficiency and reduce risk.

AI · NLP · call center

0 likes · 20 min read

58 Tech

May 28, 2019 · Artificial Intelligence

Implementation of Voice Call Functionality in an Intelligent Voice Robot

This article details the architecture and implementation of the voice call module of an intelligent voice robot, covering SIP signaling establishment, RTP session handling, audio encoding/decoding, sampling, and packetization to enable automated outbound calls and multi‑round voice interactions.

AI · SIP · Telephony

0 likes · 9 min read
