Tagged articles

Audio Processing

53 articles · Page 1 of 1

May 1, 2026 · Artificial Intelligence

ACE-Step UI: An Open-Source, Free Alternative to Suno for AI Music Generation

ACE-Step UI is a completely free, locally run open-source interface for the ACE‑Step 1.5 AI music model, offering professional‑grade song generation, a modern React/Express stack, and a suite of audio tools, making it a viable alternative to Suno.

AI music generation · Audio Processing · Express

0 likes · 7 min read

Data STUDIO

Mar 26, 2026 · Operations

10 Open‑Source Python Tools That Replace Paid SaaS Apps

The article presents ten Python libraries—pikepdf, Playwright, pdf2image + pytesseract, moviepy, pydub + ffmpeg, reportlab, yt‑dlp, watchdog, pyvirtualcam, and rich + textual—each with code samples, runtime requirements, complexity analysis, practical tips, and common pitfalls, showing how they can substitute costly commercial software while offering greater control, privacy, and customization.

Audio Processing · Automation · File Monitoring

0 likes · 19 min read

Data Party THU

Oct 8, 2025 · Artificial Intelligence

Build a Music Genre Classifier from Scratch with KNN and MFCC

This tutorial walks through constructing a complete music‑genre classification project using Python, covering dataset preparation, MFCC feature extraction, K‑Nearest Neighbors implementation, train‑test splitting, model evaluation, and testing on new audio files, all with reproducible code snippets.

Audio Processing · MFCC · Music Genre Classification

0 likes · 14 min read

Data STUDIO

Sep 15, 2025 · Artificial Intelligence

Build a Music Genre Classifier with KNN and MFCC from Scratch

This tutorial walks through building a music‑genre classification system using the GTZAN dataset, extracting MFCC features, implementing a K‑Nearest Neighbors classifier in Python, and achieving roughly 70% accuracy on test data.

Audio Processing · MFCC · Music Genre Classification

0 likes · 14 min read

Baidu Geek Talk

Aug 4, 2025 · Fundamentals

How to Build High‑Performance Audio Post‑Processing with FFmpeg: Bass Boost & Voice Clarity

This article explains the importance of audio post‑processing in modern player architectures, outlines a modular FFmpeg‑based framework, details core techniques such as bass enhancement and voice clarity, provides algorithmic insights and code snippets, and shows how to integrate these filters into a playback pipeline.

Audio Processing · FFmpeg · Media Playback

0 likes · 17 min read

Baidu App Technology

Jul 29, 2025 · Fundamentals

How to Build High‑Performance Bass Boost and Voice Clarity Filters with FFmpeg

This article explains the architecture, key techniques, and implementation details of audio post‑processing in a media player, covering bass‑enhancement and voice‑clarity filters, frequency‑range design, device constraints, FFmpeg filter chains, and sample code for a high‑performance, low‑latency solution.

Audio Processing · FFmpeg · bass boost

0 likes · 17 min read

Cognitive Technology Team

Jul 1, 2025 · Artificial Intelligence

How We Built a Live‑Streaming TTS Engine: From Data Pipelines to AI Voice Generation

This article presents a comprehensive practice summary of building an intelligent digital‑human system, covering six core modules—LLM content generation, LLM interaction, TTS synthesis, visual driving, audio‑video engineering, and backend services—while detailing data collection, signal processing, ASR annotation, speaker clustering, model optimization (V1‑V4), evaluation metrics, and future research directions.

AI voice · Audio Processing · LLM

0 likes · 23 min read

Programmer DD

May 13, 2025 · Frontend Development

How I Built a Cross‑Platform Audio/Video App in Hours with AI‑Powered CodeBuddy

This article chronicles how a developer transformed the TransDuck audio‑video SaaS tool into a native desktop application using Tauri, Vue, and ffmpeg, while leveraging the AI‑driven CodeBuddy extension to automate project scaffolding, code generation, error fixing, and UI refinement, cutting development time from days to a few hours.

AI‑assisted development · Audio Processing · Tauri

0 likes · 10 min read

System Architect Go

Nov 28, 2024 · Artificial Intelligence

An Overview of Modern AI Audio Technologies: ASR, TTS, and Voice Cloning

This article explains how modern AI advances have transformed audio processing, covering digital audio fundamentals, automatic speech recognition (ASR), text‑to‑speech (TTS), voice cloning techniques, and provides practical Python code examples using OpenAI Whisper and HuggingFace TTS models.

AI · Audio Processing · Text‑to‑Speech

0 likes · 7 min read

Rare Earth Juejin Tech Community

Oct 18, 2024 · Fundamentals

Fundamentals of Audio and Video Processing, Compression, and Streaming Protocols

This article provides a comprehensive overview of audio and video fundamentals, including signal conversion, PCM encoding, compression techniques, spatial audio concepts, video encoding standards such as H.264/H.265, streaming protocols, bitrate control, and practical optimization algorithms for both audio and video pipelines.

Audio Processing · Streaming Protocols · Video Encoding

0 likes · 49 min read

Kuaishou Tech

May 31, 2024 · Artificial Intelligence

Innovative Features and Technical Implementation of Huaisen K‑Song Community: Recording, Editing, and Smart Pitch Correction

This article details how Huaisen reshapes the karaoke workflow by introducing innovative features such as clear‑singing pitch‑finding, a comprehensive editing SDK, and intelligent pitch‑correction algorithms, explaining the underlying audio analysis, strategy generation, and system architecture that enhance user experience across recording, editing, and publishing stages.

AI · Audio Processing · Mobile App

0 likes · 21 min read

Kuaishou Tech

May 22, 2024 · Mobile Development

Technical Deep Dive into the Music Bullet (Danmaku) System in a K‑Song Community

This article provides a comprehensive technical analysis of the music bullet feature in a K‑song community, detailing its core roles, client‑side production and consumption pipelines, real‑time mixing, alignment, volume balancing, precise seeking, performance optimizations, scalability, and sharing mechanisms across iOS and Android platforms.

Android · Audio Processing · iOS

0 likes · 18 min read

21CTO

May 14, 2024 · Artificial Intelligence

What Makes OpenAI’s New GPT‑4o a Game‑Changing Multimodal AI?

OpenAI’s latest flagship model GPT‑4o combines text, audio, image and video processing in a single, faster, cheaper multimodal system that delivers near‑human response times, expanded API access, and new safety measures, reshaping how developers and users interact with AI.

AI model · Audio Processing · GPT-4o

0 likes · 10 min read

Test Development Learning Exchange

Mar 28, 2024 · Artificial Intelligence

Introduction to librosa: Audio Loading, Feature Extraction, and Visualization with Python

This article introduces the Python library librosa, outlines its main audio processing features such as loading, visualization, MFCC, pitch detection, chromagram, and rhythm analysis, and provides complete code examples for each operation.

Audio Processing · MIR · Python

0 likes · 7 min read

Test Development Learning Exchange

Mar 28, 2024 · Fundamentals

Introduction to pydub for Audio Processing

pydub is a Python library for audio processing that enables editing, converting, and manipulating audio files through integration with ffmpeg, supporting formats like MP3 and WAV.

Audio Processing · FFmpeg · Python Library

0 likes · 4 min read

DataFunSummit

Feb 4, 2024 · Mobile Development

Advanced Mobile Audio Recording Techniques in Quanmin K‑Song: Low Latency, High Fidelity, and Intelligent Audio Processing

The article details how Quanmin K‑Song has built a comprehensive mobile‑focused audio recording system since 2014, covering low‑latency capture, high‑quality sampling, lyric and vocal‑accompaniment alignment, ear‑return (in‑ear monitoring), pitch shifting, vocal enhancement, 3A processing, and AI‑driven scoring to deliver a professional karaoke experience on smartphones.

AI scoring · Audio Processing · karaoke technology

0 likes · 14 min read

Tencent Music Tech Team

Feb 4, 2024 · Mobile Development

Technical Guidelines for High-Quality Mobile Recording and Audio Processing in Quanmin K Song

Quanmin K Song’s decade‑long mobile‑recording platform combines 48 kHz/16‑bit dry‑signal capture, sub‑70 ms latency via OpenSL ES/AAudio, real‑time clipping and noise detection, lyric‑ and vocal‑accompaniment alignment, pitch‑shifting, adaptive vocal enhancement, 3A DSP/AI processing, and AI‑driven pitch correction to deliver industry‑leading high‑quality mobile singing experiences.

AI · Audio Processing · Music App

0 likes · 15 min read

Kuaishou Tech

Dec 28, 2023 · Artificial Intelligence

Kuaishou Audio Team Wins ICASSP 2024 SSI and PLC Challenges with Advanced Speech Enhancement and Packet Loss Concealment

The Kuaishou audio team secured first place in both the ICASSP 2024 Speech Signal Improvement and Audio Deep Packet Loss Concealment challenges by deploying a two‑stage GAN‑based speech enhancement system and a hybrid time‑frequency packet‑loss concealment model that dramatically improve real‑time communication quality.

Audio Processing · GAN · ICASSP 2024

0 likes · 8 min read

Ximalaya Technology Team

Nov 17, 2023 · Cloud Computing

Technical Case Study of Cloud Audio Editing: Challenges, Solutions, and Optimization

The case study details how the Cloud Editing team tackled severe waveform loading delays, zoom lag, and inefficient IndexedDB storage by refactoring the processing pipeline, standardizing multi‑transaction storage, adding monitoring and cleanup tools, and rigorously testing releases, ultimately cutting processing times by over half and dramatically improving user experience.

Audio Processing · IndexedDB · Technical Case Study

0 likes · 9 min read

NetEase Smart Enterprise Tech+

Oct 11, 2023 · Backend Development

Decoupling Audio‑Video Algorithms: AVProcessEngine Reduces RTC SDK Size & Improves Performance

The article explains how NetEase Cloud Communication’s AVProcessEngine framework separates audio‑video algorithms from the NERTC SDK, addressing SDK bloat and performance drops on low‑end devices by using plugin‑based processing, dynamic algorithm adjustment, and unified interfaces.

Audio Processing · Performance Optimization · Real‑time communication

0 likes · 11 min read

OPPO Kernel Craftsman

Jul 21, 2023 · Frontend Development

Audio Architecture and Quality Optimization in WebRTC: Devices, 3A Processing, Codec, NetEQ and Scenario‑Based Solutions

The article explains WebRTC’s audio pipeline—from device capture through hardware or software 3A (AEC, ANS, AGC), Opus codec selection, and NetEQ jitter‑buffer handling—detailing how device specifics and scenario‑based configurations (live streaming, meetings, education, watch‑together) affect quality and why pure‑software 3A is emerging as the preferred future solution.

3A · Audio Processing · NetEQ

0 likes · 29 min read

Baidu Geek Talk

Feb 15, 2023 · Artificial Intelligence

PaddlePaddle 2.4 Release: New Sparse, Graph, and Audio APIs

PaddlePaddle 2.4 introduces 167 new APIs—including sparse computing (paddle.sparse), graph learning (paddle.geometric), and audio processing (paddle.audio) modules—enabling efficient sparse model training and inference, graph message‑passing, advanced audio feature extraction, plus fresh loss functions, tensor utilities, and expanded vision transforms.

API Release · Audio Processing · PaddlePaddle

0 likes · 16 min read

DataFunSummit

Aug 8, 2022 · Artificial Intelligence

Voice Analysis for Financial Risk Control: Feature Extraction, Single-Channel Speech Separation, and Text Tagging

This talk presents the application of voice analysis in financial risk control, covering voice‑based risk feature extraction, single‑channel speech separation techniques, and speech‑text labeling methods, demonstrating how acoustic and textual cues can be leveraged to improve risk detection and model performance.

Audio Processing · machine learning · risk control

0 likes · 12 min read

NetEase Smart Enterprise Tech+

Apr 7, 2022 · Artificial Intelligence

How NetEase Cloud Communication Tackles Voice Reverberation with Adaptive Dual‑Mic Algorithms

This article explains the growing need for speech dereverberation in audio‑video conferencing, outlines the physical causes of reverberation, reviews historical research, and details NetEase Cloud's adaptive dual‑mic signal‑correlation approach, algorithm implementations, performance optimizations, and future directions.

Audio Processing · adaptive algorithms · dual-mic

0 likes · 8 min read

Python Programming Learning Circle

Apr 4, 2022 · Artificial Intelligence

Building a Simple Speech Synthesis System with iFlytek WebAPI in Python

This tutorial explains how to create a lightweight speech synthesis tool using iFlytek's WebAPI, covering required environment setup, API credential acquisition, GUI design with Tkinter, and detailed Python code for WebSocket communication, audio handling, and WAV file generation.

Audio Processing · Python · Speech synthesis

0 likes · 8 min read

Alibaba Terminal Technology

Mar 1, 2022 · Frontend Development

How Alibaba Built a Web‑Based Short Video Editor: Front‑End Insights

This article details Alibaba’s front‑end engineer’s approach to building a web‑based short video editor, covering the motivation, design principles, three‑layer architecture, script protocol, immutable data handling, audio‑video processing with WebCodecs and FFmpeg, rendering pipeline, and challenges of browser implementation.

Audio Processing · WebAssembly · WebCodecs

0 likes · 10 min read

ByteFE

Feb 21, 2022 · Frontend Development

ReolAudio: A Frontend‑Focused Audio Processing Library for Efficient Long‑Audio Editing

ReolAudio is a lightweight, JavaScript‑based library that replaces memory‑heavy AudioBuffer editing with streaming and random‑access decoding, frame‑based data structures, and a high‑performance AudioWorklet player, dramatically improving memory usage, start‑up time, and waveform rendering for long audio projects.

Audio Processing · frame based editing · streaming decoding

0 likes · 33 min read

DataFunSummit

Jan 16, 2022 · Artificial Intelligence

Multimodal Text and Speech Emotion Analysis: Overview, MSCNN‑SPU Model, and Domain Adaptation

This talk presents an overview of text‑plus‑speech multimodal emotion analysis, covering background, single‑modal text and audio models, the MSCNN‑SPU multimodal architecture, domain‑adaptation techniques, and future directions, with detailed model explanations, experimental results, and practical deployment insights.

Audio Processing · Text Classification · deep learning

0 likes · 40 min read

DataFunTalk

Dec 14, 2021 · Artificial Intelligence

Speech Translation: Enterprise Applications and Research

This article presents an overview of speech translation, discusses its motivations and applications at ByteDance, compares cascade and end‑to‑end modeling approaches, introduces advanced encoder and decoder designs such as LUT, Chimera, and COSTT, outlines progressive multi‑task training and data‑augmentation strategies, and shares experimental results and Q&A.

AI · Audio Processing · end-to-end models

0 likes · 16 min read

Douyu Streaming

Dec 1, 2021 · Mobile Development

How to Get, Build, and Extend WebRTC m79 Source for Windows, Android, and iOS

This guide explains how to obtain the WebRTC m79 source, compile it for Windows, Android, and iOS, walk through the basic signaling and peer‑connection workflow, and implement advanced video‑capture and audio‑volume features with custom C++ extensions, while unifying the codebase across platforms.

Audio Processing · C# · Compilation

0 likes · 19 min read

NetEase Smart Enterprise Tech+

Nov 1, 2021 · Artificial Intelligence

How AI is Transforming Real-Time Audio Communication: Challenges and Solutions

This article explores the evolution of AI audio algorithms in real‑time communication, detailing current trends, technical hurdles such as computational complexity and data scarcity, and practical solutions including lightweight models, data augmentation, and hybrid AI‑traditional pipelines, illustrated with real‑world NetEase Cloud IM case studies.

AI · Audio Processing · Real‑time communication

0 likes · 18 min read

High Availability Architecture

Oct 21, 2021 · Cloud Computing

Optimizing NetEase Cloud Music Audio/Video Processing Platform with Serverless

This article describes how NetEase Cloud Music leveraged Serverless function computing to redesign its audio/video algorithm processing platform, covering the existing challenges, the selection criteria for Serverless solutions, the implementation details, performance gains, cost savings, and future directions.

Audio Processing · Cloud Functions · NetEase

0 likes · 11 min read

Volcano Engine Developer Services

Oct 20, 2021 · Artificial Intelligence

How ByteDance’s AI Transforms Music Creation and Discovery on TikTok

ByteDance leverages advanced AI models such as SpecTNT, semi‑supervised music tagging transformers, language identification, chord recognition, contrastive representation learning, and source separation to power TikTok’s massive music library, enabling seamless music‑video interaction, smarter recommendations, and new creative tools for creators worldwide.

Audio Processing · deep learning · language identification

0 likes · 10 min read

ELab Team

Aug 18, 2021 · Frontend Development

Unlock Powerful Audio Effects with Web Audio API: A Hands‑On Guide

This article introduces the Web Audio API, covering AudioContext, audio nodes, routing graphs, various source types, processing nodes like AnalyserNode and BiquadFilterNode, as well as spatialization with PannerNode and convolution reverb, providing code examples and practical demos for frontend developers.

Audio Processing · AudioContext · JavaScript

0 likes · 16 min read

iQIYI Technical Product Team

Jun 11, 2021 · Artificial Intelligence

iQIYI M2VoC Multi‑Speaker Multi‑Style Voice Cloning Challenge at ICASSP 2021 – Overview and Results

The iQIYI M2VoC competition at ICASSP 2021, the first low‑resource multi‑speaker, multi‑style voice‑cloning challenge, attracted 153 academic and industry teams to tackle few‑shot (100 utterances) and extreme few‑shot (5 utterances) tracks, evaluated by professional listeners, yielding strong innovations and applications while confirming that single‑sample cloning remains unsolved.

AI · Audio Processing · ICASSP2021

0 likes · 7 min read

TAL Education Technology

Jun 3, 2021 · Frontend Development

Exploring Web Live Streaming: From Web 1.0 to 2.1 – Architecture, AI Integration, and Refactoring

This article chronicles the evolution of a web‑based live‑streaming platform from its initial Web 1.0 prototype through successive versions, detailing AI speech detection, RTC integration, extensive refactoring, and the resulting framework that proves web technologies can effectively support live‑streaming scenarios.

AI integration · Audio Processing · JavaScript

0 likes · 6 min read

Kuaishou Tech

May 17, 2021 · Industry Insights

How Kuaishou Delivered Real‑Time Deep‑Learning Voice Conversion on PC

Kuaishou becomes the first company to deploy a deep‑learning‑based real‑time voice‑conversion system on PC clients, delivering stable, natural‑sounding transformed speech with sub‑200 ms latency, and the article analyzes industry methods, technical challenges, and the four‑module architecture that made it possible.

Audio Processing · Industry insight · Kuaishou

0 likes · 10 min read

New Oriental Technology

May 17, 2021 · Fundamentals

Live Streaming Process Model: Capture, Sampling, Encoding, and Audio Channel Technologies

This article explains the live streaming workflow, detailing audio and video capture, digital sampling rates and bit depths, various sound channel configurations from mono to immersive formats, and common audio encoding methods such as PCM, AAC, MP3, and FLAC.

Audio Processing · Live Streaming · audio encoding

0 likes · 22 min read

Python Crawling & Data Mining

May 1, 2021 · Fundamentals

Master Audio Editing in Python with pydub: From Basics to Advanced Techniques

This tutorial introduces Python's pydub library for audio manipulation, covering installation, AudioSegment basics, slicing, merging, fading, speed adjustment, playback, gain control, cross‑fade, multichannel creation, and exporting, with a practical example for short‑video post‑production.

Audio Processing · AudioSegment · Python

0 likes · 11 min read

Programmer DD

Jan 22, 2021 · Artificial Intelligence

How to Build a Raspberry Pi Baby Cry Detector with TensorFlow and Open‑Source Tools

This guide shows how to turn a Raspberry Pi into an automated baby monitor that records audio, trains a TensorFlow sound‑detection model, generates labeled datasets, runs real‑time inference, and sends push notifications via Platypush, while also integrating a camera and audio streaming.

Audio Processing · Automation · Raspberry Pi

0 likes · 25 min read

NetEase Smart Enterprise Tech+

Nov 2, 2020 · Artificial Intelligence

How AI Is Transforming Real‑Time Audio: Insights from LiveVideoStackCon 2020

The LiveVideoStackCon 2020 conference highlighted AI's growing role in audio processing, detailing Dr. Hao Yiya's modular AI approach, current challenges like computational load and robustness, and NetEase Cloud's data‑driven advancements that are reshaping real‑time communication audio.

AI · Audio Processing · Modular AI

0 likes · 5 min read

Python Programming Learning Circle

Apr 22, 2020 · Artificial Intelligence

Python Audio‑Based Parkinson’s Disease Detection Using Machine Learning

This tutorial demonstrates how to build a Python library that extracts acoustic measurements from healthy and Parkinson’s disease audio recordings, constructs a machine‑learning dataset, trains a logistic‑regression classifier with scikit‑learn, evaluates its accuracy, and provides functions to load and use the trained model in other applications.

Audio Processing · Parkinson's Disease · Parselmouth

0 likes · 12 min read

Tencent Cloud Developer

Mar 19, 2020 · Artificial Intelligence

Real-Time Voice Communication Technologies and AI Enhancements in Tencent Meeting

Shang Shidong outlines Tencent Meeting’s shift from analog PSTN to IP‑based VoIP, using H.323, SIP, RTP/UDP and the Opus codec, while AI‑driven super‑resolution, deep‑learning packet‑loss concealment, advanced noise reduction, and speech‑music classification boost audio quality, complemented by reference‑free MOS assessment and future 5G‑enabled cloud, IoT and WebRTC integration.

AI · Audio Processing · RTP

0 likes · 30 min read

iQIYI Technical Product Team

Jan 9, 2020 · Artificial Intelligence

Voice Conversion (VC): Fundamentals, Progress, and Applications

Voice conversion (VC) technology changes a speaker’s timbre and style while keeping the spoken text unchanged, supporting one‑to‑one, many‑to‑one, and many‑to‑many scenarios for medical assistance and entertainment, using parallel or non‑parallel data through methods such as DTW‑aligned frame mapping, attention‑based neural networks, PPG‑LSTM pipelines, VAEs, normalizing‑flow models, and GANs, with iQIYI focusing on non‑parallel data, prosody preservation, and noise‑robust augmentation.

Audio Processing · GAN · VAE

0 likes · 12 min read

Tencent Cloud Developer

Jul 11, 2019 · Industry Insights

How Real-Time Audio/Video Meets Traditional PSTN: Architecture and Low‑Latency Solutions

This article provides an in‑depth technical analysis of integrating real‑time audio/video (RTC) with legacy PSTN, covering latency sources, protocol and codec differences, adaptation layers, system architecture, and optimization techniques such as jitter buffering, ARQ/FEC, and automatic failover.

Audio Processing · PSTN integration · RTC

0 likes · 17 min read

DataFunTalk

May 15, 2019 · Artificial Intelligence

AI‑Driven Audio Content Understanding and Safety for Live Streams

Using AI to automatically understand and secure audio content, this article discusses the challenges of manual audio analysis, outlines a four‑step pipeline—audio segmentation, speech‑to‑text, labeling, and synthesis—and describes models such as VAD, ASR, sound classification, text recognition, and behavior detection for live‑stream moderation.

AI · Audio Processing · Content Safety

0 likes · 11 min read

MaGe Linux Operations

Apr 20, 2019 · Artificial Intelligence

Master Python Speech Recognition: Install, Record, and Transcribe Audio

This comprehensive guide walks you through the fundamentals of speech recognition, explains how it works, compares Python packages, shows step‑by‑step installation of SpeechRecognition, demonstrates processing audio files and live microphone input, and offers techniques for handling noise and multilingual transcription.

Audio Processing · Python · SpeechRecognition

0 likes · 16 min read

58 Tech

Mar 5, 2019 · Mobile Development

Noise Reduction in Live Streaming: Comparative Study and Integration of WebRTC and RNNoise on Android

This article examines the challenges of live‑stream audio noise, compares open‑source denoising solutions such as Speex, WebRTC, and RNNoise, and details the practical integration, performance testing, and final adoption of WebRTC‑based noise reduction within the 58 live‑stream Android SDK.

Android · Audio Processing · Live Streaming

0 likes · 14 min read

Tencent Cloud Developer

Oct 10, 2018 · Artificial Intelligence

What Are the Real Challenges and Future Trends in Intelligent Voice Technology?

This article examines the current landscape of intelligent voice technology—including speech recognition, synthesis, voiceprint identification, and acoustic event detection—highlighting technical hurdles, evaluation metrics, recent advances such as WaveNet, and a wide range of practical applications from mobile devices to smart hardware and enterprise solutions.

Audio Processing · Speech synthesis · Tencent Cloud

0 likes · 16 min read

iQIYI Technical Product Team

Sep 14, 2018 · Artificial Intelligence

AI RAP: End-to-End Speech Synthesis for Rap Generation Using Location‑Sensitive Attention and Inference Mask

AI RAP is an end‑to‑end AI service that lets users generate personalized rap with a single click by combining location‑sensitive attention and an inference mask to achieve perfect alignment, beat‑synchronous timing, multi‑character voice timbres, sub‑second synthesis, and a scalable architecture supporting millions of daily users.

AI · Attention Mechanism · Audio Processing

0 likes · 5 min read

MaGe Linux Operations

May 30, 2018 · Artificial Intelligence

Master Python Speech Recognition: From Basics to Real-World Audio Transcription

This comprehensive guide walks you through the fundamentals of speech recognition, explains how Python’s SpeechRecognition library works, shows how to install and use various recognizer packages, process audio files and microphone input, handle noise, and troubleshoot common errors with clear code examples.

Audio Processing · SpeechRecognition · Voice Transcription

0 likes · 18 min read

Xianyu Technology

Apr 20, 2018 · Artificial Intelligence

Client‑Side Voice Recognition with TensorFlow Lite and MFCC Optimization

The paper presents a client‑side speech recognizer that uses a compact TensorFlow Lite Inception‑v3 CNN model combined with an optimized MFCC feature pipeline and ARM‑NEON‑accelerated, multi‑threaded processing, achieving low‑latency, high‑accuracy voice recognition on mobile and embedded devices.

Audio Processing · MFCC · TensorFlow Lite

0 likes · 14 min read

Tencent Music Tech Team

May 5, 2017 · Mobile Development

Understanding iOS Core Audio: Definitions of Sample, Frame, and Packet

The article clarifies Apple’s Core Audio terminology—defining a sample as a single channel value, a frame as simultaneous samples, and a packet as one or more contiguous frames—explains why these terms are often confused across audio, networking, and codec contexts, and demonstrates the definitions with an MP3 parsing example.

Audio Processing · Core Audio · frame

0 likes · 11 min read
