Tag: Audio Processing

System Architect Go
Nov 28, 2024 · Artificial Intelligence

An Overview of Modern AI Audio Technologies: ASR, TTS, and Voice Cloning

This article explains how recent AI advances have transformed audio processing. It covers digital audio fundamentals, automatic speech recognition (ASR), text‑to‑speech (TTS), and voice cloning techniques, and provides practical Python code examples using OpenAI Whisper and Hugging Face TTS models.

AI · Audio Processing · Speech Recognition
7 min read
Rare Earth Juejin Tech Community
Oct 18, 2024 · Fundamentals

Fundamentals of Audio and Video Processing, Compression, and Streaming Protocols

This article provides a comprehensive overview of audio and video fundamentals, including signal conversion, PCM encoding, compression techniques, spatial audio concepts, video encoding standards such as H.264/H.265, streaming protocols, bitrate control, and practical optimization algorithms for both audio and video pipelines.

Audio Processing · Video Encoding · compression
49 min read
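As a rough illustration of the PCM encoding this card mentions, the sketch below quantizes floating‑point samples in [-1.0, 1.0] to signed 16‑bit little‑endian PCM using only the Python standard library. The function names `encode_pcm16` and `decode_pcm16` are hypothetical, and the symmetric scale factor of 32767 is one common convention, not necessarily the one the article uses.

```python
import math
import struct

def encode_pcm16(samples):
    """Quantize float samples in [-1.0, 1.0] to little-endian signed 16-bit PCM bytes."""
    out = bytearray()
    for x in samples:
        # Clip first so out-of-range input cannot overflow the 16-bit range.
        x = max(-1.0, min(1.0, x))
        out += struct.pack("<h", int(round(x * 32767)))
    return bytes(out)

def decode_pcm16(data):
    """Convert little-endian 16-bit PCM bytes back to floats in [-1.0, 1.0]."""
    count = len(data) // 2
    return [v / 32767 for v in struct.unpack("<%dh" % count, data)]

# One 10 ms frame of a 440 Hz sine wave sampled at 48 kHz (480 samples).
SAMPLE_RATE = 48_000
frame = [math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(SAMPLE_RATE // 100)]
pcm = encode_pcm16(frame)
print(len(pcm))  # 480 samples * 2 bytes = 960
```

At 48 kHz mono, uncompressed 16‑bit PCM costs 48,000 × 2 = 96,000 bytes per second (768 kbit/s), which is the baseline that the compression techniques discussed in the article reduce.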
Kuaishou Tech
May 31, 2024 · Artificial Intelligence

Innovative Features and Technical Implementation of Huaisen K‑Song Community: Recording, Editing, and Smart Pitch Correction

This article details how Huaisen reshapes the karaoke workflow with features such as a cappella key finding, a comprehensive editing SDK, and intelligent pitch‑correction algorithms. It explains the underlying audio analysis, strategy generation, and system architecture that improve the user experience across the recording, editing, and publishing stages.

AI · Audio Processing · karaoke
21 min read
Kuaishou Tech
May 22, 2024 · Mobile Development

Technical Deep Dive into the Music Bullet‑Comment (Danmaku) System in a K‑Song Community

This article provides a comprehensive technical analysis of the music bullet‑comment feature in a K‑song community, detailing its core roles, client‑side production and consumption pipelines, real‑time mixing, alignment, volume balancing, precise seeking, performance optimizations, scalability, and sharing mechanisms on iOS and Android.

Android · Audio Processing · iOS
18 min read
Kuaishou Tech
May 14, 2024 · Product Management

Innovating Offline K‑Song Communities: Huaisen’s Journey, Challenges, and Feature Breakthroughs

This article examines Huaisen, a Kuaishou‑incubated karaoke and music video app, analyzing the K‑song market’s size and challenges from short‑video platforms, and detailing Huaisen’s innovative features such as music bullet comments and a reshaped karaoke workflow to revitalize the offline K‑song community.

Audio Processing · K‑Song · Music Community
19 min read
Test Development Learning Exchange
Mar 28, 2024 · Artificial Intelligence

Introduction to librosa: Audio Loading, Feature Extraction, and Visualization with Python

This article introduces the Python library librosa, outlines its main audio processing features such as loading, visualization, MFCC, pitch detection, chromagram, and rhythm analysis, and provides complete code examples for each operation.

Audio Processing · Feature Extraction · MIR
7 min read
Test Development Learning Exchange
Mar 28, 2024 · Fundamentals

Introduction to pydub for Audio Processing

pydub is a Python library for audio processing that enables editing, converting, and manipulating audio files through integration with ffmpeg, supporting formats like MP3 and WAV.

Audio Processing · FFmpeg · Python Library
4 min read
DataFunSummit
Feb 4, 2024 · Mobile Development

Advanced Mobile Audio Recording Techniques in Quanmin K‑Song: Low Latency, High Fidelity, and Intelligent Audio Processing

The article details how Quanmin K‑Song has built a comprehensive mobile audio recording system since 2014, covering low‑latency capture, high‑quality sampling, lyric and vocal‑accompaniment alignment, in‑ear monitoring, pitch shifting, vocal enhancement, 3A processing (echo cancellation, noise suppression, and gain control), and AI‑driven scoring to deliver a professional karaoke experience on smartphones.

AI scoring · Audio Processing · karaoke technology
14 min read
Tencent Music Tech Team
Feb 4, 2024 · Mobile Development

Technical Guidelines for High‑Quality Mobile Recording and Audio Processing in Quanmin K‑Song

Quanmin K‑Song’s decade‑long mobile‑recording platform combines 48 kHz/16‑bit dry‑signal capture, sub‑70 ms latency via OpenSL ES/AAudio, real‑time clipping and noise detection, lyric and vocal‑accompaniment alignment, pitch shifting, adaptive vocal enhancement, 3A DSP/AI processing, and AI‑driven pitch correction to deliver an industry‑leading mobile singing experience.

AI · Audio Processing · Music App
15 min read
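To put the 48 kHz/16‑bit and sub‑70 ms figures in perspective, the arithmetic below converts between buffer sizes in frames and milliseconds. It is a simplified model that counts only buffered audio and ignores DSP, driver, and hardware delays; the helper names and the 480‑frame burst size are illustrative assumptions, not values taken from the article.

```python
SAMPLE_RATE_HZ = 48_000   # capture rate quoted in the summary
BYTES_PER_SAMPLE = 2      # 16-bit samples

def frames_to_ms(frames, sample_rate=SAMPLE_RATE_HZ):
    """Duration of a buffer holding `frames` sample frames, in milliseconds."""
    return 1000.0 * frames / sample_rate

def ms_to_frames(ms, sample_rate=SAMPLE_RATE_HZ):
    """Largest whole number of frames that fits in `ms` (integer) milliseconds."""
    return (sample_rate * ms) // 1000

def bytes_per_second(channels, sample_rate=SAMPLE_RATE_HZ, bytes_per_sample=BYTES_PER_SAMPLE):
    """Raw PCM data rate for the given channel count."""
    return sample_rate * channels * bytes_per_sample

# Hypothetical 480-frame bursts; count how many fit in a 70 ms budget.
burst = 480
budget_frames = ms_to_frames(70)
print(frames_to_ms(burst))           # 10.0
print(budget_frames)                 # 3360
print(budget_frames // burst)        # 7
print(bytes_per_second(channels=1))  # 96000
```

Under this model, a 70 ms budget leaves room for at most seven 10 ms buffers across the whole capture‑to‑playback path, which is why low‑latency APIs that allow small buffer sizes matter.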
Kuaishou Tech
Dec 28, 2023 · Artificial Intelligence

Kuaishou Audio Team Wins ICASSP 2024 SSI and PLC Challenges with Advanced Speech Enhancement and Packet Loss Concealment

The Kuaishou audio team secured first place in both the ICASSP 2024 Speech Signal Improvement and Audio Deep Packet Loss Concealment challenges by deploying a two‑stage GAN‑based speech enhancement system and a hybrid time‑frequency packet‑loss concealment model that dramatically improve real‑time communication quality.

Audio Processing · GAN · ICASSP 2024
8 min read
Ximalaya Technology Team
Nov 17, 2023 · Cloud Computing

Technical Case Study of Cloud Audio Editing: Challenges, Solutions, and Optimization

The case study details how the Cloud Editing team tackled severe waveform loading delays, zoom lag, and inefficient IndexedDB storage by refactoring the processing pipeline, standardizing multi‑transaction storage, adding monitoring and cleanup tools, and rigorously testing releases, ultimately cutting processing times by over half and dramatically improving user experience.

Audio Processing · IndexedDB · Technical Case Study
9 min read
OPPO Kernel Craftsman
Jul 21, 2023 · Frontend Development

Audio Architecture and Quality Optimization in WebRTC: Devices, 3A Processing, Codec, NetEQ and Scenario‑Based Solutions

The article explains WebRTC’s audio pipeline—from device capture through hardware or software 3A (AEC, ANS, AGC), Opus codec selection, and NetEQ jitter‑buffer handling—detailing how device specifics and scenario‑based configurations (live streaming, meetings, education, watch‑together) affect quality and why pure‑software 3A is emerging as the preferred future solution.

3A · Audio Processing · NetEQ
29 min read
Baidu Geek Talk
Feb 15, 2023 · Artificial Intelligence

PaddlePaddle 2.4 Release: New Sparse, Graph, and Audio APIs

PaddlePaddle 2.4 introduces 167 new APIs—including sparse computing (paddle.sparse), graph learning (paddle.geometric), and audio processing (paddle.audio) modules—enabling efficient sparse model training and inference, graph message‑passing, advanced audio feature extraction, plus fresh loss functions, tensor utilities, and expanded vision transforms.

API Release · Audio Processing · PaddlePaddle
16 min read
DataFunSummit
Aug 8, 2022 · Artificial Intelligence

Voice Analysis for Financial Risk Control: Feature Extraction, Single-Channel Speech Separation, and Text Tagging

This talk presents the application of voice analysis in financial risk control, covering voice‑based risk feature extraction, single‑channel speech separation techniques, and speech‑text labeling methods, demonstrating how acoustic and textual cues can be leveraged to improve risk detection and model performance.

Audio Processing · machine learning · risk control
12 min read
Python Programming Learning Circle
Apr 4, 2022 · Artificial Intelligence

Building a Simple Speech Synthesis System with iFlytek WebAPI in Python

This tutorial explains how to create a lightweight speech synthesis tool using iFlytek's WebAPI, covering required environment setup, API credential acquisition, GUI design with Tkinter, and detailed Python code for WebSocket communication, audio handling, and WAV file generation.

Audio Processing · Python · Speech Synthesis
8 min read
ByteFE
Feb 21, 2022 · Frontend Development

ReolAudio: A Frontend‑Focused Audio Processing Library for Efficient Long‑Audio Editing

ReolAudio is a lightweight, JavaScript‑based library that replaces memory‑heavy AudioBuffer editing with streaming and random‑access decoding, frame‑based data structures, and a high‑performance AudioWorklet player, dramatically improving memory usage, start‑up time, and waveform rendering for long audio projects.

Audio Processing · frame based editing · streaming decoding
33 min read
DataFunSummit
Jan 16, 2022 · Artificial Intelligence

Multimodal Text and Speech Emotion Analysis: Overview, MSCNN‑SPU Model, and Domain Adaptation

This talk presents an overview of text‑plus‑speech multimodal emotion analysis, covering background, single‑modal text and audio models, the MSCNN‑SPU multimodal architecture, domain‑adaptation techniques, and future directions, with detailed model explanations, experimental results, and practical deployment insights.

Audio Processing · Speech Recognition · Text Classification
40 min read
DataFunTalk
Dec 14, 2021 · Artificial Intelligence

Speech Translation: Enterprise Applications and Research

This article presents an overview of speech translation, discusses its motivations and applications at ByteDance, compares cascade and end‑to‑end modeling approaches, introduces advanced encoder and decoder designs such as LUT, Chimera, and COSTT, outlines progressive multi‑task training and data‑augmentation strategies, and closes with experimental results and a Q&A session.

AI · Audio Processing · end-to-end models
16 min read
High Availability Architecture
Oct 21, 2021 · Cloud Computing

Optimizing NetEase Cloud Music Audio/Video Processing Platform with Serverless

This article describes how NetEase Cloud Music leveraged Serverless function computing to redesign its audio/video algorithm processing platform, covering the existing challenges, the selection criteria for Serverless solutions, the implementation details, performance gains, cost savings, and future directions.

Audio Processing · Cloud Functions · NetEase
11 min read
iQIYI Technical Product Team
Jun 11, 2021 · Artificial Intelligence

iQIYI M2VoC Multi‑Speaker Multi‑Style Voice Cloning Challenge at ICASSP 2021 – Overview and Results

The iQIYI M2VoC competition at ICASSP 2021, the first low‑resource multi‑speaker, multi‑style voice‑cloning challenge, attracted 153 academic and industry teams. Teams competed in a few‑shot track (100 utterances) and an extreme few‑shot track (5 utterances), with submissions evaluated by professional listeners. The challenge produced strong innovations and applications, while confirming that cloning a voice from a single sample remains an unsolved problem.

AI · Audio Processing · ICASSP2021
7 min read