
Advanced Mobile Audio Recording Techniques in Quanjian K‑Song: Low Latency, High Fidelity, and Intelligent Audio Processing

The article details how Quanjian K‑Song has built a comprehensive mobile‑focused audio recording system since 2014, covering low‑latency capture, high‑quality sampling, lyric and vocal‑accompaniment alignment, ear‑return, pitch shifting, vocal enhancement, 3A processing, and AI‑driven scoring to deliver a professional karaoke experience on smartphones.

DataFunSummit

With the widespread adoption of mobile internet, many users now create and record music on their phones, but hardware and software constraints make high‑quality recording difficult. Since 2014, Quanjian K‑Song has invested heavily in mobile audio technology and built a complete high‑quality recording system that covers low‑latency capture, lyric‑accompaniment alignment, vocal‑accompaniment alignment, ear‑return (in‑ear monitoring), pitch shifting, vocal enhancement, 3A processing, and multi‑dimensional scoring.

For low‑latency, high‑fidelity capture, the team recommends recording at 48 kHz, 16‑bit, mono, then converting the dry vocal to dual‑channel and running quality checks on it (silence, loudness, etc.). On Android, capture goes through the high‑performance OpenSL ES and AAudio APIs, with sample rate, bit depth, and buffer size tuned to reach 30‑70 ms end‑to‑end latency. Support for 96 kHz is planned.
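The mono‑to‑dual‑channel conversion and the silence and loudness checks mentioned above can be sketched roughly as follows. This is only an illustration: the function names, the RMS‑based level measure, and the −60 dBFS silence threshold are assumptions, not details given in the talk.

```python
import math
from typing import List, Sequence


def mono_to_stereo(samples: Sequence[int]) -> List[int]:
    """Duplicate each mono sample into an interleaved left/right pair."""
    stereo: List[int] = []
    for s in samples:
        stereo.extend((s, s))
    return stereo


def rms_dbfs(samples: Sequence[int], full_scale: int = 32768) -> float:
    """RMS level relative to 16-bit full scale; -inf for empty or all-zero input."""
    if not samples:
        return float("-inf")
    mean_sq = sum(s * s for s in samples) / len(samples)
    if mean_sq == 0:
        return float("-inf")
    return 10.0 * math.log10(mean_sq / (full_scale * full_scale))


def is_silent(samples: Sequence[int], threshold_dbfs: float = -60.0) -> bool:
    """Flag a take as silent when its RMS level is below the (assumed) threshold."""
    return rms_dbfs(samples) < threshold_dbfs
```

A take recorded with the microphone muted would come back as silent and could be rejected before upload, while a normal vocal sits well above the threshold.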

Bluetooth earphone recording is improved through deep hardware integration: on Huawei FreeBuds the ear‑return delay is kept under 40 ms, and custom protocols developed with partners such as Meizu and Vivo allow real‑time control of ear‑return, volume, and other parameters.

During recording, real‑time sound detection (using MCRA, minima‑controlled recursive averaging, for noise estimation) monitors clipping, volume, and background noise and gives the user guidance. Lyric alignment uses QRC‑format timestamps for precise sync, while vocal‑accompaniment alignment uses audio fingerprinting and also offers a manual adjustment range of ±600 ms.
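As a rough illustration of two of the simpler pieces here, the sketch below measures how much of a 16‑bit buffer is clipped and applies a manual vocal offset clamped to the ±600 ms range. The function names and the sign convention (a positive offset delays the vocal) are assumptions made for this example. It does not attempt MCRA noise estimation or fingerprint‑based alignment.

```python
from typing import List, Sequence

MAX_OFFSET_MS = 600  # manual adjustment range stated in the article


def clipped_ratio(samples: Sequence[int], limit: int = 32767) -> float:
    """Fraction of 16-bit samples at or beyond full scale."""
    if not samples:
        return 0.0
    clipped = sum(1 for s in samples if abs(s) >= limit)
    return clipped / len(samples)


def clamp_offset_ms(offset_ms: float) -> float:
    """Restrict a user-supplied offset to the allowed +/-600 ms window."""
    return max(-MAX_OFFSET_MS, min(MAX_OFFSET_MS, offset_ms))


def shift_vocal(samples: Sequence[int], offset_ms: float,
                sample_rate_hz: int = 48000) -> List[int]:
    """Assumed convention: a positive offset delays the vocal by prepending
    zeros, a negative offset advances it by dropping leading samples."""
    offset = clamp_offset_ms(offset_ms)
    n = round(abs(offset) * sample_rate_hz / 1000)
    if offset >= 0:
        return [0] * n + list(samples)
    return list(samples[n:])
```

At 48 kHz the full 600 ms range corresponds to 28,800 samples in either direction.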

Pitch shifting analyzes harmonic content frame‑by‑frame, adjusting only harmonics while preserving transient phase to maintain the original feel of drums and other non‑pitched elements.

Ear‑return is implemented both in hardware (via the HAL on various manufacturers' devices) and in software (OpenSL ES/AAudio on Android, AudioUnit on iOS). The software path can reach sub‑40 ms latency on Android and a capture interval as short as 5 ms on iOS, giving overall ear‑return delays of around 17 ms.
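The relationship between buffer size and delay can be made concrete with a little arithmetic. The sketch below is an assumption‑laden illustration: the article does not break the 17 ms figure into stages, so the `fixed_overhead_ms` parameter is a hypothetical placeholder for everything that is not buffering (converters, processing, scheduling).

```python
def frames_for_interval(interval_ms: float, sample_rate_hz: int) -> int:
    """Number of frames that span the given callback interval."""
    return round(interval_ms * sample_rate_hz / 1000)


def buffer_ms(frames: int, sample_rate_hz: int) -> float:
    """Duration of one buffer in milliseconds."""
    return 1000.0 * frames / sample_rate_hz


def monitoring_delay_ms(capture_frames: int, playback_frames: int,
                        sample_rate_hz: int,
                        fixed_overhead_ms: float = 0.0) -> float:
    """Simple budget: one capture buffer + one playback buffer + a
    hypothetical fixed overhead for everything else in the path."""
    return (buffer_ms(capture_frames, sample_rate_hz)
            + buffer_ms(playback_frames, sample_rate_hz)
            + fixed_overhead_ms)
```

For example, a 5 ms capture interval at 48 kHz is 240 frames. With a 240‑frame playback buffer as well, buffering alone contributes 10 ms, and whatever remains of the total delay has to come from the rest of the path.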

The platform also provides a multi‑dimensional scoring system that evaluates pitch, rhythm, technique, and more, using both reference‑based algorithms and a neural‑network‑based non‑reference model for objective feedback.
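To give a feel for what a reference‑based pitch dimension might look like, here is a minimal sketch that compares per‑frame sung pitch with a reference melody in cents. It is not the platform's scoring algorithm. The 50‑cent tolerance, the frame format, and the treatment of unvoiced frames are all assumptions for illustration.

```python
import math
from typing import Sequence


def cents_off(sung_hz: float, ref_hz: float) -> float:
    """Signed pitch deviation in cents (100 cents = one semitone)."""
    return 1200.0 * math.log2(sung_hz / ref_hz)


def pitch_score(sung_hz: Sequence[float], ref_hz: Sequence[float],
                tolerance_cents: float = 50.0) -> float:
    """Percentage of reference frames sung within the tolerance.

    Frames where the reference is 0 (no note expected) are skipped;
    frames where the singer is unvoiced (0 Hz) count as misses.
    """
    scored = hits = 0
    for sung, ref in zip(sung_hz, ref_hz):
        if ref <= 0:
            continue
        scored += 1
        if sung > 0 and abs(cents_off(sung, ref)) <= tolerance_cents:
            hits += 1
    return 100.0 * hits / scored if scored else 0.0
```

A non‑reference model, by contrast, would have no `ref_hz` input at all and would have to judge the vocal on its own, which is where the article's neural‑network approach comes in.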

3A processing (Automatic Gain Control, Noise Suppression, Echo Cancellation) is offered via traditional DSP pipelines and a proprietary neural‑network model trained on large‑scale real singing data, ensuring high‑quality output while suppressing echo and background noise.
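Of the three stages, automatic gain control is the easiest to illustrate in a few lines. The block‑based sketch below is a generic textbook‑style AGC on float samples in [-1, 1], not the traditional pipeline or the neural model described in the talk. The target level, block size, gain cap, and smoothing factor are all assumed values.

```python
import math
from typing import List, Sequence


def agc(samples: Sequence[float], target_rms: float = 0.1, block: int = 480,
        max_gain: float = 8.0, smoothing: float = 0.5) -> List[float]:
    """Scale each block toward target_rms, moving the gain gradually.

    The gain is capped at max_gain so near-silent passages (mostly noise)
    are not boosted without limit; silent blocks leave the gain unchanged.
    """
    gain = 1.0
    out: List[float] = []
    for start in range(0, len(samples), block):
        chunk = samples[start:start + block]
        level = math.sqrt(sum(s * s for s in chunk) / len(chunk))
        if level > 0:
            desired = min(max_gain, target_rms / level)
            gain += smoothing * (desired - gain)  # one-pole smoothing
        out.extend(s * gain for s in chunk)
    return out
```

In a full 3A chain this stage would normally run alongside noise suppression and echo cancellation, so that residual noise and echo are not amplified along with the voice.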

Audio effects such as spatial rendering, EQ, filtering, delay, and convolution reverb are dynamically configured through server‑delivered presets, allowing flexible effect chains for each recording.
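One way to picture a server‑delivered preset is as a small JSON document that names the effects and their parameters in order. The sketch below assumes a hypothetical schema (`chain`, `type`, `params`) and implements only two toy effects. The real preset format and effect implementations are not described in the article.

```python
import json
from typing import Callable, Dict, List


def apply_gain(samples: List[float], params: dict) -> List[float]:
    """Scale by a gain given in decibels."""
    g = 10 ** (params["db"] / 20)
    return [s * g for s in samples]


def apply_delay(samples: List[float], params: dict) -> List[float]:
    """Mix in one delayed copy (feed-forward echo); output keeps the input length."""
    d, mix = params["samples"], params["mix"]
    out = list(samples)
    for i in range(d, len(samples)):
        out[i] += mix * samples[i - d]
    return out


# Registry mapping preset effect names to implementations (hypothetical names).
EFFECTS: Dict[str, Callable[[List[float], dict], List[float]]] = {
    "gain": apply_gain,
    "delay": apply_delay,
}


def run_chain(samples: List[float], preset_json: str) -> List[float]:
    """Apply the effects listed in the preset, in order."""
    preset = json.loads(preset_json)
    out = [float(s) for s in samples]
    for stage in preset["chain"]:
        out = EFFECTS[stage["type"]](out, stage.get("params", {}))
    return out
```

Because the chain is data rather than code, the server can reorder or retune effects per song or per device without shipping a new app build, which is presumably the appeal of this design.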

Vocal‑to‑accompaniment ratio is automatically adjusted using intelligent algorithms that keep vocal loudness constant while balancing the mix based on reference tracks.
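The idea of holding the vocal fixed and moving only the accompaniment can be sketched as below. RMS stands in here for a proper perceptual loudness measure, and `target_ratio_db` is a hypothetical parameter that would, in the scheme described, be derived from a reference track. Neither detail is taken from the talk.

```python
import math
from typing import List, Sequence


def rms(samples: Sequence[float]) -> float:
    """Root-mean-square level; 0.0 for empty input."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def accompaniment_gain(vocal: Sequence[float], accomp: Sequence[float],
                       target_ratio_db: float) -> float:
    """Linear gain g such that 20*log10(rms(vocal) / rms(g * accomp))
    equals target_ratio_db. The vocal itself is never scaled."""
    v, a = rms(vocal), rms(accomp)
    if v == 0 or a == 0:
        return 1.0  # nothing to balance against
    return v / (a * 10 ** (target_ratio_db / 20))


def mix(vocal: Sequence[float], accomp: Sequence[float],
        target_ratio_db: float) -> List[float]:
    """Sum the untouched vocal with the rescaled accompaniment (equal lengths assumed)."""
    g = accompaniment_gain(vocal, accomp, target_ratio_db)
    return [v + g * a for v, a in zip(vocal, accomp)]
```

With a target of 0 dB the two stems end up at equal RMS, and a positive target places the vocal above the accompaniment by that many decibels.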

Vocal enhancement combines adaptive filtering, multi‑band gain, and gender‑aware parameters to improve clarity and presence.

Post‑processing pitch correction and repair use MIR (music information retrieval) analysis and AI models to refine intonation, timing, and overall sound quality, producing a polished final track.

In summary, after nearly a decade of R&D, Quanjian K‑Song has built a complete, industry‑leading mobile recording technology stack that overcomes the unique challenges of mobile audio capture and provides users with a professional‑grade karaoke experience.

Tags: Audio Processing, low latency, mobile audio, speech enhancement, AI scoring, karaoke technology
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
