Innovative Features and Technical Implementation of Huaisen K‑Song Community: Recording, Editing, and Smart Pitch Correction

This article details how Huaisen reshapes the karaoke workflow through features such as "clear‑singing pitch‑finding", a comprehensive editing SDK, and intelligent pitch correction. It explains the audio analysis, strategy generation, and system architecture behind these features, and how they improve the user experience across the recording, editing, and publishing stages.

Kuaishou Tech

May 31, 2024

The piece continues a series examining the challenges faced by the Huaisen karaoke community and the unique strategies employed, focusing on the "music bullet screen" concept and, in particular, the innovative "clear‑singing pitch‑finding" feature that helps users determine the correct key without background music.

During the recording stage, Huaisen provides several feedback tools (lyric coloring, pitch‑indicator lines, accompaniment toggling, pause, and segment re‑singing) while analyzing the audio in depth: volume, signal‑to‑noise ratio (SNR), fundamental frequency, pitch range, and automatic speech recognition (ASR) output. The core "clear‑singing pitch‑finding" function uses the pYIN algorithm to extract pitch in real time and gives immediate correction feedback.
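
The feedback step can be illustrated with a minimal sketch. It assumes a pitch tracker such as pYIN has already produced per-frame fundamental-frequency estimates in Hz (with None for unvoiced frames) and that a reference melody in MIDI note numbers is available. The function names and the median-based key estimate are illustrative assumptions, not Huaisen's actual implementation.

```python
import math
from statistics import median

A4_HZ = 440.0   # concert pitch
A4_MIDI = 69    # MIDI note number of A4


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a (fractional) MIDI note number."""
    return A4_MIDI + 12.0 * math.log2(f0_hz / A4_HZ)


def cents_off(f0_hz: float, target_midi: int) -> float:
    """Signed deviation from the target note in cents (100 cents = 1 semitone)."""
    return 100.0 * (hz_to_midi(f0_hz) - target_midi)


def estimate_key_shift(sung_f0_hz, reference_midi):
    """Hypothetical key finder: the semitone shift that best maps the
    reference melody onto what the user actually sang.

    sung_f0_hz: per-frame F0 estimates in Hz, None for unvoiced frames.
    reference_midi: reference note per frame, as MIDI note numbers.
    Returns None when no voiced frame is available.
    """
    diffs = [
        hz_to_midi(f0) - ref
        for f0, ref in zip(sung_f0_hz, reference_midi)
        if f0 is not None and f0 > 0
    ]
    if not diffs:
        return None
    # The median is robust to a few badly tracked frames.
    return round(median(diffs))
```

With such a shift, the accompaniment key could be transposed to suit the singer, and `cents_off` could drive a per-note "sharp" or "flat" indicator.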

The editing stage leverages a powerful editing SDK that unifies audio and video processing. It defines a Project structure with video and audio tracks, where each track’s clipRange and displayRange are configured to splice and position segments, enabling users to edit recordings precisely, apply effects, and align audio with accompaniment.
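
A minimal sketch of such a data model follows. Only the names Project, clipRange, and displayRange come from the article; the other classes, the per-segment placement of the two ranges, and the `splice` helper are assumptions made for illustration. Here `clip_range` selects a portion of the source media and `display_range` positions it on the project timeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class TimeRange:
    start: float      # seconds
    duration: float   # seconds

    @property
    def end(self) -> float:
        return self.start + self.duration


@dataclass
class Segment:
    source: str               # path or id of the source media (hypothetical)
    clip_range: TimeRange     # which part of the source is used
    display_range: TimeRange  # where that part sits on the timeline


@dataclass
class Track:
    kind: str                 # "video" or "audio"
    segments: list = field(default_factory=list)


@dataclass
class Project:
    tracks: list = field(default_factory=list)

    def duration(self) -> float:
        """Timeline length: the latest end among all segments."""
        return max(
            (s.display_range.end for t in self.tracks for s in t.segments),
            default=0.0,
        )


def splice(track: Track, source: str, clip_start: float, clip_duration: float) -> Segment:
    """Append a clip so that it starts where the track currently ends."""
    timeline_start = max((s.display_range.end for s in track.segments), default=0.0)
    segment = Segment(
        source,
        TimeRange(clip_start, clip_duration),
        TimeRange(timeline_start, clip_duration),
    )
    track.segments.append(segment)
    return segment
```

Keeping the two ranges separate means a re-sung take can replace part of a recording by changing only which source region is clipped, without moving anything else on the timeline.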

Smart pitch correction is broken down into three phases: (1) voice analysis, which uses pYIN's HMM stage to clean up pitch‑tracking errors and extracts ASR features; (2) strategy generation, which covers octave detection, handling of decorative (ornamental) notes, and a recommended‑pitch algorithm that minimizes the number of adjusted notes and avoids bad‑case artifacts; and (3) voice processing, which applies the computed adjustments with the PSOLA (pitch‑synchronous overlap‑add) algorithm. The processed audio is then integrated back into the editing SDK for preview and fine‑tuning.
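
The strategy-generation phase can be sketched roughly as below. This is a simplified stand-in, not the article's algorithm: it assumes each sung note has already been reduced to one fractional MIDI pitch, detects a whole-octave offset from the median difference to the score, and leaves any note within a tolerance untouched so that few notes are adjusted. The function names and the 0.5-semitone tolerance are hypothetical, and the PSOLA resynthesis itself is not shown.

```python
from statistics import median


def detect_octave_offset(sung_midi, target_midi) -> int:
    """Whole-octave offset (in semitones) between the singer and the score,
    e.g. -12 when the user sings one octave below the written melody."""
    diffs = [s - t for s, t in zip(sung_midi, target_midi)]
    if not diffs:
        return 0
    return 12 * round(median(diffs) / 12)


def plan_corrections(sung_midi, target_midi, tolerance: float = 0.5):
    """Return (octave_offset, per-note shift in semitones).

    A shift of 0.0 means the note is left alone; only notes that miss the
    octave-adjusted target by more than `tolerance` semitones are moved.
    """
    offset = detect_octave_offset(sung_midi, target_midi)
    plan = []
    for sung, target in zip(sung_midi, target_midi):
        shift = (target + offset) - sung
        plan.append(0.0 if abs(shift) <= tolerance else shift)
    return offset, plan
```

The resulting per-note shifts are what a pitch-shifting stage such as PSOLA would then apply to the corresponding audio segments.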

In the publishing stage, Huaisen reduces perceived latency with asynchronous upload and fragmented MP4 (fMP4), which allows video fragments to be sent to the server as soon as they are encoded. Uploads use the ktp protocol to achieve a higher success rate.
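
The overlap between encoding and uploading can be sketched with a producer and a consumer sharing a queue. This is only an illustration of the pattern: the encoder and the network call are replaced by placeholders, all names are hypothetical, and neither real fMP4 muxing nor the ktp protocol is modeled.

```python
import asyncio


async def encode_fragments(n_fragments: int, queue: asyncio.Queue) -> None:
    """Producer: emit each fragment as soon as it is 'encoded'."""
    for index in range(n_fragments):
        await asyncio.sleep(0)  # placeholder for real encoding work
        await queue.put((index, f"fragment-{index}".encode()))
    await queue.put(None)  # sentinel: no more fragments


async def upload_fragments(queue: asyncio.Queue, uploaded: list) -> None:
    """Consumer: upload fragments while later ones are still being encoded."""
    while True:
        item = await queue.get()
        if item is None:
            break
        index, payload = item
        await asyncio.sleep(0)  # placeholder for the real network transfer
        uploaded.append(index)


async def publish(n_fragments: int) -> list:
    """Run encoding and uploading concurrently; return upload order."""
    queue = asyncio.Queue(maxsize=2)  # small buffer applies back-pressure
    uploaded = []
    await asyncio.gather(
        encode_fragments(n_fragments, queue),
        upload_fragments(queue, uploaded),
    )
    return uploaded
```

Calling `asyncio.run(publish(4))` runs both coroutines to completion. Because uploading starts with the first fragment, most of the file can already be on the server by the time encoding finishes.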

Overall, the article demonstrates how recombining existing technologies (audio analysis, dynamic time warping, SDK‑based editing, and smart pitch correction) creates a seamless karaoke experience that boosts user engagement, retention, and the platform’s competitive edge.



