Cutting Video Bitrate to 14.4 kbps: Inside Kuaishou’s AI‑Generated Compression
Kuaishou’s audio‑video team presents an AI‑driven video compression algorithm and the KISC speech codec, which together deliver ultra‑low‑bitrate real‑time video and high‑quality voice transmission. The result is a smooth RTC experience even on weak networks, along with creative features such as viewpoint adjustment and background replacement.
AI‑Generated Compression Algorithm
In real‑time communication scenarios such as voice calls, online meetings, and live streaming, low latency and high quality are essential. Kuaishou’s audio‑video team developed an AI‑generated compression algorithm that delivers ultra‑low‑bitrate video and audio, making real‑time interactions smoother on weak networks.
Pre‑processing and Transmission
In a pre‑processing step, the algorithm extracts appearance features and 3‑D keypoints from a reference frame. During transmission, the sender extracts keypoints from each current frame and sends only those keypoints. The receiver computes optical flow between the reference and current keypoints, warps the reference features with that flow, and reconstructs the current frame.
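The receiver-side idea can be illustrated with a toy sketch. It is not Kuaishou's model: the real system uses learned networks, 3-D keypoints, and warping in feature space, whereas this sketch assumes 2-D keypoints, warps raw pixels, and fills in the dense flow with inverse-distance weighting. The function names dense_flow and warp are hypothetical.

```python
from math import hypot

def dense_flow(ref_kps, cur_kps, width, height, eps=1e-6):
    """Interpolate a backward flow field from sparse keypoint motion.

    For every pixel of the current frame, return the (dx, dy) offset that
    points to where the pixel should be sampled in the reference frame.
    Inverse-distance weighting is an assumption made for this sketch.
    """
    flow = []
    for y in range(height):
        row = []
        for x in range(width):
            wsum = dx = dy = 0.0
            for (rx, ry), (cx, cy) in zip(ref_kps, cur_kps):
                # Keypoints closer to this pixel influence its motion more.
                w = 1.0 / (hypot(x - cx, y - cy) ** 2 + eps)
                wsum += w
                dx += w * (rx - cx)
                dy += w * (ry - cy)
            row.append((dx / wsum, dy / wsum))
        flow.append(row)
    return flow

def warp(reference, flow):
    """Backward-warp the reference image using nearest-neighbour sampling."""
    height, width = len(reference), len(reference[0])
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            dx, dy = flow[y][x]
            # Clamp the sampling position to the image borders.
            sx = min(max(int(round(x + dx)), 0), width - 1)
            sy = min(max(int(round(y + dy)), 0), height - 1)
            row.append(reference[sy][sx])
        out.append(row)
    return out

# Reference frame: a single bright pixel at (x=1, y=2).
reference = [[0] * 5 for _ in range(5)]
reference[2][1] = 9

ref_kps = [(1, 2)]   # keypoint in the reference frame
cur_kps = [(3, 2)]   # the only data "transmitted" for the current frame

flow = dense_flow(ref_kps, cur_kps, width=5, height=5)
reconstructed = warp(reference, flow)
print(reconstructed[2])  # [0, 0, 0, 9, 0]
```

The bright pixel follows the keypoint two pixels to the right, even though only the keypoint position was sent for the current frame.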
Advantages and Creative Applications
Because only keypoints are transmitted, this approach works at bitrates far below those at which traditional codecs remain usable, for example 14.4 kbps for 15 fps video, which is 960 bits per frame. The extracted model features also enable creative uses such as viewpoint adjustment, scene replacement, and face‑swap effects.
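To get a feel for how tight this budget is, the per-frame arithmetic can be worked out directly. The bitrate and frame rate come from the article; the keypoint layout (3 coordinates at 16 bits each) is purely a hypothetical assumption, since the article does not describe how keypoints are quantized or packed.

```python
BITRATE_BPS = 14_400  # 14.4 kbps, from the article
FPS = 15              # from the article

bits_per_frame = BITRATE_BPS // FPS
bytes_per_frame = bits_per_frame // 8

def keypoints_that_fit(bits_available, dims=3, bits_per_coord=16):
    """How many keypoints fit in the budget under a HYPOTHETICAL layout.

    dims and bits_per_coord are illustrative assumptions, not values
    reported by Kuaishou.
    """
    return bits_available // (dims * bits_per_coord)

print(bits_per_frame)                      # 960
print(bytes_per_frame)                     # 120
print(keypoints_that_fit(bits_per_frame))  # 20
```

At 120 bytes per frame there is no room for pixel data at all, which is why the scheme has to rely on a compact keypoint representation.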
AI Speech Codec KISC
For audio, Kuaishou introduced KISC (Kuaishou Intelligent Speech Codec), a deep‑learning‑based speech codec that targets high quality at low bitrates. It delivers high‑quality voice at 6 kbps, outperforming the widely used Opus codec at the same bitrate.
Speech Coding Technology
Traditional codecs (waveform, parametric, hybrid) extract features like LPC or MDCT coefficients, but struggle at very low bitrates. AI‑based codecs use neural networks to extract minimal yet expressive features, enabling low‑bitrate high‑quality reconstruction.
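For readers unfamiliar with the hand-crafted features mentioned above, here is a minimal sketch of LPC analysis using the autocorrelation method and the Levinson-Durbin recursion. It illustrates the traditional parametric approach only and says nothing about how KISC works internally; the function names are chosen for this example.

```python
def autocorrelation(signal, max_lag):
    """Autocorrelation r[0..max_lag] of a finite signal."""
    n = len(signal)
    return [sum(signal[i] * signal[i + lag] for i in range(n - lag))
            for lag in range(max_lag + 1)]

def lpc(signal, order):
    """Linear prediction via Levinson-Durbin.

    Returns (coeffs, error) where x[n] is approximated by
    sum(coeffs[k-1] * x[n-k] for k in 1..order) and error is the
    residual prediction energy.
    """
    r = autocorrelation(signal, order)
    a = [0.0] * (order + 1)   # a[0] is unused
    error = r[0]
    for i in range(1, order + 1):
        # Reflection coefficient for the current model order.
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / error
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        error *= (1.0 - k * k)
    return a[1:], error

# A decaying signal where each sample is 0.9 times the previous one.
signal = [0.9 ** n for n in range(200)]
coeffs, residual_energy = lpc(signal, order=2)
print([round(c, 6) for c in coeffs])
```

For this signal the first coefficient comes out very close to 0.9 and the second close to 0, so two numbers describe 200 samples. Real speech is far less predictable, which is where such features start to struggle at very low bitrates.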
Evaluation Results
Subjective MUSHRA tests show KISC at 6 kbps scores close to Opus at 20 kbps and far exceeds Opus at 6 kbps, confirming superior audio quality at ultra‑low bitrate.
Engineering Implementation
The team optimized the model, reducing its computational load by 75% and replacing 3‑D convolutions with 2‑D equivalents. Custom layers were implemented as Metal kernels and integrated into Core ML, leveraging GPU acceleration to meet real‑time requirements on macOS.
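The article does not say how the 3-D convolutions were replaced. One well-known equivalence, shown here only as an assumption about what such a replacement can look like, is that a 3-D convolution whose kernel spans the full input depth produces the same result as a multi-channel 2-D convolution with the depth axis treated as channels. The sketch below checks this on a tiny integer example in plain Python.

```python
def conv3d_valid(volume, kernel):
    """Valid 3-D cross-correlation: volume[d][h][w], kernel[kd][kh][kw]."""
    D, H, W = len(volume), len(volume[0]), len(volume[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    return [[[sum(volume[z + a][y + b][x + c] * kernel[a][b][c]
                  for a in range(kd) for b in range(kh) for c in range(kw))
              for x in range(W - kw + 1)]
             for y in range(H - kh + 1)]
            for z in range(D - kd + 1)]

def conv2d_valid(channels, kernels):
    """Valid multi-channel 2-D cross-correlation, summed over channels."""
    C, H, W = len(channels), len(channels[0]), len(channels[0][0])
    kh, kw = len(kernels[0]), len(kernels[0][0])
    return [[sum(channels[c][y + b][x + d] * kernels[c][b][d]
                 for c in range(C) for b in range(kh) for d in range(kw))
             for x in range(W - kw + 1)]
            for y in range(H - kh + 1)]

# Depth-2 input and a kernel that spans the whole depth (kd == D).
volume = [
    [[1, 2, 3], [4, 5, 6], [7, 8, 9]],
    [[9, 8, 7], [6, 5, 4], [3, 2, 1]],
]
kernel = [
    [[1, 0], [0, 1]],
    [[0, 1], [1, 0]],
]

out3d = conv3d_valid(volume, kernel)   # output depth is 1
out2d = conv2d_valid(volume, kernel)   # depth slices treated as channels
print(out3d[0] == out2d)  # True
print(out2d)              # [[20, 20], [20, 20]]
```

2-D convolutions are generally better supported by mobile and desktop inference runtimes, which may be part of why such a rewrite helps meet real-time constraints.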
Conclusion
AI‑driven compression and speech coding dramatically lower bitrate requirements while preserving quality, enabling robust RTC experiences in diverse scenarios such as live‑streaming PK, online meetings, and other weak‑network environments.
Kuaishou Audio & Video Technology
Explore the stories behind Kuaishou's audio and video technology.
