Cutting Video Bitrate to 14.4 kbps: Inside Kuaishou’s AI‑Generated Compression

Kuaishou’s audio‑video team presents an AI‑driven video compression algorithm and the KISC speech codec, which together deliver ultra‑low‑bitrate real‑time video and high‑quality voice transmission. The result is a smooth RTC experience even on weak networks, along with creative features such as viewpoint adjustment and background replacement.

Kuaishou Audio & Video Technology

AI‑Generated Compression Algorithm

In real‑time communication scenarios such as voice calls, online meetings, and live streaming, low latency and high quality are essential. Kuaishou’s audio‑video team developed an AI‑generated compression algorithm that delivers ultra‑low‑bitrate video and audio, making real‑time interactions smoother on weak networks.

Pre‑processing and Transmission

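The algorithm first extracts appearance features and 3‑D keypoints from a reference frame. For each subsequent frame, the sender extracts only the current frame's keypoints and transmits them. The receiver computes optical flow from the reference and current keypoints, warps the reference features with that flow, and reconstructs the current frame. Because only keypoints are sent for each frame, the per‑frame payload is very small.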
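The receiver-side step (keypoints to flow to warped reference) can be illustrated with a toy sketch. It is not Kuaishou's implementation: it assumes 2‑D keypoints instead of 3‑D ones, models motion as a Gaussian-weighted blend of per-keypoint translations, and warps a small grid of numbers with nearest-neighbour sampling. In the real system the warped features would feed a neural generator. The function names `dense_flow` and `warp` and the `sigma` parameter are hypothetical.

```python
import math

def dense_flow(ref_kp, cur_kp, height, width, sigma=2.0):
    """Backward flow: for each pixel of the current frame, the (dx, dy)
    offset pointing at the location to sample in the reference frame."""
    flow = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            total = fx = fy = 0.0
            for (rx, ry), (cx, cy) in zip(ref_kp, cur_kp):
                # Pixels near a current keypoint follow that keypoint's motion.
                w = math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
                total += w
                fx += w * (rx - cx)
                fy += w * (ry - cy)
            if total > 0.0:
                flow[y][x] = (fx / total, fy / total)
    return flow

def warp(reference, flow):
    """Sample the reference grid at (x + dx, y + dy), clamped to the borders."""
    height, width = len(reference), len(reference[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = flow[y][x]
            sx = min(max(int(round(x + dx)), 0), width - 1)
            sy = min(max(int(round(y + dy)), 0), height - 1)
            out[y][x] = reference[sy][sx]
    return out

# One keypoint moves one pixel to the right, so the content shifts right too.
reference = [[10 * y + x for x in range(4)] for y in range(3)]
flow = dense_flow(ref_kp=[(1, 1)], cur_kp=[(2, 1)], height=3, width=4)
current = warp(reference, flow)
```

Only `cur_kp` changes from frame to frame in this sketch, which mirrors why the sender needs to transmit nothing but keypoints once the reference frame is known.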

Advantages and Creative Applications

Because only keypoints are transmitted, this approach operates below the bitrate limits of traditional codecs while preserving quality at extremely low bitrates (e.g., 14.4 kbps for 15 fps video). The extracted model features also enable creative uses such as viewpoint adjustment, scene replacement, and face‑swap effects.
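To put the 14.4 kbps figure in perspective, the short calculation below derives the average per-frame budget at 15 fps. The second function is purely illustrative: the article does not state how many keypoints are sent or how they are quantized, so the three coordinates per keypoint and 16 bits per coordinate are assumptions, and both function names are hypothetical.

```python
def bits_per_frame(bitrate_bps: int, fps: int) -> float:
    """Average number of bits available for each frame."""
    return bitrate_bps / fps

def max_keypoints(budget_bits: float, dims: int = 3, bits_per_coord: int = 16) -> int:
    """Keypoints that fit in the budget with fixed-width coordinates
    (dims and bits_per_coord are assumed values, not from the article)."""
    return int(budget_bits // (dims * bits_per_coord))

budget = bits_per_frame(14_400, 15)
print(budget)                 # 960.0 bits, i.e. 120 bytes per frame
print(max_keypoints(budget))  # 20 under the assumed quantization
```

A budget of about 120 bytes per frame is far too small for pixel data, but it is enough for a handful of coordinates, which is consistent with sending keypoints only.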

AI Speech Codec KISC

For audio, Kuaishou introduced KISC (Kuaishou Intelligent Speech Codec), a deep‑learning‑based speech codec that combines low bitrate with high quality. It delivers high‑quality voice at 6 kbps, outperforming the widely used Opus codec at the same bitrate.

Speech Coding Technology

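Traditional codecs (waveform, parametric, and hybrid) extract hand‑designed features such as linear predictive coding (LPC) coefficients or modified discrete cosine transform (MDCT) coefficients, but they struggle to preserve quality at very low bitrates. AI‑based codecs instead use neural networks to extract a minimal yet expressive set of features, enabling high‑quality reconstruction at low bitrates.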
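As an illustration of the kind of hand-designed feature a traditional codec extracts, here is a minimal sketch of LPC analysis using the textbook autocorrelation method with the Levinson-Durbin recursion. It is not taken from KISC or Opus; the function names and the test signal are chosen only for this example.

```python
def autocorr(x, order):
    """Autocorrelation r[0..order] of one analysis frame."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]

def levinson_durbin(r, order):
    """Solve for a[0..order] with a[0] = 1 so that the predictor is
    x_hat[n] = -(a[1]*x[n-1] + ... + a[order]*x[n-order]).
    Returns the coefficients and the residual (prediction error) energy."""
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# A decaying exponential obeys x[n] = 0.9 * x[n-1], so a first-order
# predictor should recover a[1] close to -0.9.
frame = [0.9 ** n for n in range(200)]
coeffs, residual = levinson_durbin(autocorr(frame, 1), 1)
```

A parametric codec would quantize and transmit a few such coefficients per frame rather than the waveform itself; the passage above contrasts this with neural codecs, which learn their features rather than deriving them from a fixed signal model.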

Evaluation Results

Subjective MUSHRA tests show KISC at 6 kbps scores close to Opus at 20 kbps and far exceeds Opus at 6 kbps, confirming superior audio quality at ultra‑low bitrate.

Engineering Implementation

The team optimized the model, reducing its computational load by 75% and replacing 3‑D convolutions with 2‑D equivalents. Custom layers were implemented as Metal kernels and integrated into Core ML, using GPU acceleration to meet real‑time requirements on macOS.

Conclusion

AI‑driven compression and speech coding dramatically lower bitrate requirements while preserving quality, enabling robust RTC experiences in scenarios such as live‑streaming PK (head‑to‑head streamer battles), online meetings, and other weak‑network environments.
