
Visual Lossless Deep Learning Pre‑processing for Video Transcoding Using DCT‑Based Low‑Rank Loss and a Lightweight Model

A visual‑lossless deep‑learning pre‑processor that employs a DCT‑based low‑rank loss and an ultra‑lightweight CPU‑friendly model achieves up to 20% bitrate reduction for 1080p videos while preserving high‑frequency details, enabling real‑time transcoding and bandwidth savings for popular content on Bilibili.

Bilibili Tech

1. Background

Bilibili receives hundreds of thousands of video uploads daily, and popular videos consume a large portion of bandwidth. When a video becomes popular, Bilibili re‑encodes it with a higher‑complexity transcoding system that aims to keep visual quality while reducing redundancy and bitrate.

To further improve transcoding efficiency, a visually lossless deep‑learning pre‑processing system was developed. The system adds about 15% bitrate savings without degrading perceived quality.

2. DCT‑Based Low‑Rank Expression Loss

The traditional video coding pipeline includes transform, quantization, and entropy coding. Discrete Cosine Transform (DCT) converts spatial pixels to frequency coefficients, concentrating most energy in the low‑frequency region (top‑left of the coefficient matrix). By discarding high‑frequency coefficients, redundancy can be reduced, but naive discarding harms texture details.
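The energy-compaction property can be made concrete with a small NumPy sketch. It builds an orthonormal DCT‑II basis, transforms a smooth 8×8 block, and measures how much of the block's energy lands in the top-left coefficients. The ramp block and the `i + j < 4` low-frequency mask are illustrative choices, not taken from the article:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II basis matrix of size n x n."""
    k = np.arange(n)[:, None]
    m = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0] /= np.sqrt(2.0)  # DC row gets the 1/sqrt(N) scale
    return d

def dct2(block):
    """2-D DCT via separable 1-D transforms on rows and columns."""
    d = dct_matrix(block.shape[0])
    return d @ block @ d.T

block = np.add.outer(np.arange(8.0), np.arange(8.0))  # smooth ramp block
coef = dct2(block)
low_mask = np.add.outer(np.arange(8), np.arange(8)) < 4  # top-left corner
frac = (coef[low_mask] ** 2).sum() / (coef ** 2).sum()
print(f"low-frequency energy fraction: {frac:.3f}")
```

For a smooth block like this, well over 90% of the energy sits in the low-frequency corner, which is exactly why truncating the rest is tempting yet risky for textured content.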

A loss function was designed to encourage the network to suppress weak textures and noise in flat regions while preserving important high‑frequency details. The procedure is:

1. Apply a 2‑D DCT to each N×N image block (N = 8, 16, …).

2. Compute the mean magnitude of the coefficients in the block's high‑frequency region.

3. Take the mean absolute error between zero and the high‑frequency coefficients whose magnitudes fall below that mean; this is the low‑rank loss, which pushes weak textures and noise toward zero while leaving salient high‑frequency detail untouched.

Figures illustrate DCT coefficient distributions before and after discarding different percentages of high‑frequency components, showing that the proposed loss retains more salient high‑frequency information than simple truncation.

3. Lightweight Model Design

Because online transcoding demands low latency and limited compute, a compact model was built for x86 CPUs. The pipeline:

1. Pixel Unshuffle for down‑sampling.

2. A 3×3 convolution followed by a grouped‑convolution block.

3. A series of convolutional skip connections, then Pixel Shuffle for up‑sampling.

Operators were chosen for CPU friendliness. The model contains only 0.0032 M parameters, requires 1.644 GFLOPs for a 1080p input, and achieves real-time throughput on a single‑threaded x86 platform.
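The Pixel Unshuffle/Shuffle pair that brackets the convolution stack can be sketched in NumPy; the convolution layers between them are omitted, and the factor r = 2 is an illustrative choice, not taken from the article:

```python
import numpy as np

def pixel_unshuffle(x, r):
    """(C, H, W) -> (C*r*r, H//r, W//r): trade resolution for channels."""
    c, h, w = x.shape
    return (x.reshape(c, h // r, r, w // r, r)
             .transpose(0, 2, 4, 1, 3)
             .reshape(c * r * r, h // r, w // r))

def pixel_shuffle(y, r):
    """(C*r*r, h, w) -> (C, h*r, w*r): exact inverse of pixel_unshuffle."""
    crr, h, w = y.shape
    c = crr // (r * r)
    return (y.reshape(c, r, r, h, w)
             .transpose(0, 3, 1, 4, 2)
             .reshape(c, h * r, w * r))

x = np.random.default_rng(0).standard_normal((3, 1080, 1920))
y = pixel_unshuffle(x, 2)   # (12, 540, 960): convolutions run 4x cheaper
z = pixel_shuffle(y, 2)     # restores (3, 1080, 1920) losslessly
print(y.shape, np.array_equal(z, x))
```

Running the convolution stack at a quarter of the spatial resolution is what keeps the FLOP count low enough for single-threaded real-time 1080p processing.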

4. Training Stage

Training data undergoes a two‑stage degradation process (randomly applied) that includes blur, scaling, noise, and compression, better mimicking real‑world degradations. The network is optimized jointly with L1 loss and the proposed low‑rank DCT loss, while the ground‑truth images are lightly denoised and sharpened.
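One stage of such a degradation pipeline can be sketched in pure NumPy. The specific operations, strengths, and the 50% application probabilities are illustrative assumptions; the article names only the operation families (blur, scaling, noise, compression):

```python
import numpy as np

def degrade(img, rng):
    """Apply one random degradation stage to an image in [0, 1]."""
    if rng.random() < 0.5:  # 3x3 box blur, a stand-in for a blur kernel
        p = np.pad(img, 1, mode="edge")
        img = sum(p[i:i + img.shape[0], j:j + img.shape[1]]
                  for i in range(3) for j in range(3)) / 9.0
    if rng.random() < 0.5:  # down- then up-scale with nearest neighbour
        img = np.repeat(np.repeat(img[::2, ::2], 2, axis=0), 2, axis=1)
    if rng.random() < 0.5:  # additive Gaussian noise
        img = img + 0.02 * rng.standard_normal(img.shape)
    if rng.random() < 0.5:  # coarse quantization, mimicking compression
        img = np.round(img * 32) / 32
    return np.clip(img, 0.0, 1.0)

rng = np.random.default_rng(0)
clean = rng.random((64, 64))
lq = degrade(degrade(clean, rng), rng)  # two-stage degradation
print(lq.shape)
```

The (degraded input, lightly cleaned ground truth) pairs produced this way are what the network trains on with the joint L1 + low-rank DCT objective.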

5. Data and Results

Over ten thousand online videos were re‑encoded with the deep‑learning pre‑processor. Table 1 shows bitrate reduction and objective quality improvements across resolutions; at 1080p the bitrate saving approaches 20% while maintaining visual quality. Visual comparisons (Figure 9) demonstrate preserved or enhanced details such as text and foliage.

6. Conclusion

The visually lossless deep‑learning pre‑processing algorithm, based on a DCT low‑rank loss and an ultra‑lightweight model, effectively reduces redundant information, eases the encoder's bitrate allocation, and runs in real time on x86 CPUs. It improves the user experience for popular videos while cutting bandwidth costs.

Tags: AI, Deep Learning, lightweight model, bandwidth optimization, video transcoding, DCT, low-rank loss
Written by Bilibili Tech, which provides introductions and tutorials on Bilibili-related technologies.
