How Tencent’s Neural Codec Dominated 2025 AI Compression Challenges
In December 2025, Tencent Shannon Lab’s neural codec TNC won both the VCIP low‑complexity end‑to‑end image compression challenge and the PCS high‑compression intelligent compression challenge, delivering superior visual quality at equal bitrates across both image and video tracks. The results highlight the lab’s AI‑driven advances in image and video coding.
Competition Results
In December 2025 two AI‑driven image/video coding challenges released their final rankings. In the VCIP 2025 Low‑Complexity End‑to‑End Image Compression Practical Challenge (4K images, BPG QP28 bitrate reference, decoding MACs < 50 K per pixel) the Tencent Neural Codec (TNC) achieved the highest visual quality at the same bitrate, winning the championship. Across 20 teams TNC averaged 0.51 bpp and improved PSNR by 1.66 dB over BPG (maximum gain 2.81 dB), leading the runner‑up by 0.4 dB.
In the PCS 2025 High‑Compression Intelligent Compression Challenge, the image track (2K images at 0.075, 0.15 and 0.30 bpp) and the video track (10‑second Full‑HD clips) used purely subjective MOS (mean opinion score) evaluation. TNC’s image codec obtained the best MOS, while its video codec (named TCM) outperformed the H.266 reference model (VTM) by 1.2 MOS points and the exploratory H.266 model (ECM) by 0.57 points. Objectively, TCM‑OBJ achieved 3.07 dB higher PSNR than VTM and 2 dB higher than ECM at the same bitrate.
TNC Image Codec Architecture
TNC adopts a hybrid VAE‑INR design with a controllable frame‑level and block‑level rate‑distortion selector that chooses between a Variational Auto‑Encoder (VAE) branch and an Implicit Neural Representation (INR) branch for each block.
VAE Branch Optimizations
Asymmetric encoder‑decoder depth: the encoder uses a deeper MobileNet‑style network for richer feature extraction, while the decoder employs a shallow ShuffleNet‑style architecture with channel‑grouping and shuffling to keep decoding cost low.
Re‑parameterized 3×3 convolutions: weight‑fusion techniques increase expressive power without extra runtime.
WSiLU activation: replaces ReLU with a globally smooth, differentiable function, improving training stability and robustness to distribution shift.
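The re‑parameterization idea above can be sketched with a toy example. The standard trick (known from RepVGG‑style networks; the exact fusion TNC uses is not specified in the article) trains parallel 3×3 and 1×1 branches, then folds them into a single 3×3 kernel at inference time by zero‑padding the 1×1 weight to the kernel centre, so the deployed network pays for only one convolution:

```python
import numpy as np

def conv2d_same(x, w):
    """Naive 'same'-padded 2D cross-correlation, NCHW layout."""
    co, ci, kh, kw = w.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((0, 0), (0, 0), (ph, ph), (pw, pw)))
    n, _, h, wd = x.shape
    out = np.zeros((n, co, h, wd))
    for i in range(h):
        for j in range(wd):
            patch = xp[:, :, i:i + kh, j:j + kw]
            out[:, :, i, j] = np.tensordot(patch, w, axes=([1, 2, 3], [1, 2, 3]))
    return out

def fuse_branches(w3, w1):
    """Fold a parallel 1x1 branch into the 3x3 kernel: the 1x1 weight
    acts only on the centre tap, so adding it there is exact."""
    fused = w3.copy()
    fused[:, :, 1, 1] += w1[:, :, 0, 0]
    return fused
```

By linearity of convolution, the fused kernel reproduces the two‑branch output exactly, which is why the trick adds expressive power during training at zero extra runtime cost.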
Entropy Coding
A hyper‑prior network extracts side information from the latent feature map. On top of this, a dual autoregressive context model predicts probability parameters across both spatial and channel dimensions. The probability distribution is modeled as a generalized Gaussian:
$$p(x) = \frac{\beta}{2\alpha\,\Gamma(1/\beta)} \exp\left[-\left(\frac{|x-\mu|}{\alpha}\right)^{\beta}\right]$$

where $\mu$ is the mean, $\alpha$ the scale, and $\beta$ the shape parameter, allowing a more accurate fit to latent statistics and reducing symbol‑coding cost.
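A minimal sketch of this density and how it translates into coding cost (the bin‑probability approximation below is illustrative, not TNC's actual entropy‑coding pipeline): with $\beta = 2$ the family recovers a Gaussian and with $\beta = 1$ a Laplacian, so fitting $\beta$ lets the entropy model match heavier‑ or lighter‑tailed latent statistics.

```python
import math

def gen_gaussian_pdf(x, mu, alpha, beta):
    """Generalized Gaussian density p(x) from the formula above."""
    coeff = beta / (2.0 * alpha * math.gamma(1.0 / beta))
    return coeff * math.exp(-((abs(x - mu) / alpha) ** beta))

def bin_bits(center, mu, alpha, beta, step=1.0):
    """Approximate bits to code one quantized latent symbol:
    probability mass of its quantization bin (midpoint rule),
    then -log2 of that mass."""
    p = gen_gaussian_pdf(center, mu, alpha, beta) * step
    return -math.log2(max(p, 1e-12))
```

Symbols near the predicted mean cost few bits while outliers cost many, which is exactly the behaviour the hyper‑prior and context model exploit by predicting $\mu$, $\alpha$, $\beta$ per symbol.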
INR Branch
The input image is split into eight blocks. Each block is over‑fitted with a small INR network; if the INR reconstruction yields lower distortion than the VAE output, the learned latent variables and the network weights are encoded into the bitstream. A lottery‑ticket‑style mask selects a subset of weights, enabling very low‑complexity decoding.
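The per‑block branch decision can be sketched as a standard Lagrangian rate‑distortion comparison (a sketch under the assumption that the selector minimizes $J = R + \lambda D$; the article does not give TNC's exact cost function):

```python
def rd_cost(rate_bits, distortion, lam):
    """Lagrangian rate-distortion cost J = R + lambda * D."""
    return rate_bits + lam * distortion

def select_branch(candidates, lam):
    """candidates: {branch_name: (rate_bits, distortion)}.
    Return the branch with the lowest Lagrangian cost, as a
    block-level VAE/INR selector would."""
    return min(candidates, key=lambda k: rd_cost(*candidates[k], lam))
```

For example, an INR fit that spends more bits on weights but cuts distortion sharply wins at quality‑leaning operating points, while the VAE branch wins when rate dominates:

```python
blocks = {"vae": (1000.0, 2.0), "inr": (1200.0, 0.5)}
select_branch(blocks, lam=200.0)  # distortion-weighted: picks "inr"
select_branch(blocks, lam=50.0)   # rate-weighted: picks "vae"
```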
Subjective Quality Optimization
The training pipeline uses a diverse multi‑category dataset (natural scenes, portraits, screen text, animation, AIGC). It proceeds in four stages:
1. Train a single‑rate model with L2 loss.
2. Fine‑tune to multiple rates by sampling Lagrange multipliers as conditioning signals.
3. Add LPIPS to the loss to preserve semantic details.
4. Freeze the encoder and entropy model, then apply a subjective aggregation loss in an adversarial multi‑rate setting to boost perceptual quality.
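The staged objective can be sketched as follows. This is a simplified reading of the pipeline: the lambda grid, LPIPS weight, and stage boundaries are illustrative assumptions, not Tencent's published values.

```python
import random

# Illustrative lambda grid; the article does not give the real values.
LAMBDAS = [0.0018, 0.0035, 0.0067, 0.013, 0.025]

def training_loss(rate, mse, lpips, lam, stage, lpips_weight=1.0):
    """Stage-dependent objective (sketch): stages 1-2 optimize
    rate + lambda * L2; stages 3-4 add a perceptual LPIPS term."""
    if stage <= 2:
        return rate + lam * mse
    return rate + lam * mse + lpips_weight * lpips

def sample_condition():
    """Stage-2 multi-rate fine-tuning: sample one lambda per batch
    and feed it to the network as a conditioning signal, so a single
    model covers the whole rate range."""
    return random.choice(LAMBDAS)
```

Conditioning on a sampled lambda is what lets one set of weights serve every target bitrate, instead of training a separate model per rate point.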
TNC Video Codec Architecture
The video codec integrates AI‑enhanced pre‑processing (ESRGAN‑based up‑sampling) and a neural‑network loop filter (NNLF) to improve detail and subjective quality.
Complexity‑Adaptive Quantization
A scene‑detection module and a local‑complexity analyzer compute a block‑level adaptive QP using the CUTree algorithm. The QP formula adapts to motion intensity, noise level, and frame flatness, allowing the encoder to allocate bits efficiently across diverse content.
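A hypothetical block‑level QP rule in this spirit might look like the sketch below. The thresholds, weights, and normalized inputs are all assumptions for illustration; they are not TNC's actual CUTree formula.

```python
def block_qp(base_qp, motion, noise, flatness):
    """Hypothetical adaptive-QP rule: raise QP (spend fewer bits) on
    fast-moving or noisy blocks where errors are masked, and lower it
    on flat blocks where blocking artifacts are most visible.
    motion / noise / flatness are assumed normalized to [0, 1]."""
    dqp = 0.0
    dqp += 4.0 * motion      # motion masking hides coding errors
    dqp += 3.0 * noise       # avoid spending bits reproducing noise
    dqp -= 5.0 * flatness    # flat regions reveal artifacts first
    return int(max(0, min(51, round(base_qp + dqp))))
```

The encoder‑side effect is a per‑block bit reallocation: static, clean, flat regions get finer quantization while chaotic regions absorb coarser quantization with little perceptual penalty.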
Frame‑Level Parallelism
NNLF is split into two stages: the first stage determines model parameters from a subset of rows, enabling dependent frames to start encoding earlier; the second stage processes the remaining rows with the derived parameters, improving parallel throughput.
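The two‑stage split can be sketched as below (a sketch under the assumption that parameters derived from the leading rows are applied unchanged to the rest of the frame; `probe` and the callbacks are illustrative):

```python
def nnlf_two_stage(rows, derive_params, apply_filter, probe=8):
    """Two-stage loop-filter sketch: stage 1 derives filter parameters
    from only the first `probe` rows -- at this point frames that depend
    on this one can already begin encoding. Stage 2 then filters every
    row with the derived parameters."""
    params = derive_params(rows[:probe])            # stage 1 (cheap, early)
    return [apply_filter(r, params) for r in rows]  # stage 2 (bulk work)
```

The parallelism win comes entirely from stage 1 finishing early: the dependency chain between frames is broken after the probe rows, not after the whole filtered frame.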
Engineering Accelerations
AVX2 and GPU acceleration of the SADL library yields a 45 % speedup for the convolution‑heavy NNLF.
Adaptive block skipping based on texture complexity and QP yields a further 45 % speedup.
Combined, the two 1.45× gains multiply to roughly a 2.11× overall decoding speedup (1.45 × 1.45 ≈ 2.11).
Future Outlook
While TNC demonstrates state‑of‑the‑art compression performance and low‑complexity decoding, challenges remain for commercial deployment, such as cross‑platform decoding consistency and endpoint resource constraints. Ongoing work focuses on further algorithmic innovations and engineering refinements to bridge the gap to industry adoption in short‑video, cloud gaming, and AIGC pipelines.
Tencent Architect
We share insights on storage, computing, networking and explore leading industry technologies together.