How ResULIC Achieves Ultra‑Low‑Rate Image Compression with Semantic Residual Coding and Diffusion
The paper introduces ResULIC, a residual‑guided ultra‑low‑bitrate image compression framework that combines semantic residual coding, a compression‑aware diffusion model, and perceptual fidelity optimization to dramatically improve visual quality and outperform prior diffusion‑based methods on standard benchmarks.
Background
Learning‑based image compression has surpassed traditional codecs such as JPEG2000 and VVC in both objective and subjective metrics, but at extremely low bitrates it produces over‑smoothed textures and loses structural detail. Recent diffusion models offer a promising alternative, yet existing diffusion‑based methods still show noticeable gaps in fidelity and consistency with the original image.
Method
The paper proposes ResULIC (Residual‑guided Ultra Low‑rate Image Compression), which consists of three core components:
Feature compressor: maps the image into a latent space.
Semantic Residual Coding: captions both the original image and the decoded image, feeds the two captions into a large language model to obtain a concise description of the semantics missing from the decoded image, and encodes this semantic residual as additional bits.
Compression‑aware Diffusion Model: conditions a diffusion process on the compressed latent representation and the semantic residual, aligning the compression ratio with the diffusion timesteps to achieve high‑fidelity reconstruction at ultra‑low bitrates.
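The alignment between compression ratio and diffusion timesteps can be sketched as a simple mapping from bitrate to a starting timestep. This is only an illustrative sketch, not the paper's rule: the function names, the log‑linear interpolation, and every numeric default (bpp_min, bpp_max, t_min, t_max, num_steps) are assumptions chosen for the example. The only idea taken from the text is that a more heavily compressed latent is treated as noisier, so it enters the diffusion process at a later timestep.

```python
import math

def start_timestep(bpp, bpp_min=0.002, bpp_max=0.1, t_min=100, t_max=900):
    """Map a bitrate (bits per pixel) to a diffusion starting timestep.

    Hypothetical rule: interpolate linearly in log-bpp so that the lowest
    bitrate starts at the noisiest timestep (t_max) and the highest bitrate
    starts at the cleanest one (t_min).
    """
    bpp = min(max(bpp, bpp_min), bpp_max)  # clamp to the supported range
    span = math.log(bpp_max) - math.log(bpp_min)
    frac = (math.log(bpp) - math.log(bpp_min)) / span
    return round(t_max - frac * (t_max - t_min))

def denoising_schedule(bpp, num_steps=4):
    """Evenly spaced timesteps from the starting timestep down toward 0."""
    t_start = start_timestep(bpp)
    return [round(t_start * (1 - i / num_steps)) for i in range(num_steps)]

if __name__ == "__main__":
    for rate in (0.002, 0.01, 0.05, 0.1):
        print(rate, start_timestep(rate), denoising_schedule(rate))
```

Under these assumed defaults, a lower bitrate always yields a later starting timestep, which is the qualitative behavior the component description implies.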
Perceptual Fidelity Optimization further refines diffusion prompts using CLIP embeddings to reduce the fidelity gap.
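The semantic residual idea from the component list above can be illustrated with a toy text example. In this sketch a plain word‑set difference stands in for the large language model, and the raw UTF‑8 size stands in for an actual text codec; both substitutions, the function names, and the example captions are assumptions made for illustration and are not taken from the paper.

```python
def semantic_residual(original_caption, decoded_caption):
    """Return the words of the original caption that the decoded caption lacks.

    A word-set difference is a crude stand-in for the LLM step: it keeps
    only the semantics the decoder could not already recover on its own.
    Word order follows the original caption; duplicates are dropped.
    """
    seen = set(decoded_caption.lower().split())
    residual = []
    for word in original_caption.lower().split():
        if word not in seen and word not in residual:
            residual.append(word)
    return " ".join(residual)

def raw_bits(text):
    """Bits needed to store the text as uncompressed UTF-8."""
    return 8 * len(text.encode("utf-8"))

original = "a red vintage car parked beside a stone bridge at sunset"
decoded = "a car parked beside a bridge"

residual = semantic_residual(original, decoded)
print(residual)  # red vintage stone at sunset
print(raw_bits(residual), "bits vs", raw_bits(original), "bits for the full caption")
```

Sending only the residual text costs fewer bits than sending the full caption, which is the motivation for coding the residual rather than the whole description.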
Experiments
ResULIC is evaluated on the CLIC‑2020 dataset using PSNR, MS‑SSIM, LPIPS, DISTS, FID, and KID. It outperforms previous diffusion‑based approaches (e.g., PerCo) by 80.7% in LPIPS and 66.3% in FID, and achieves state‑of‑the‑art performance across all metrics. Ablation studies show that adapting the number of diffusion steps to the bitrate further improves reconstruction quality.
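Of the metrics listed, PSNR is the only one that can be computed without a pretrained network, so it serves as a small worked example. The sketch below is a generic pure‑Python PSNR for 8‑bit pixel values, not the paper's evaluation code; the function name and the flat‑list input format are assumptions made to keep the example dependency‑free.

```python
import math

def psnr(reference, reconstruction, max_value=255.0):
    """Peak signal-to-noise ratio in dB between two equally sized pixel lists."""
    if len(reference) != len(reconstruction):
        raise ValueError("inputs must contain the same number of pixels")
    # Mean squared error over all pixel positions.
    mse = sum((r - x) ** 2 for r, x in zip(reference, reconstruction)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

if __name__ == "__main__":
    original = [52, 55, 61, 66, 70, 61, 64, 73]
    decoded = [53, 55, 60, 67, 69, 62, 64, 72]
    print(f"PSNR: {psnr(original, decoded):.2f} dB")
```

PSNR rewards pixel‑wise agreement, which is why generative codecs at ultra‑low bitrates are also judged with perceptual and distribution‑level metrics such as LPIPS, DISTS, FID, and KID.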
Conclusion and Outlook
ResULIC demonstrates that integrating semantic residual coding with a compression‑aware diffusion model can dramatically improve visual quality at ultra‑low bitrates, providing a strong foundation for future video compression research at Kuaishou.
Kuaishou Tech
