How Apple’s AI‑Powered PICO Codec Cuts Image Files to One‑Third While Preserving Quality
Apple’s new PICO perceptual image codec, detailed in the paper “What Matters in Practical Learned Image Compression”, combines a one-shot context model, TextFidelityLoss, and TilingArtifactLoss. At equal perceived quality, its files are 30%-43% the size of those produced by AV1, VVC, and JPEG AI, and 20%-40% smaller than those of other learned perceptual codecs. It runs in real time on an iPhone 17 Pro Max, though it still lags on traditional metrics such as PSNR.
In February 2025 the JPEG committee released JPEG AI, the first end‑to‑end learned image coding standard, marking a shift from three decades of purely mathematically‑driven metrics such as PSNR toward AI‑based perception.
Apple’s research team responded with a paper titled What Matters in Practical Learned Image Compression (arXiv:2605.05148), introducing the PICO codec – Perceptual Image Codec – whose explicit goal is to satisfy human visual perception.
Three core challenges and their solutions:
Entropy coding speed: Traditional autoregressive entropy models are accurate but slow. PICO introduces a One‑shot Context Model that computes the scale parameters in a single forward pass, keeping autoregressive precision while enabling parallel computation. Removing this module drops performance by 10.28 %; adding it leaves speed essentially unchanged.
Perceptual training hallucinations: GAN‑based training can create realistic‑looking but fabricated details, especially in text where even minor distortions are noticeable. PICO adds TextFidelityLoss, which uses a pretrained text detector to locate text regions and enforces strict pixel‑level fidelity there, halving the absolute error on text.
Tiling artifacts: To run efficiently on mobile chips, images are split into 504×504 tiles. GAN training tends to ignore low‑frequency color, causing visible seams between tiles. PICO adds TilingArtifactLoss, a multi‑scale L1 loss that forces color consistency across frequencies, reducing tile‑boundary error by more than 50 %.
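The two loss terms described above can be illustrated with a minimal sketch. This is not the paper's implementation: the summary here does not give the exact formulations, weights, or the text detector used, so the function names, the 2x2 average-pooling pyramid, the number of scales, and the box format are all assumptions made for illustration. Images are plain single-channel lists of rows, and the text boxes stand in for the output of a pretrained detector.

```python
from typing import List, Tuple

Image = List[List[float]]        # single-channel image as rows of pixel values
Box = Tuple[int, int, int, int]  # hypothetical (top, left, bottom, right), end-exclusive


def l1(a: Image, b: Image) -> float:
    """Mean absolute error between two same-sized images."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    return total / (len(a) * len(a[0]))


def downsample2(img: Image) -> Image:
    """Halve resolution by averaging each 2x2 block (odd edges are dropped)."""
    h, w = len(img) // 2, len(img[0]) // 2
    return [
        [
            (img[2 * i][2 * j] + img[2 * i][2 * j + 1]
             + img[2 * i + 1][2 * j] + img[2 * i + 1][2 * j + 1]) / 4.0
            for j in range(w)
        ]
        for i in range(h)
    ]


def multiscale_l1(original: Image, decoded: Image, scales: int = 3) -> float:
    """Sum of L1 errors over an image pyramid (assumed form of a multi-scale L1).

    Coarse levels average away high-frequency error, so what remains there
    is low-frequency drift such as a color shift across a whole tile.
    """
    loss = 0.0
    a, b = original, decoded
    for _ in range(scales):
        loss += l1(a, b)
        if len(a) < 2 or len(a[0]) < 2:
            break  # cannot downsample any further
        a, b = downsample2(a), downsample2(b)
    return loss


def text_region_l1(original: Image, decoded: Image, boxes: List[Box]) -> float:
    """Mean absolute error over pixels inside the given text boxes.

    Overlapping boxes count their shared pixels more than once.
    """
    total, count = 0.0, 0
    for top, left, bottom, right in boxes:
        for i in range(top, bottom):
            for j in range(left, right):
                total += abs(original[i][j] - decoded[i][j])
                count += 1
    return total / count if count else 0.0
```

On a 4×4 test image with three scales, a uniform 0.1 brightness shift scores about three times the loss of a checkerboard error of the same magnitude, because the checkerboard averages out at the coarser levels while the shift survives at every level. That is the behavior a seam-suppressing term needs: low-frequency mismatches are penalized at every scale.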
Experimental evaluation: A third-party platform, Mabyduck, conducted a large-scale human subjective test with 610 vetted participants (screened for color blindness and compression-artifact sensitivity). They performed blind pairwise comparisons, yielding 74,925 comparisons from which Bayesian Elo scores were computed. At equal perceived quality, PICO’s file size is only 30 %-43 % of that produced by AV1, AV2, VVC, ECM, and JPEG AI, and 20 %-40 % smaller than that of the strongest learned perceptual codecs such as HiFiC and MRIC. On an iPhone 17 Pro Max, PICO encodes a 12 MP photo in 230 ms and decodes it in 150 ms, faster than most top ML codecs running on an NVIDIA V100 server. The authors note a “counter-example”: on PSNR, PICO scores lower than DCVC-RT and VVC, confirming the trade-off between perceptual quality and traditional numeric metrics.
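To make the Elo scores concrete, here is a minimal sketch of how pairwise preferences map to ratings. It uses the classic online Elo update, not the Bayesian fit the study reports (which estimates all ratings jointly from the full set of comparisons). The function names, the starting rating, and the step size `k` are illustrative assumptions.

```python
from typing import Tuple


def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that codec A is preferred over codec B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


def update(rating_a: float, rating_b: float, a_won: bool,
           k: float = 16.0) -> Tuple[float, float]:
    """Apply one pairwise comparison result; k is an assumed step size."""
    surprise = (1.0 if a_won else 0.0) - expected_score(rating_a, rating_b)
    delta = k * surprise
    # Ratings move by equal and opposite amounts, so their sum is conserved.
    return rating_a + delta, rating_b - delta


if __name__ == "__main__":
    a, b = 1500.0, 1500.0             # assumed starting ratings
    a, b = update(a, b, a_won=True)   # one viewer preferred codec A's image
    print(a, b, expected_score(a, b))
```

Two codecs with equal ratings are each expected to win half of their comparisons. A 400-point gap corresponds to the higher-rated codec being preferred about 10 times out of 11.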
Limitations and significance: PICO is less effective on highly regular synthetic images (cartoons, diagrams) where rule‑based autoregressive models excel. Nonetheless, it represents the first systematic effort to tackle perceptual compression through architecture search, novel loss functions, massive human evaluation, and real‑time mobile deployment.
The paper’s corresponding author, Oren Rippel, previously led WaveOne’s real‑time adaptive image compression work and later contributed the ELF‑VC video codec, which achieved a 44 % bitrate reduction over H.264 on the UVG test set while running five times faster than comparable ML codecs. The same team now applies that expertise to Apple’s PICO codec.
