MeanCache Sets New Multi‑Modal Generation Inference Speed Benchmark at ICLR 2026
MeanCache introduces an average‑velocity‑driven caching framework that uses Jacobian‑vector‑product correction and a multigraph‑based scheduling algorithm to achieve over 4× speedup on state‑of‑the‑art multimodal diffusion models while preserving image fidelity and semantic consistency.
Industrial‑scale multimodal generation models such as FLUX and Qwen‑Image suffer from slow inference, and traditional feature‑caching methods often cause trajectory drift because the instantaneous velocity they reuse can fluctuate abruptly between timesteps.
Building on the earlier LeMiCa work, the Unicom AI research team and Nanjing University propose MeanCache, a lightweight, training‑free acceleration framework for flow‑matching models. The key innovation is shifting the caching perspective from instantaneous velocity to average velocity. MeanCache captures Jacobian‑vector‑product (JVP) information from the previous timestep and uses a derived anchor identity to correct the current instantaneous velocity, thereby stabilizing the generation trajectory.
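The intuition behind preferring an average velocity can be illustrated on a toy one‑dimensional flow. The sketch below is not MeanCache's anchor identity, which the article does not spell out. It assumes a hypothetical velocity field `velocity(x, t) = -x`, approximates the JVP with a finite difference instead of automatic differentiation, and uses a first‑order Taylor estimate of the average velocity over the skipped span. All function names are illustrative.

```python
import math

def velocity(x, t):
    # Toy velocity field (assumption); stands in for the model's prediction.
    return -x

def total_derivative(v_fn, x, t, eps=1e-5):
    # Finite-difference stand-in for a JVP along the tangent (dx/dt, dt/dt) = (v, 1),
    # i.e. the rate of change of the velocity along the trajectory.
    v = v_fn(x, t)
    return (v_fn(x + eps * v, t + eps) - v) / eps

def jump_instantaneous(x, t, span):
    # Naive caching: reuse the instantaneous velocity over the whole skipped span.
    return x + span * velocity(x, t)

def jump_average(x, t, span):
    # Estimate the average velocity over [t, t + span] by correcting the
    # instantaneous velocity with the JVP term (first-order Taylor expansion).
    v = velocity(x, t)
    dv_dt = total_derivative(velocity, x, t)
    avg_velocity = v + 0.5 * span * dv_dt
    return x + span * avg_velocity

x0, span = 1.0, 0.5
exact = x0 * math.exp(-span)  # closed-form solution of dx/dt = -x
err_instantaneous = abs(jump_instantaneous(x0, 0.0, span) - exact)
err_average = abs(jump_average(x0, 0.0, span) - exact)
print(f"instantaneous error: {err_instantaneous:.4f}")
print(f"average-velocity error: {err_average:.4f}")
```

On this toy problem the JVP‑corrected jump lands noticeably closer to the exact solution than the jump that reuses the stale instantaneous velocity, which is the kind of drift the average‑velocity view is meant to reduce.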
The framework models the inference process as a multigraph in which each timestep is a node and each edge is weighted by the deviation between the predicted average velocity and the ground truth. A Peak‑Suppressed Shortest Path algorithm then computes the caching policy that is optimal under a given compute budget, which effectively determines when to cache.
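A budget‑constrained shortest path over such a graph can be sketched with dynamic programming. The code below is an illustration under stated assumptions, not the paper's algorithm: an edge `(i, j)` is taken to mean "run the full model at step `i` and reuse the cached average velocity to jump to step `j`", the budget is the number of edges (full evaluations) allowed, parallel edges are represented as a list of candidate costs per node pair, and "peak suppression" is approximated by raising each edge cost to a power greater than one so that a single large error is penalized more than several small ones. The name `plan_cache_schedule`, the `peak_power` parameter, and the toy costs are all hypothetical.

```python
import math

def plan_cache_schedule(num_steps, budget, edge_costs, peak_power=2.0):
    """Return the node path 0 -> num_steps that uses exactly `budget` edges
    and minimizes the sum of (edge cost ** peak_power)."""
    inf = math.inf
    # best[k][j]: minimal cost to reach node j using exactly k edges.
    best = [[inf] * (num_steps + 1) for _ in range(budget + 1)]
    prev = [[None] * (num_steps + 1) for _ in range(budget + 1)]
    best[0][0] = 0.0
    for k in range(1, budget + 1):
        for j in range(1, num_steps + 1):
            for i in range(j):
                if best[k - 1][i] == inf or (i, j) not in edge_costs:
                    continue
                # Multigraph: keep the cheapest of the parallel edges.
                weight = min(edge_costs[(i, j)])
                candidate = best[k - 1][i] + weight ** peak_power
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    prev[k][j] = i
    if best[budget][num_steps] == inf:
        raise ValueError("no schedule fits the given budget")
    # Walk the predecessor table backwards to recover the schedule.
    path, j = [num_steps], num_steps
    for k in range(budget, 0, -1):
        j = prev[k][j]
        path.append(j)
    return path[::-1]

# Toy costs (assumption): error grows quadratically with the number of skipped
# steps; the second list entry is a worse parallel edge between the same nodes.
num_steps, budget = 6, 3
toy_costs = {
    (i, j): [0.1 * (j - i - 1) ** 2, 0.2 * (j - i - 1) ** 2]
    for i in range(num_steps)
    for j in range(i + 1, num_steps + 1)
}
schedule = plan_cache_schedule(num_steps, budget, toy_costs)
print(schedule)
```

With uniform toy costs the planner spreads the full evaluations evenly; with measured, timestep‑dependent errors the same procedure would concentrate evaluations where reuse is most harmful.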
Experimental results show that MeanCache delivers up to 4× acceleration on the commercial‑grade text‑to‑image models Qwen‑Image and FLUX.1 [dev] while achieving state‑of‑the‑art scores on ImageReward and perceptual metrics. On the video generation model HunyuanVideo, it achieves a 3.6× speedup with improved quality. Qualitative analysis indicates that content consistency holds up better as acceleration increases, and the method shows stronger semantic robustness on rare‑word prompts such as “Peristeronic”.
The paper, code, and project page are all publicly available (arXiv:2601.19961, https://github.com/UnicomAI/MeanCache). MeanCache has also been endorsed by the Z‑Image and Qwen‑Image‑2512 teams and integrated into the ComfyUI ecosystem.
In summary, MeanCache provides a novel average‑velocity caching paradigm and a stable scheduling strategy that significantly speeds up diffusion‑based multimodal generation without sacrificing fidelity, offering a practical path toward reducing compute costs for industrial applications.
