How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search
EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.
What Is Test‑Time Scaling in Vision?
Test‑time scaling (TTS) increases the computational budget at inference to improve the performance of large models, and similar ideas have been applied to visual generation. Researchers from the Hong Kong University of Science and Technology and Kuaishou’s Keling team propose Evolutionary Search (EvoSearch), a TTS method that boosts image and video generation quality without any additional training or gradient updates.
Core Idea
Given a pretrained diffusion or flow model and a reward function R(x) representing human preference, the goal is to sample from a tilted target distribution p*(x) ∝ p0(x)·exp(R(x)/τ), where τ is a temperature controlling the strength of the reward, while keeping the KL divergence to the original model p0 small. Direct sampling from p* is intractable because the state space of diffusion/flow models is high‑dimensional.
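The tilted target can be made concrete with a toy sketch: draw samples from a base distribution standing in for p0, weight each by exp(R(x)/τ), and resample. The `reward` function and the one‑dimensional Gaussian base model below are illustrative placeholders, not the paper's setup.

```python
import numpy as np

# Hedged sketch of sampling from p*(x) ∝ p0(x)·exp(R(x)/τ) via
# self-normalized importance weighting over draws from the base model p0.

def reward(x):
    # Toy reward preferring samples near 1.0 (stand-in for a learned
    # human-preference reward model).
    return -np.abs(x - 1.0)

rng = np.random.default_rng(0)
tau = 0.5                        # lower τ sharpens the tilt toward high reward
samples = rng.normal(size=1000)  # draws from p0 (here a standard normal)

# exp(R(x)/τ) tilting weights, normalized over the sample pool
w = np.exp(reward(samples) / tau)
w /= w.sum()

# Resampling according to w approximates draws from p*; the resampled
# mean shifts from ~0 toward the high-reward region around 1.0.
tilted = rng.choice(samples, size=1000, p=w)
print(samples.mean(), tilted.mean())
```

Lowering τ concentrates the weights on the highest‑reward samples, which is exactly the diversity‑versus‑reward trade‑off the temperature controls.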
Limitations of Existing Test‑Time Methods
RL‑based post‑training requires additional data and heavy computation. Best‑of‑N and particle‑sampling methods explore only a limited region of the state space, which reduces diversity and misses high‑reward modes.
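The exploration limit of Best‑of‑N is easy to see in a sketch: it draws N independent candidates and keeps the single best, never refining any of them. The functions below are toy stand‑ins, not the actual samplers.

```python
import random

# Hedged sketch of Best-of-N sampling: draw N candidates from the base
# model and keep the highest-reward one. No candidate is ever mutated or
# refined, so exploration is limited to the initial independent draws.

def sample_base():
    # Stand-in for one full denoising run of a diffusion/flow model.
    return random.gauss(0.0, 1.0)

def reward(x):
    # Toy preference reward (placeholder for a learned reward model).
    return -abs(x - 1.0)

def best_of_n(n, seed=0):
    random.seed(seed)
    candidates = [sample_base() for _ in range(n)]
    return max(candidates, key=reward)

# With a fixed seed, the N=64 pool contains the N=4 pool, so its best
# reward can only be equal or better — prints True.
print(reward(best_of_n(4)) <= reward(best_of_n(64)))
```

The only lever is N: reward improves, but every unit of compute goes into fresh independent draws rather than refining promising candidates, which is the gap EvoSearch targets.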
EvoSearch: Evolutionary Search at Test Time
EvoSearch treats the denoising trajectory of diffusion/flow models as an evolutionary path. Each denoising step is a candidate that can mutate and evolve toward higher‑reward samples. Two mutation operators are introduced:
Initial‑noise mutation: orthogonal transformations preserve the Gaussian distribution of the starting noise while adjusting its direction.
Mid‑trajectory mutation: inspired by stochastic differential equations, a perturbation proportional to the diffusion coefficient modifies intermediate states.
Evolution and population‑size schedules control when and how many candidates are evaluated, depending on the available test‑time compute.
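The pieces above can be put together in a minimal sketch. This is our reconstruction under stated assumptions, not the authors' code: `denoise` and `reward` are toy stand‑ins, `sigma_t` and the shrinking population schedule are illustrative, and the selection rule is simple elitism.

```python
import numpy as np

# Hedged sketch of an EvoSearch-style loop: a population of candidates
# evolves under reward-guided selection using the two mutation operators.

rng = np.random.default_rng(0)

def orthogonal_mutate(noise):
    # Initial-noise mutation: a random orthogonal map Q (via QR) keeps
    # N(0, I) invariant, since Qx ~ N(0, I), while changing direction.
    q, _ = np.linalg.qr(rng.normal(size=(noise.size, noise.size)))
    return q @ noise

def trajectory_mutate(state, sigma_t, scale=0.3):
    # Mid-trajectory mutation: SDE-inspired perturbation proportional to
    # the diffusion coefficient sigma_t at the current step (assumed value).
    return state + scale * sigma_t * rng.normal(size=state.shape)

def denoise(noise):
    # Stand-in for a full denoising run of the diffusion/flow sampler.
    return 0.5 * noise

def reward(x):
    # Toy preference reward (placeholder for a learned reward model).
    return -np.linalg.norm(x - 1.0)

population = [rng.normal(size=4) for _ in range(8)]
for pop_size in (8, 4, 2):  # illustrative population-size schedule
    scored = sorted(population, key=lambda z: reward(denoise(z)), reverse=True)
    parents = scored[: max(1, pop_size // 2)]
    children = [orthogonal_mutate(p) for p in parents]
    children += [trajectory_mutate(p, sigma_t=1.0) for p in parents]
    # Keep elites plus mutated offspring, truncated to the scheduled size.
    population = (parents + children)[:pop_size]

best = max(population, key=lambda z: reward(denoise(z)))
```

Because elites are retained each generation, the best reward in the population never decreases; the schedules decide how to spend a fixed test‑time budget between generations and population size.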
Experimental Results
On Stable Diffusion 2.1 and Flux 1‑dev, EvoSearch shows strong scaling‑up behavior even when the test‑time compute is increased by four orders of magnitude. For video generation, EvoSearch achieves the highest reward gains on VBench, VBench2.0, and VideoGen‑Eval prompts. It also generalizes well to unseen evaluation metrics, demonstrating robustness and diversity.
Qualitative visualizations illustrate the improved sample quality and diversity across image and video generation tasks.
For more details, see the paper and project website.
Kuaishou Tech
Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
