How EvoSearch Supercharges Image and Video Generation with Test‑Time Evolutionary Search

EvoSearch, a test‑time evolutionary search method, dramatically improves image and video generation by increasing inference compute without extra training, outperforming existing scaling techniques on diffusion and flow models while maintaining robustness and diversity across multiple benchmarks.

Kuaishou Tech

What Is Test‑Time Scaling in Vision?

Test‑time scaling (TTS) increases the computational budget at inference to improve the performance of large models, and similar ideas have been applied to visual generation. Researchers from Hong Kong University of Science and Technology and Kuaishou’s Keling team propose Evolutionary Search (EvoSearch), a TTS method that boosts image and video generation without any additional training or gradient updates.

Core Idea

Given a pretrained diffusion or flow model and a reward function representing human preference, the goal is to sample from a target distribution p*(x) ∝ p0(x)·exp(R(x)/τ) while keeping the KL divergence to the original model small. Direct sampling from p* is infeasible because the state space of diffusion/flow models is high‑dimensional.
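To make the tilted target concrete, here is a minimal self-normalized importance-weighting sketch on a 1-D toy problem. The base distribution, reward, and temperature are all hypothetical stand-ins, not the paper's setup; the point is only that reweighting samples from p0 by exp(R(x)/τ) shifts mass toward high-reward regions.

```python
import numpy as np

rng = np.random.default_rng(0)
tau = 0.5  # temperature controlling how sharply the reward tilts p0

def reward(x):
    # Toy stand-in for a human-preference reward, peaked at x = 2.
    return -np.abs(x - 2.0)

samples = rng.normal(size=10_000)      # draws from the base model p0
logw = reward(samples) / tau           # log of the tilting factor exp(R/tau)
w = np.exp(logw - logw.max())          # subtract max for numerical stability
w /= w.sum()                           # self-normalized importance weights

base_mean = float(samples.mean())           # roughly 0 under p0
tilted_mean = float(np.sum(w * samples))    # shifted toward the reward peak
```

Lowering τ sharpens the tilt toward the reward maximum; raising it keeps the result closer to the base model, mirroring the KL constraint in the objective.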

Limitations of Existing Test‑Time Methods

RL‑based post‑training requires additional data and heavy computation, while Best‑of‑N and particle‑sampling methods explore only a limited region of the state space, reducing diversity and missing high‑reward modes.
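The mode-dropping issue can be seen in a toy Best-of-N sketch (the bimodal reward below is hypothetical, chosen only to illustrate the failure): N independent samples are drawn and everything except the single top-scoring candidate is thrown away, so at most one reward mode survives.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Toy bimodal reward with equally good peaks at -2 and +2.
    return np.maximum(-np.abs(x - 2.0), -np.abs(x + 2.0))

# Best-of-N: draw N candidates from the base model, keep only the best one.
samples = rng.normal(size=64)
best = samples[np.argmax(reward(samples))]
# Every other candidate is discarded, so only one of the two modes is
# represented in the output -- no search effort is reinvested elsewhere.
```

EvoSearch instead keeps a population alive and mutates it during denoising, so compute spent on intermediate candidates is not wasted.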

EvoSearch: Evolutionary Search at Test Time

EvoSearch treats the denoising trajectory of diffusion/flow models as an evolutionary path. Every intermediate state along the trajectory is a candidate that can be mutated and evolved toward higher‑reward samples. Two mutation operators are introduced:

- Initial‑noise mutation: orthogonal transformations adjust the direction of the starting noise while preserving its Gaussian distribution.

- Mid‑trajectory mutation: inspired by stochastic differential equations, a perturbation proportional to the diffusion coefficient modifies intermediate denoising states.
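The two operators above can be sketched as follows. This is an illustrative implementation under stated assumptions, not the paper's code: the orthogonal transform is drawn Haar-randomly via a QR decomposition, and the mid-trajectory perturbation is plain Gaussian noise scaled by a (hypothetical) diffusion coefficient and a tunable `scale`.

```python
import numpy as np

rng = np.random.default_rng(0)

def mutate_initial_noise(z):
    """Initial-noise mutation sketch: apply a random orthogonal transform,
    which changes the direction of z but preserves its standard Gaussian
    distribution (and, in particular, its norm)."""
    d = z.shape[0]
    q, r = np.linalg.qr(rng.normal(size=(d, d)))
    q *= np.sign(np.diag(r))  # sign fix so Q is Haar-distributed
    return q @ z

def mutate_intermediate(x_t, diffusion_coeff, scale=0.1):
    """Mid-trajectory mutation sketch: add Gaussian noise scaled by the
    diffusion coefficient, echoing the stochastic term of the SDE."""
    return x_t + scale * diffusion_coeff * rng.normal(size=x_t.shape)

z = rng.normal(size=8)        # starting noise for one candidate
z_mut = mutate_initial_noise(z)
x_mut = mutate_intermediate(z, diffusion_coeff=0.5)
```

Because orthogonal transforms are norm-preserving, the mutated starting noise stays exactly as likely under the Gaussian prior as the original, so exploration never pushes candidates off the model's training distribution.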

Evolution and population‑size schedules control when and how many candidates are evaluated, depending on the available test‑time compute.
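A minimal end-to-end loop might look like the sketch below. The denoiser, reward, schedule values, and mutation scale are all toy assumptions; the structure (denoise to a scheduled step, select elites by reward, mutate, continue) is what matters.

```python
import numpy as np

rng = np.random.default_rng(0)

def reward(x):
    # Toy reward: prefer samples close to the all-ones vector.
    return -np.sum((x - 1.0) ** 2, axis=-1)

def denoise_step(x, t):
    # Hypothetical stand-in for one denoising step; drifts toward 1.0.
    return x * 0.9 + 0.1

# Hypothetical schedules: timesteps at which selection runs, and how many
# elites to keep at each of those steps (shrinking as compute is spent).
evolution_steps = [40, 20, 5]
population_sizes = [64, 16, 4]

t = 50
pop = rng.normal(size=(population_sizes[0], 8))  # initial noise population
for step_t, pop_size in zip(evolution_steps, population_sizes):
    while t > step_t:                 # denoise down to the scheduled step
        pop = denoise_step(pop, t)
        t -= 1
    elite = pop[np.argsort(reward(pop))[-pop_size:]]   # selection
    # Mutation: keep elites and add perturbed copies to the population.
    pop = np.concatenate([elite, elite + 0.05 * rng.normal(size=elite.shape)])
while t > 0:                          # finish denoising the survivors
    pop = denoise_step(pop, t)
    t -= 1
best = pop[np.argmax(reward(pop))]    # highest-reward final sample
```

Spending more test-time compute simply means larger populations or more selection rounds in the schedules, which is how the method scales without any gradient updates.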

Experimental Results

On Stable Diffusion 2.1 and FLUX.1‑dev, EvoSearch continues to improve as test‑time compute scales across four orders of magnitude. For video generation, EvoSearch achieves the highest reward gains on VBench, VBench 2.0, and VideoGen‑Eval prompts. It also generalizes well to unseen evaluation metrics, demonstrating robustness and diversity.

Qualitative visualizations illustrate the improved sample quality and diversity across image and video generation tasks.

For more details, see the paper and project website.
