AdaGen: Enabling Adaptive, Data‑Driven Strategies for Image Generation Models
AdaGen replaces the handcrafted, static schedules used by multi-step image generators with a universal, learnable policy network trained via reinforcement learning. By formulating scheduling as an MDP and combining adversarial rewards with action smoothing, it achieves consistent quality and efficiency gains across diffusion, autoregressive, mask-based, and flow models while adding negligible overhead.
Motivation: From Static Hand‑Crafted Schedules to Adaptive Policies
Current multi‑step image generation models—including diffusion (e.g., DiT), autoregressive (e.g., VAR), mask‑based (e.g., MaskGIT) and flow (e.g., SiT) models—share a common paradigm of decomposing generation into a series of controllable steps. This paradigm requires a large set of hyper‑parameters (noise level, sampling temperature, guidance scale, etc.) that are typically managed by static, manually designed scheduling rules. Two major drawbacks are identified: (1) the need for extensive expert knowledge and repeated tuning, and (2) a "one‑size‑fits‑all" static strategy that cannot accommodate the unique characteristics of each sample.
AdaGen: A Universal, Learnable, Sample‑Adaptive Generation‑Strategy Framework
The paper proposes AdaGen, a framework that learns an adaptive policy for each sample. By training a lightweight policy network with reinforcement learning (PPO), AdaGen automatically selects optimal generation parameters conditioned on the current generation state, while keeping the pretrained generator frozen.
Unified MDP Modeling Across Four Paradigms
AdaGen models the scheduling problem of all four major generation paradigms as a Markov Decision Process (MDP). The MDP defines:
State: the current generation step together with intermediate results (partial token sequences for MaskGIT and VAR, partially denoised images for diffusion and flow models).
Action: the set of strategy parameters required by each paradigm (e.g., mask ratio, sampling temperature, and guidance scale for MaskGIT; ODE time step and guidance scale for diffusion/flow; temperature and guidance scale for autoregressive models).
Transition: deterministic for diffusion and flow models (ODE solver) and stochastic for mask-based and autoregressive models.
Reward: evaluated only at the final step using a quality assessment function r(x).
The policy network is treated as an RL agent that observes the state, outputs actions, and is optimized with PPO to maximize the final‑step reward.
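As a concrete sketch of this state/action interface, a lightweight Gaussian policy head in PyTorch might map the step index plus pooled intermediate features to a distribution over strategy parameters. All names and dimensions here are illustrative assumptions, not details from the paper:

```python
import torch
import torch.nn as nn

class SchedulePolicy(nn.Module):
    """Hypothetical lightweight policy head: maps a generation state
    (normalized step index + pooled intermediate features) to a
    Gaussian distribution over strategy parameters."""
    def __init__(self, state_dim: int, action_dim: int, hidden: int = 64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + 1, hidden),   # +1 for the step index t/T
            nn.Tanh(),
            nn.Linear(hidden, 2 * action_dim),  # mean and log-std per action
        )

    def forward(self, step_frac: torch.Tensor, feat: torch.Tensor):
        x = torch.cat([step_frac.unsqueeze(-1), feat], dim=-1)
        mean, log_std = self.net(x).chunk(2, dim=-1)
        # Gaussian policy; actions would later be squashed to valid ranges
        return torch.distributions.Normal(mean, log_std.exp())

# e.g. three actions: mask ratio, sampling temperature, guidance scale
policy = SchedulePolicy(state_dim=16, action_dim=3)
feat = torch.randn(4, 16)                    # pooled intermediate features (batch of 4)
dist = policy(torch.full((4,), 0.25), feat)  # state at step t/T = 0.25
action = dist.sample()
```

The frozen generator would consume these sampled parameters at each step, while PPO optimizes only the policy head.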
Adversarial Reward Modeling to Prevent Shortcutting
The authors explore three reward designs:
Using FID directly as the reward: yields a low FID score (e.g., 2.56) but poor visual fidelity, because the policy learns to game the metric.
Using a pretrained reward model: improves fidelity but leads to severe mode collapse and low diversity.
Adversarial reward (AdaGen's approach): introduces a discriminator that distinguishes real from generated images, forming a GAN-like game that balances fidelity and diversity.
Empirical results show that the adversarial reward achieves a good trade‑off, avoiding the pitfalls of the other two designs.
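A minimal sketch of an adversarial reward of this kind, assuming (hypothetically) that images are reduced to fixed-size feature vectors before the discriminator; the specific architecture and loss are stand-ins, not the paper's:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RewardDiscriminator(nn.Module):
    """Hypothetical discriminator used as a reward model:
    a higher logit means the input looks more 'real'."""
    def __init__(self, feat_dim: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.LeakyReLU(0.2), nn.Linear(64, 1)
        )

    def forward(self, x):
        return self.net(x).squeeze(-1)

def adversarial_reward(disc, fake_feats):
    # Reward the policy for samples the discriminator scores as real.
    with torch.no_grad():
        return torch.sigmoid(disc(fake_feats))

def discriminator_loss(disc, real_feats, fake_feats):
    # Standard binary cross-entropy GAN loss for the discriminator.
    real_logits = disc(real_feats)
    fake_logits = disc(fake_feats.detach())
    return (F.binary_cross_entropy_with_logits(real_logits, torch.ones_like(real_logits))
          + F.binary_cross_entropy_with_logits(fake_logits, torch.zeros_like(fake_logits)))

disc = RewardDiscriminator()
real, fake = torch.randn(8, 32), torch.randn(8, 32)
r = adversarial_reward(disc, fake)           # per-sample reward in (0, 1)
loss = discriminator_loss(disc, real, fake)  # used to update the discriminator
```

Because the discriminator keeps adapting to the policy's outputs, the policy cannot settle on a single metric-gaming mode, which is what mitigates both failure cases above.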
Action Smoothing for Stable Exploration
When the number of generation steps increases (e.g., T from 8 to 32), the action space expands dramatically, and PPO training becomes unstable because high-frequency noise is injected at every step. The paper observes that optimal policies for iterative generation are smooth over time. To enforce this, AdaGen applies an exponential moving average (EMA) filter to the raw policy output a_t:

a_t^smooth = β · a_{t−1}^smooth + (1 − β) · a_t

This acts as a causal low-pass filter: it suppresses high-frequency fluctuations while each smoothed action depends only on past outputs, so the Markov property of the MDP is preserved. Experiments demonstrate that action smoothing reduces FID from 3.5 to 2.3 and stabilizes training.
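The EMA filter above can be written in a few lines; this sketch operates on scalar per-step actions for clarity (in practice each action would be a parameter vector):

```python
def smooth_actions(raw_actions, beta=0.5):
    """Causal EMA low-pass filter over a trajectory of raw policy outputs.

    raw_actions: per-step actions a_0..a_{T-1}; beta in [0, 1) controls
    how strongly high-frequency fluctuations are suppressed.
    Initialized with a_0 so the first smoothed action equals a_0.
    """
    smoothed = []
    prev = raw_actions[0]
    for a in raw_actions:
        prev = beta * prev + (1.0 - beta) * a  # a_t^smooth
        smoothed.append(prev)
    return smoothed

# A maximally oscillating trajectory gets visibly damped:
print(smooth_actions([0.0, 1.0, 0.0, 1.0], beta=0.5))
# -> [0.0, 0.5, 0.25, 0.625]
```

Note the causality: each output uses only the previous smoothed value and the current raw action, never future steps.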
Training Loop
The training consists of two alternating steps:
Policy Network Optimization: generate images with the current policy, compute rewards, and update the policy via PPO.
Reward Model (Discriminator) Optimization: sample real and generated images, and train the discriminator to better separate them. Together the two steps form an adversarial training loop similar to GANs.
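The alternating loop can be sketched end to end with tiny stand-ins. Everything here is a toy assumption: 8-dimensional vectors play the role of images, a linear layer plays the role of the policy, and a REINFORCE-style surrogate stands in for the full PPO update:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Toy stand-ins (hypothetical): linear "policy" and "discriminator" over 8-dim vectors.
policy = nn.Linear(8, 8)
disc = nn.Linear(8, 1)
pol_opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
disc_opt = torch.optim.Adam(disc.parameters(), lr=1e-3)

for it in range(3):
    # Step 1: policy optimization (score-function surrogate in place of PPO).
    state = torch.randn(16, 8)
    dist = torch.distributions.Normal(policy(state), 1.0)
    fake = dist.sample()                                      # "generated images"
    reward = torch.sigmoid(disc(fake)).squeeze(-1).detach()   # adversarial reward
    pol_loss = -(dist.log_prob(fake).sum(-1) * reward).mean()
    pol_opt.zero_grad(); pol_loss.backward(); pol_opt.step()

    # Step 2: discriminator optimization on real vs. generated samples.
    real = torch.randn(16, 8) + 2.0                           # stand-in "real" data
    logits_real, logits_fake = disc(real), disc(fake.detach())
    d_loss = (F.binary_cross_entropy_with_logits(logits_real, torch.ones_like(logits_real))
            + F.binary_cross_entropy_with_logits(logits_fake, torch.zeros_like(logits_fake)))
    disc_opt.zero_grad(); d_loss.backward(); disc_opt.step()
```

In the paper's setting the pretrained generator stays frozen; only the policy and the discriminator are updated in this alternation.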
Experimental Results
AdaGen is evaluated on ImageNet (256×256) across four generation paradigms and six models. Key findings:
Across all paradigms and inference step counts, AdaGen consistently outperforms the corresponding baselines.
Quality gains are more pronounced at fewer inference steps, with FID improvements ranging from 17% to 54%.
Efficiency gains of 1.6× to 3.6× in inference speed are achieved, while the policy network adds only 0.07%–0.40% extra compute.
Figures in the paper illustrate the quality‑efficiency frontier, showing that AdaGen pushes both dimensions forward for diffusion, autoregressive, mask‑based, and flow models.
Conclusion
AdaGen transforms generation‑strategy design from a handcrafted art into a data‑driven optimization problem. By unifying the scheduling problem as an MDP, employing adversarial reward modeling, and introducing action smoothing, AdaGen delivers substantial quality and speed improvements with minimal overhead, highlighting the importance of adaptive scheduling in modern image synthesis.