Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation
This article introduces C²FG, a training‑free, plug‑and‑play exponential control function that replaces the fixed classifier‑free guidance scale with a time‑adaptive one. It justifies the schedule theoretically with score discrepancy bounds and reports FID and IS improvements across multiple diffusion architectures on ImageNet.
Introduction
Standard classifier‑free guidance (CFG) uses a constant scale \(\omega\), implicitly assuming that the conditional‑unconditional score difference is equally important at every diffusion timestep. Both theoretical analysis and empirical measurements show that this score discrepancy decays as diffusion time \(t\) increases, so stronger and more precise guidance is required as sampling approaches the data distribution (\(t\to0\)).
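For reference, one common way to write the fixed‑scale CFG update in score form is shown below. This is a notational sketch, not necessarily the paper's exact parameterization: the symbols \(s_\theta\) (the learned score), \(x_t\) (the noisy sample) and \(c\) (the condition) are assumed here, and some papers use \(1+\omega\) in place of \(\omega\).

\[
\tilde{s}_\theta(x_t, c) = s_\theta(x_t) + \omega \left[ s_\theta(x_t \mid c) - s_\theta(x_t) \right]
\]

The bracketed term is the conditional‑unconditional score difference. With a constant \(\omega\), it receives the same weight at every \(t\).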
Theoretical Insight
Using the VP‑SDE formulation, Theorem 1 establishes a strict exponential upper bound on the mean‑squared error of the score difference, and this bound decreases as diffusion time \(t\) increases. Consequently, the later stages of reverse sampling (small \(t\), close to the data) demand higher guidance intensity than the early, noisy stages.
Method: C²FG
The constant \(\omega\) is replaced by a time‑dependent exponential control function, \(\omega(t) = \omega_0 e^{-\lambda t}\), where \(\omega_0\) denotes the maximum guidance intensity (reached at \(t = 0\)) and \(\lambda\) controls the decay rate. The schedule is continuously differentiable, requires only these two hyper‑parameters, and can be inserted into any existing sampler without extra training or external classifiers.
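A minimal Python sketch of the schedule and its use in a guided prediction is given below. It assumes the common combination rule `uncond + omega * (cond - uncond)`; the paper's exact parameterization may differ. The function names and the plain-list representation of model outputs are hypothetical and chosen only for illustration.

```python
import math
from typing import List, Sequence


def c2fg_scale(t: float, omega_0: float, lam: float) -> float:
    """Exponential schedule omega(t) = omega_0 * exp(-lam * t)."""
    return omega_0 * math.exp(-lam * t)


def guided_prediction(
    uncond: Sequence[float],
    cond: Sequence[float],
    t: float,
    omega_0: float,
    lam: float,
) -> List[float]:
    """Blend unconditional and conditional predictions with a time-varying scale.

    Assumed convention: uncond + omega(t) * (cond - uncond).
    """
    w = c2fg_scale(t, omega_0, lam)
    return [u + w * (c - u) for u, c in zip(uncond, cond)]
```

In a real sampler, `uncond` and `cond` would be the two network outputs at timestep `t`, and the only change relative to fixed CFG is that the scalar weight is recomputed at each step.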
Advantages
Matches theory: the exponential decay aligns with the proven score‑difference trend.
Smoother schedule: continuous differentiability yields more stable sampling than piecewise or linear schedules.
Minimal hyper‑parameters: only \(\omega_0\) and \(\lambda\) need to be set.
Training‑free, plug‑and‑play: no additional model fine‑tuning is required.
Experiments
Extensive ImageNet conditional generation experiments were performed with diffusion backbones DiT‑XL/2 and SiT‑XL/2, using both ODE and SDE samplers at 256×256 and 512×512 resolutions.
Figure 1 confirms that the exponential decay of score discrepancy predicted by theory is observed in real models.
Figure 2 compares the sampling pipelines of standard CFG (constant \(\omega\)) and C²FG (time‑varying \(\omega(t)\)).
Figure 3 visualizes the C²FG schedule and shows that interval guidance can be interpreted as a special case, one that can also be combined with C²FG for additional efficiency.
Figure 4 presents a 2‑D toy example where C²FG produces fewer outliers and better matches the target conditional distribution.
Figure 5 shows qualitative ImageNet results: sharper textures and fewer distortions across different samplers and step counts.
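The combination with interval guidance mentioned for Figure 3 can be sketched as follows. This is an illustration under stated assumptions, not the paper's implementation: guidance is assumed to be active only for \(t\) inside a window `[t_lo, t_hi]`, and outside that window the sampler is assumed to use the conditional prediction alone, which saves the unconditional network call. The function names and window bounds are hypothetical.

```python
import math
from typing import Iterable, Optional


def interval_c2fg_scale(
    t: float, omega_0: float, lam: float, t_lo: float, t_hi: float
) -> Optional[float]:
    """Return omega_0 * exp(-lam * t) inside [t_lo, t_hi], else None.

    None signals "guidance off": the sampler would use the conditional
    prediction alone and skip the unconditional network call.
    """
    if t_lo <= t <= t_hi:
        return omega_0 * math.exp(-lam * t)
    return None


def count_network_evals(
    timesteps: Iterable[float], omega_0: float, lam: float, t_lo: float, t_hi: float
) -> int:
    """Count network calls: two per guided step, one per unguided step."""
    total = 0
    for t in timesteps:
        guided = interval_c2fg_scale(t, omega_0, lam, t_lo, t_hi) is not None
        total += 2 if guided else 1
    return total
```

For example, with the five timesteps `[1.0, 0.75, 0.5, 0.25, 0.0]` and a window of `[0.2, 0.8]`, this counts 8 network calls instead of the 10 needed when every step is guided.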
Quantitative Results
DiT‑XL/2 (256×256, ODE): baseline FID 2.29, IS 276.8 → C²FG FID 2.07, IS 291.5.
SiT‑XL/2 (REPA, 256×256, SDE): baseline FID 1.80, IS 284.0 → C²FG FID 1.51, IS 315.0.
Interval guidance baseline + C²FG: FID 1.41, IS 308.0.
DiT‑XL/2 (512×512, SDE, 100 steps): baseline FID 6.81, IS 229.5 → C²FG FID 6.54, IS 280.9.
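To put the baseline‑versus‑C²FG numbers above on a common footing, the short script below computes relative changes in FID (lower is better) and IS (higher is better). The figures are copied from the results listed above; the dictionary layout and function names are only illustrative.

```python
from typing import Dict, Tuple

# (FID, IS) pairs copied from the reported results above.
RESULTS: Dict[str, Dict[str, Tuple[float, float]]] = {
    "DiT-XL/2, 256x256, ODE": {"baseline": (2.29, 276.8), "c2fg": (2.07, 291.5)},
    "SiT-XL/2 (REPA), 256x256, SDE": {"baseline": (1.80, 284.0), "c2fg": (1.51, 315.0)},
    "DiT-XL/2, 512x512, SDE, 100 steps": {"baseline": (6.81, 229.5), "c2fg": (6.54, 280.9)},
}


def relative_change(before: float, after: float) -> float:
    """Percentage change from `before` to `after` (negative means a decrease)."""
    return (after - before) / before * 100.0


def summarize() -> Dict[str, Tuple[float, float]]:
    """Map each setting to (relative FID change %, relative IS change %)."""
    summary = {}
    for name, r in RESULTS.items():
        fid_change = relative_change(r["baseline"][0], r["c2fg"][0])
        is_change = relative_change(r["baseline"][1], r["c2fg"][1])
        summary[name] = (fid_change, is_change)
    return summary


if __name__ == "__main__":
    for name, (fid_change, is_change) in summarize().items():
        print(f"{name}: FID {fid_change:+.1f}%, IS {is_change:+.1f}%")
```

By this measure, FID drops by about 9.6%, 16.1%, and 4.0% in the three settings, respectively.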
Conclusion
C²FG offers a theoretically grounded, easy‑to‑implement alternative to fixed CFG, delivering consistent quality gains across diverse diffusion architectures without extra training. Its compatibility with interval guidance further reduces unnecessary model evaluations.
Reference: C²FG: Control Classifier‑Free Guidance via Score Discrepancy Analysis, CVPR 2026.
vivo Internet Technology
