Why Fixed CFG Fails and How Time‑Adaptive C²FG Boosts Diffusion Image Generation

This article introduces C²FG, a training‑free, plug‑and‑play time‑adaptive exponential control function that replaces the fixed classifier‑free guidance scale, theoretically justifies its superiority with score discrepancy bounds, and demonstrates significant FID and IS improvements across multiple diffusion architectures on ImageNet.

vivo Internet Technology

Introduction

Standard classifier‑free guidance (CFG) uses a constant scale \(\omega\), which implicitly assumes that the conditional‑unconditional score difference matters equally at every diffusion timestep. Both theoretical analysis and empirical measurements show that this score discrepancy shrinks as diffusion time \(t\) increases, so stronger and more precise guidance is required as sampling approaches the data distribution (\(t\to0\)).
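
For reference, the fixed‑scale update can be written as follows. This uses the common convention in which \(\omega = 1\) recovers the purely conditional prediction. The article does not state which convention it follows, so the symbols \(\epsilon_\theta\) (the network output) and \(\varnothing\) (the null condition) are assumptions made here for illustration.

\[
\tilde{\epsilon}_\theta(x_t, c) = \epsilon_\theta(x_t, \varnothing) + \omega \,\bigl(\epsilon_\theta(x_t, c) - \epsilon_\theta(x_t, \varnothing)\bigr)
\]

The bracketed term is the conditional‑unconditional difference that a constant \(\omega\) weights identically at every \(t\).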

Theoretical Insight

Using the VP‑SDE formulation, Theorem 1 establishes a strict exponential upper bound on the mean‑squared difference between the conditional and unconditional scores, and this bound decreases as diffusion time \(t\) increases. Consequently, the later stages of reverse sampling (small \(t\), close to the data) demand higher guidance intensity.
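
A closed‑form special case helps to see why such a decay is plausible. The setting below is an illustrative assumption and not the setting of Theorem 1: the conditional data distribution is \(\mathcal{N}(\mu, I)\), the unconditional one is \(\mathcal{N}(0, I)\), and the VP‑SDE has a constant noise rate \(\beta\), so that \(x_t = \alpha_t x_0 + \sqrt{1-\alpha_t^2}\,\varepsilon\) with \(\alpha_t = e^{-\beta t/2}\). Both marginals then keep unit covariance at every \(t\), and the scores are linear in \(x\):

\[
\nabla_x \log p_t(x \mid c) = -(x - \alpha_t \mu), \qquad \nabla_x \log p_t(x) = -x
\]

\[
\bigl\| \nabla_x \log p_t(x \mid c) - \nabla_x \log p_t(x) \bigr\|^2 = \alpha_t^2 \,\|\mu\|^2 = e^{-\beta t}\,\|\mu\|^2
\]

In this toy case the squared score difference decays exactly exponentially in \(t\) and is largest at \(t = 0\), which matches the direction of the bound described above.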

Theorem 1 illustration

Method: C²FG

The constant \(\omega\) is replaced by a time‑dependent exponential control function, \(\omega(t) = \omega_0 e^{-\lambda t}\), where \(\omega_0\) denotes the maximum guidance intensity and \(\lambda\) controls the decay rate. The schedule is continuously differentiable, requires only these two hyper‑parameters, and can be inserted into any existing sampler without extra training or external classifiers.
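
A minimal sketch of this schedule in Python is shown below. It assumes that \(t\) runs over \([0, 1]\) with \(t = 0\) at the data end, and that a scale of 1 corresponds to the purely conditional prediction; neither detail is stated in the article. The function names and the values of `OMEGA0` and `LAM` are hypothetical and chosen only for illustration.

```python
import math
from typing import List


def c2fg_scale(t: float, omega0: float, lam: float) -> float:
    """Time-adaptive guidance scale omega(t) = omega0 * exp(-lam * t)."""
    return omega0 * math.exp(-lam * t)


def guided_prediction(pred_cond: List[float], pred_uncond: List[float],
                      scale: float) -> List[float]:
    """Blend the two model outputs; scale == 1 returns the conditional output."""
    return [u + scale * (c - u) for c, u in zip(pred_cond, pred_uncond)]


# Hypothetical hyper-parameters, not values reported in the article.
OMEGA0, LAM = 4.0, 1.0

# Reverse sampling moves from t = 1 (noise) toward t = 0 (data),
# so the scale grows as sampling proceeds.
for t in (1.0, 0.5, 0.0):
    print(f"t={t:.1f}  omega(t)={c2fg_scale(t, OMEGA0, LAM):.3f}")
```

Setting `lam = 0` recovers a constant scale, so fixed CFG is the degenerate case of this schedule.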

Advantages

Matches theory: the exponential decay aligns with the proven score‑difference trend.

Smoother schedule: continuous differentiability yields more stable sampling than piecewise or linear schedules.

Minimal hyper‑parameters: only \(\omega_0\) and \(\lambda\) need to be set.

Training‑free, plug‑and‑play: no additional model fine‑tuning is required.

Experiments

Extensive ImageNet conditional generation experiments were performed with diffusion backbones DiT‑XL/2 and SiT‑XL/2, using both ODE and SDE samplers at 256×256 and 512×512 resolutions.

Figure 1 confirms that the exponential decay of score discrepancy predicted by theory is observed in real models.
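
The kind of measurement behind such a plot can be sketched as follows. This is an assumption about the procedure, not the authors' code: at each timestep the conditional and unconditional predictions are compared by mean‑squared error and cosine similarity, and an exponential \(C e^{-\kappa t}\) is fitted to the MSE curve by linear regression on its logarithm. The data at the bottom is synthetic and only checks that the fit recovers a known decay rate.

```python
import math
from typing import Sequence, Tuple


def score_mse(cond: Sequence[float], uncond: Sequence[float]) -> float:
    """Mean-squared difference between two flattened predictions."""
    return sum((c - u) ** 2 for c, u in zip(cond, uncond)) / len(cond)


def cosine_similarity(cond: Sequence[float], uncond: Sequence[float]) -> float:
    """Cosine of the angle between two flattened predictions."""
    dot = sum(c * u for c, u in zip(cond, uncond))
    norm_c = math.sqrt(sum(c * c for c in cond))
    norm_u = math.sqrt(sum(u * u for u in uncond))
    return dot / (norm_c * norm_u)


def fit_exponential_decay(ts: Sequence[float],
                          mses: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of log(mse) = log(C) - kappa * t; returns (C, kappa)."""
    logs = [math.log(m) for m in mses]
    n = len(ts)
    t_mean = sum(ts) / n
    l_mean = sum(logs) / n
    slope = (sum((t - t_mean) * (l - l_mean) for t, l in zip(ts, logs))
             / sum((t - t_mean) ** 2 for t in ts))
    intercept = l_mean - slope * t_mean
    return math.exp(intercept), -slope


# Synthetic stand-in for measured values, not data from the paper.
ts = [0.1 * k for k in range(1, 10)]
mses = [2.0 * math.exp(-3.0 * t) for t in ts]
amplitude, rate = fit_exponential_decay(ts, mses)
print(f"fitted C={amplitude:.3f}, kappa={rate:.3f}")
```

With a real model, `cond` and `uncond` would be the flattened network outputs for the same noisy input and timestep, averaged over a batch.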

Score MSE and cosine similarity over time

Figure 2 compares the sampling pipelines of standard CFG (constant \(\omega\)) and C²FG (time‑varying \(\omega(t)\)).
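
The difference between the two pipelines comes down to one line in the sampling loop. The sketch below assumes a simple Euler‑style update in which the network output is treated as a velocity, and it uses a placeholder `toy_model` with constant outputs so that the effect of the scale is easy to follow. The loop structure, names, and hyper‑parameter values are all hypothetical; a real sampler would substitute its own update rule.

```python
import math
from typing import Callable, List, Optional, Sequence

Vector = List[float]
Model = Callable[[Vector, float, Optional[int]], Vector]


def sample(model: Model, x: Vector, label: int, timesteps: Sequence[float],
           scale_fn: Callable[[float], float]) -> Vector:
    """Euler-style reverse loop; scale_fn sets the guidance scale per step."""
    for t_cur, t_next in zip(timesteps[:-1], timesteps[1:]):
        pred_cond = model(x, t_cur, label)
        pred_uncond = model(x, t_cur, None)
        w = scale_fn(t_cur)  # the only place where CFG and C2FG differ
        guided = [u + w * (c - u) for c, u in zip(pred_cond, pred_uncond)]
        dt = t_next - t_cur  # negative: time runs from noise to data
        x = [xi + dt * g for xi, g in zip(x, guided)]
    return x


def toy_model(x: Vector, t: float, label: Optional[int]) -> Vector:
    """Placeholder network: 1.0 when conditioned, 0.0 when unconditioned."""
    return [1.0 for _ in x] if label is not None else [0.0 for _ in x]


timesteps = [1.0, 0.5, 0.0]
fixed = sample(toy_model, [0.0], 3, timesteps, lambda t: 2.0)
adaptive = sample(toy_model, [0.0], 3, timesteps,
                  lambda t: 2.0 * math.exp(-1.0 * t))
print("fixed scale:", fixed)
print("time-adaptive scale:", adaptive)
```

Because only `scale_fn` changes, the schedule can be swapped into an existing loop without touching the model or the update rule.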

CFG vs C²FG sampling flow

Figure 3 visualizes C²FG and shows that interval guidance can be interpreted as a special case that can be combined with C²FG for additional efficiency.
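
One way to read this combination is sketched below, under two assumptions that the article does not spell out: guidance is applied only for \(t\) inside an interval \([t_{lo}, t_{hi}]\), and a scale of 1 means the conditional prediction alone, so the unconditional network call can be skipped outside the interval. The interval bounds and hyper‑parameter values are hypothetical.

```python
import math
from typing import Sequence


def interval_c2fg_scale(t: float, omega0: float, lam: float,
                        t_lo: float, t_hi: float) -> float:
    """Exponential schedule inside [t_lo, t_hi], no guidance outside it."""
    if t_lo <= t <= t_hi:
        return omega0 * math.exp(-lam * t)
    return 1.0  # scale 1: conditional prediction only


def model_evaluations(timesteps: Sequence[float], omega0: float, lam: float,
                      t_lo: float, t_hi: float) -> int:
    """Count network calls: two per guided step, one per unguided step."""
    calls = 0
    for t in timesteps:
        scale = interval_c2fg_scale(t, omega0, lam, t_lo, t_hi)
        calls += 1 if scale == 1.0 else 2
    return calls


timesteps = [k / 10 for k in range(10, 0, -1)]  # 1.0, 0.9, ..., 0.1
guided_everywhere = 2 * len(timesteps)
with_interval = model_evaluations(timesteps, 4.0, 1.0, 0.2, 0.7)
print(f"network calls: {with_interval} with interval, "
      f"{guided_everywhere} without")
```

With `lam = 0` the function reduces to a constant scale inside the interval, which is how plain interval guidance fits into the same form.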

C²FG with interval guidance

Figure 4 presents a 2‑D toy example where C²FG produces fewer outliers and better matches the target conditional distribution.

2D toy example

Figure 5 shows qualitative ImageNet results: sharper textures and fewer distortions across different samplers and step counts.

ImageNet visual comparison

Quantitative Results

DiT‑XL/2 (256×256, ODE): baseline FID 2.29, IS 276.8 → C²FG FID 2.07, IS 291.5.

SiT‑XL/2 (REPA, 256×256, SDE): baseline FID 1.80, IS 284.0 → C²FG FID 1.51, IS 315.0.

Interval guidance baseline + C²FG: FID 1.41, IS 308.0.

DiT‑XL/2 (512×512, SDE, 100 steps): baseline FID 6.81, IS 229.5 → C²FG FID 6.54, IS 280.9.
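
To put these numbers on a common footing, the short script below computes relative changes for the three rows that report both a baseline and a C²FG result. It only restates the figures listed above (lower FID is better, higher IS is better). The interval guidance row is left out because no baseline is given for it.

```python
# (baseline FID, baseline IS), (C2FG FID, C2FG IS), copied from the list above.
results = {
    "DiT-XL/2, 256x256, ODE": ((2.29, 276.8), (2.07, 291.5)),
    "SiT-XL/2 (REPA), 256x256, SDE": ((1.80, 284.0), (1.51, 315.0)),
    "DiT-XL/2, 512x512, SDE, 100 steps": ((6.81, 229.5), (6.54, 280.9)),
}


def relative_change(before: float, after: float) -> float:
    """Signed relative change, e.g. -0.10 means a 10% reduction."""
    return (after - before) / before


for name, ((fid0, is0), (fid1, is1)) in results.items():
    print(f"{name}: FID {relative_change(fid0, fid1):+.1%}, "
          f"IS {relative_change(is0, is1):+.1%}")
```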

Conclusion

C²FG offers a theoretically grounded, easy‑to‑implement alternative to fixed CFG, delivering consistent quality gains across diverse diffusion architectures without extra training. Its compatibility with interval guidance further reduces unnecessary model evaluations.

Reference: C²FG: Control Classifier‑Free Guidance via Score Discrepancy Analysis, CVPR 2026.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
