Unlocking Unusual Concept Combinations in Generative AI with IMBA Loss

The paper identifies imbalanced concept distributions as the main obstacle to arbitrary concept‑combination in text‑to‑image/video generation, proposes the token‑level IMBA Distance and a lightweight IMBA Loss that adaptively re‑weights training tokens, and demonstrates through extensive experiments and a new Inert‑CompBench benchmark that this loss dramatically improves compositional ability without extra data.


Motivation

State‑of‑the‑art diffusion models (e.g., Stable Diffusion 3, DALL·E 3) often fail to generate images that correctly combine arbitrary concepts, especially when the desired relation is rare or counter‑intuitive. Typical failure modes include missing concepts, attribute leakage, and images that contradict the prompt.

Factors Influencing Concept Combination

Large‑scale experiments on a dataset of 31 M high‑quality text‑image pairs show that, once model capacity and data volume reach a sufficient scale, the imbalance of concept frequencies in the training set becomes the dominant bottleneck. With model size and total data held fixed, models trained on a balanced concept distribution consistently achieve higher compositional performance.

Adaptive Concept‑Balancing Pre‑training Loss (IMBA Loss)

The authors introduce the IMBA Distance, defined as the L_γ norm of the token‑level difference between the ground‑truth noise ε_gt and the model's unconditional noise prediction ε_pred:

IMBA_Distance(t) = ‖ε_gt(t) − ε_pred(t)‖_γ

Because the distance is larger for under‑represented (tail) concepts, it serves as a precise, token‑wise measure of concept imbalance. During training it is used as a dynamic per‑token weight, yielding the IMBA Loss:

loss = diffusion_loss * (1 + λ * IMBA_Distance)

Only a few lines of code are required to compute the per‑token weight and add it to the standard diffusion objective. The resulting loss improves the model’s ability to generate novel concept combinations in both pre‑training and fine‑tuning stages and generalises to video diffusion models.
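
As a rough illustration, here is a minimal PyTorch sketch of that re‑weighting, assuming an ε‑prediction diffusion model whose forward pass takes a noised latent, a timestep, and a text embedding. The function and argument names (imba_loss, null_emb, gamma, lam) are our own placeholders, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def imba_loss(model, x_t, t, text_emb, null_emb, eps_gt, gamma=2.0, lam=1.0):
    """Per-token IMBA re-weighting of a standard epsilon-prediction loss."""
    # Conditional prediction: the usual diffusion training target.
    eps_cond = model(x_t, t, text_emb)

    # Unconditional prediction (null text); it is only used to measure
    # imbalance, so no gradient flows through it.
    with torch.no_grad():
        eps_uncond = model(x_t, t, null_emb)

    # Token-level IMBA Distance: L_gamma norm of (eps_gt - eps_uncond).
    # Treating each spatial position as a "token", reduce over channels only.
    imba_dist = (eps_gt - eps_uncond).abs().pow(gamma).sum(dim=1).pow(1.0 / gamma)

    # Per-token diffusion loss, re-weighted by (1 + lambda * IMBA Distance).
    per_token = F.mse_loss(eps_cond, eps_gt, reduction="none").mean(dim=1)
    return (per_token * (1.0 + lam * imba_dist)).mean()
```

Detaching the unconditional branch keeps the weight a pure measure of imbalance rather than a second gradient path, and λ controls how strongly tail tokens are up‑weighted.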

Inert‑CompBench: Benchmark for Tail (Lazy) Concepts

Statistical analysis of failure cases reveals that low‑frequency (tail) concepts cause the majority of compositional errors; the authors refer to these as "lazy concepts." Using a controlled construction procedure (Algorithm 2), they build Inert‑CompBench, a benchmark that evaluates model performance specifically on such tail concepts, complementing existing compositionality benchmarks.
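
The exact procedure is the paper's Algorithm 2; purely to convey the idea, the following hypothetical sketch ranks concepts by caption frequency, takes the rarest as "lazy" concepts, and pairs them with frequent anchors. All names and the prompt template are illustrative assumptions.

```python
from collections import Counter
from itertools import product

def build_inert_compbench(captions, concept_vocab, tail_frac=0.05, n_head=20):
    """Pair rare ('lazy') concepts with frequent ones to form test prompts."""
    # Count how often each concept appears in the training captions
    # (naive substring matching; enough for a sketch).
    counts = Counter({c: sum(c in cap for cap in captions) for c in concept_vocab})

    ranked = sorted(concept_vocab, key=lambda c: counts[c])
    n_tail = max(1, int(len(ranked) * tail_frac))
    lazy_concepts = ranked[:n_tail]      # rarest concepts: the hard cases
    head_concepts = ranked[-n_head:]     # frequent concepts used as anchors

    # One combination prompt per (lazy, head) pair; template is illustrative.
    return [f"a photo of a {lazy} and a {head}"
            for lazy, head in product(lazy_concepts, head_concepts)]
```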

Key Experimental Findings

When model size and data scale are fixed, a balanced concept distribution yields a ~15 % boost in compositional accuracy compared to the original long‑tail distribution.

IMBA Distance is empirically larger for tail concepts across both synthetic and real text‑to‑image evaluations.

Integrating IMBA Loss into the diffusion objective improves zero‑shot compositional generation without any additional data; the improvement persists after fine‑tuning on downstream tasks.

The benchmark Inert‑CompBench exposes a systematic drop in performance on lazy concepts for baseline models, while IMBA‑trained models close the gap.

Conclusion

The study demonstrates that concept‑distribution imbalance is the primary obstacle to arbitrary composition in generative models. By introducing a lightweight, adaptive IMBA Loss that re‑weights token‑level training signals according to IMBA Distance, the authors achieve substantial compositional gains without extra data. The newly proposed Inert‑CompBench provides a focused evaluation suite for future work on rare‑concept composition.

Illustrative Figures

Concept‑combination failure example (fork vs. tweezers)
IMBA Loss improves concept‑combination with 31 M training samples
Controlled experiment showing data distribution as the main factor
IMBA Distance larger for tail concepts
Algorithm 1: Integration of IMBA Loss
Visualization of IMBA Loss impact
Inert‑CompBench construction process
Tags: benchmark, Diffusion Models, Generative AI, Imbalanced Data, concept combination, IMBA Loss
Written by

Kuaishou Tech

Official Kuaishou tech account, providing real-time updates on the latest Kuaishou technology practices.
