Can Uni‑X Eliminate Multimodal Gradient Conflict with a Pure Autoregressive Design?

The paper shows that standard shared‑parameter Transformers suffer severe gradient conflict when they jointly process low‑entropy text tokens and high‑entropy visual tokens. It proposes Uni‑X, a two‑end‑separated, middle‑shared autoregressive model: the layers at both ends are modality‑specific, while the middle layers are shared across modalities. This design reduces the conflict, improves efficiency, and achieves strong results on image generation and editing benchmarks.
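The "two‑end‑separated, middle‑shared" layout can be pictured as a per-layer routing rule. The sketch below is only an illustration of that idea, not the paper's implementation: the function names `layer_owner` and `routing_plan`, the labels `text_only`, `vision_only`, and `shared`, and the example layer counts are all assumptions made here.

```python
# Illustrative sketch only: names and layer counts are hypothetical,
# not taken from the Uni-X paper or its code.

def layer_owner(layer_idx, num_layers, num_separated, modality):
    """Return which parameter set would process a token of `modality`
    at `layer_idx`, assuming `num_separated` modality-specific layers
    at each end of the stack and shared layers in between."""
    if modality not in ("text", "vision"):
        raise ValueError(f"unknown modality: {modality!r}")
    if not 0 <= layer_idx < num_layers:
        raise IndexError(f"layer index out of range: {layer_idx}")
    if num_separated < 0 or 2 * num_separated > num_layers:
        raise ValueError("separated layers at both ends exceed the stack")

    in_front = layer_idx < num_separated
    in_back = layer_idx >= num_layers - num_separated
    if in_front or in_back:
        # End layers: each modality has its own parameters.
        return f"{modality}_only"
    # Middle layers: one parameter set serves both modalities.
    return "shared"


def routing_plan(num_layers, num_separated):
    """List (text owner, vision owner) for every layer in the stack."""
    return [
        (
            layer_owner(i, num_layers, num_separated, "text"),
            layer_owner(i, num_layers, num_separated, "vision"),
        )
        for i in range(num_layers)
    ]


if __name__ == "__main__":
    # Hypothetical 8-layer stack with 2 separated layers at each end.
    for idx, owners in enumerate(routing_plan(8, 2)):
        print(idx, owners)
```

Under this rule, text and visual tokens only meet in the middle layers, which is where the summary says parameters are shared; the end layers never receive gradients from the other modality.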

Autoregressive Model · Gradient Conflict · ICLR 2026

