DataFunTalk
Mar 20, 2020 · Artificial Intelligence
UNITER: Unified Image‑Text Representation Learning for Vision‑Language Tasks
This article introduces UNITER, a unified image‑text representation learning framework pretrained on four large multimodal datasets. It describes the three pretraining tasks, Masked Language Modeling (MLM), Image‑Text Matching (ITM), and Masked Region Modeling (MRM), details the model architecture and training optimizations, and evaluates UNITER on six vision‑language downstream tasks, where it achieves state‑of‑the‑art results.
ITM · MLM · MRM