Sep 24, 2022 · Artificial Intelligence

Cross‑Modal Image‑Text Representation: The Zero Dataset and R2D2 Pre‑training Framework

This article explains why image‑text cross‑modal representation matters, presents the Chinese Zero dataset with its two pre‑training subsets and five downstream tasks, describes the R2D2 pre‑training framework, which combines a dual‑tower and a single‑tower architecture and trains with multiple loss functions, and reports extensive experiments along with insights from real‑world deployment.

Multimodal AI · R2D2 framework · Zero dataset

