Baobao Algorithm Notes
Mar 24, 2022 · Artificial Intelligence
Exploring WuDaoMM: A 650M Chinese‑English Multimodal Dataset for Pre‑training
The article introduces WuDaoMM and WuDaoCorpora 2.0, large‑scale Chinese‑English multimodal datasets comprising 650 million image‑text pairs, 3 TB of text, 93 TB of images, and 181 GB of dialogue, and details their composition, formats, access options, and potential research applications.
Chinese AI · Pre‑training · WuDaoMM
6 min read
