Tagged articles

multimodal dataset

6 articles

Jun 18, 2026 · Artificial Intelligence

Automating 3D Spatial Data: Holi‑Spatial’s 4M‑Scale Multimodal Dataset (ICML 2026 Oral)

Holi‑Spatial introduces a fully automatic pipeline that transforms raw video streams into high‑quality 3D geometry, depth, masks, 3D boxes, instance descriptions, grounding and spatial QA, producing the 4‑million‑item Holi‑Spatial‑4M dataset and substantially improving VLM spatial reasoning performance.

3D reconstruction · ICML 2026 · Large-Scale Data

0 likes · 14 min read

HyperAI Super Neural

Jun 11, 2026 · Artificial Intelligence

ChartNet: MIT/IBM’s Million‑Scale Synthetic Chart Dataset with 1.5M Diverse Samples

MIT and IBM researchers introduce ChartNet, the largest code‑guided synthetic chart dataset, with 1.5 million multimodal samples across 24 chart types and six libraries. They show that fine‑tuning vision‑language models on it yields consistent, significant gains on chart reconstruction, data extraction, summarization, and reasoning tasks, outperforming much larger off‑the‑shelf models including GPT‑4o.

AI research · ChartNet · chart understanding

0 likes · 13 min read

Machine Heart

Apr 8, 2026 · Artificial Intelligence

Beyond Simple Motions: How SentiAvatar Redefines 3D Digital Human Action Generation

SentiAvatar introduces a two‑stage plan‑then‑infill framework that separates sentence‑level semantic planning from frame‑level prosody‑driven motion infill, leveraging a 200K‑sequence Motion Foundation Model and the newly released 21k‑clip SuSuInterActs dataset to achieve state‑of‑the‑art, real‑time expressive 3D digital human animation.

3D digital humans · Motion Foundation Model · SentiAvatar

0 likes · 13 min read

Sohu Tech Products

Oct 29, 2025 · Information Security

Why a New Multimodal AI Security Dataset Is Essential for Detecting Deepfakes

As multimodal AI models become capable of generating realistic images, videos, and audio, the OpenMMSec benchmark provides a comprehensive, open‑source dataset and evaluation metrics that help researchers and developers detect and localize AI‑generated forgeries across all three modalities, addressing emerging security challenges.

AI security · Evaluation Metrics · OpenMMSec

0 likes · 18 min read

Kuaishou Tech

Sep 25, 2023 · Artificial Intelligence

LPR4M: A Large-Scale Multimodal Livestreaming Product Recognition Dataset and the RICE Cross‑View Semantic Alignment Model

This paper introduces LPR4M, a 4‑million‑pair multimodal dataset for livestreaming product recognition, and proposes the RICE model that combines instance‑level contrastive learning with patch‑level cross‑view semantic alignment, demonstrating state‑of‑the‑art performance on both LPR4M and MovingFashion benchmarks.

Deep Learning · cross-view alignment · livestreaming

0 likes · 19 min read

Baobao Algorithm Notes

Mar 24, 2022 · Artificial Intelligence

Exploring WuDaoMM: A 650M Chinese‑English Multimodal Dataset for Pre‑training

The article introduces WuDaoMM and WuDaoCorpora 2.0, massive Chinese‑English multimodal datasets comprising 650 million image‑text pairs, 3 TB of text, 93 TB of images, and 181 GB of dialogue. It details their composition, formats, access options, and potential research applications.

Chinese AI · Large-Scale Data · Pre‑training

0 likes · 6 min read
