Tagged articles
4 articles
AntTech
Aug 16, 2024 · Artificial Intelligence

PC²: Pseudo‑Classification Based Pseudo‑Captioning for Noisy Correspondence Learning in Cross‑Modal Retrieval

The paper introduces PC², a framework that combines pseudo‑classification and pseudo‑captioning to mitigate noisy correspondence in cross‑modal retrieval. It also presents Noise of Web (NoW), a large‑scale dataset of web‑page images and meta‑descriptions, and reports significant performance gains on Flickr30K, MS‑COCO, and the newly released NoW.

Multimodal Learning · PC2 · cross-modal retrieval
16 min read
Meituan Technology Team
May 16, 2024 · Artificial Intelligence

CMIngre: A Cross‑Modal Ingredient‑Level Dataset for Chinese Food Understanding

The CMIngre dataset, created by Meituan’s R&D platform and Tianjin University, offers 8,001 image‑text pairs covering 429 Chinese dishes with 95,290 ingredient bounding boxes, enabling fine‑grained ingredient detection and cross‑modal retrieval tasks. In baseline experiments, DINO and CLIP models achieve the strongest performance.

Computer Vision · cross-modal retrieval · food understanding
44 min read
Tencent Cloud Developer
Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent’s Advertising Multimedia AI ecosystem on the Taiji platform, describing a five‑platform matrix: Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI‑driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model. Featured innovations include massive multimodal pre‑training, logo retrieval, QA‑style attribute extraction, spatiotemporal video analysis, advanced auto‑judgment, and high‑performance hashing, which together achieve top cross‑modal retrieval results.

Computer Vision · Multimodal AI · advertising technology
18 min read
Alibaba Cloud Developer
Jul 19, 2018 · Artificial Intelligence

Can Generative Models Boost Visual‑Text Retrieval? Introducing GXN

This paper presents GXN, a generative cross‑modal feature learning framework that enhances image‑text retrieval by incorporating both high‑level semantic similarity and fine‑grained local matching through a three‑step Look‑Imagine‑Match process, achieving state‑of‑the‑art results on MSCOCO and Flickr30K.

Deep Learning · Generative Models · artificial intelligence
6 min read