Tagged articles

cross-modal retrieval

5 articles

Oct 29, 2025 · Artificial Intelligence

How Amazon Nova’s Multimodal Embedding Model Handles All Modalities in One Go

Amazon Nova Multimodal Embeddings, a new model now available on Amazon Bedrock, maps text, documents, images, video, and audio into a single semantic space. It supports a context of up to 8K tokens and several output dimensions, and the article walks through Python examples for generating embeddings, storing them, and running cross-modal search.

AWS Bedrock · Amazon Nova · Python SDK

19 min read

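Because every modality lands in the same embedding space, cross-modal search reduces to nearest-neighbour ranking by similarity. The sketch below illustrates that idea only: it does not call Amazon Bedrock, it assumes embeddings are already available as plain float lists of equal length, and the helper names (`cosine`, `cross_modal_search`) and the three-dimensional toy vectors are hypothetical, not real model output.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)

def cross_modal_search(query_vec, index, top_k=3):
    """Rank indexed items of any modality by cosine similarity to the query.

    Hypothetical helper: `index` maps item_id -> (modality, vector), where all
    vectors are assumed to come from one shared embedding space.
    Returns a list of (score, item_id, modality), best match first.
    """
    scored = [
        (cosine(query_vec, vec), item_id, modality)
        for item_id, (modality, vec) in index.items()
    ]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:top_k]

# Toy vectors standing in for real embeddings (made-up values for illustration).
toy_index = {
    "img_cat": ("image", [0.9, 0.1, 0.0]),
    "audio_dog": ("audio", [0.0, 1.0, 0.0]),
    "video_cat": ("video", [0.8, 0.0, 0.2]),
}
text_query = [1.0, 0.0, 0.0]  # pretend embedding of the text "a cat"

for score, item_id, modality in cross_modal_search(text_query, toy_index, top_k=2):
    print(f"{item_id} ({modality}): {score:.3f}")
```

A production setup would typically replace the linear scan with a vector index, but the ranking logic is the same.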

AntTech

Aug 16, 2024 · Artificial Intelligence

PC²: Pseudo‑Classification Based Pseudo‑Captioning for Noisy Correspondence Learning in Cross‑Modal Retrieval

The paper introduces PC², a framework that combines pseudo-classification and pseudo-captioning to mitigate noisy correspondence (mismatched image-text pairs) in cross-modal retrieval. It also releases Noise of Web (NoW), a large-scale dataset pairing web-page images with their meta descriptions, and reports significant performance gains on several benchmarks, including Flickr30K, MS-COCO, and the new NoW.

Multimodal Learning · PC2 · cross-modal retrieval

16 min read


Meituan Technology Team

May 16, 2024 · Artificial Intelligence

CMIngre: A Cross‑Modal Ingredient‑Level Dataset for Chinese Food Understanding

The CMIngre dataset, created by Meituan's R&D platform and Tianjin University, contains 8,001 image-text pairs covering 429 Chinese dishes, annotated with 95,290 ingredient bounding boxes. It supports fine-grained ingredient detection and cross-modal retrieval, and baseline experiments show that DINO and CLIP models achieve the strongest performance.

computer vision · cross-modal retrieval · food understanding

44 min read


Tencent Cloud Developer

Nov 11, 2022 · Artificial Intelligence

Tencent Advertising Multimedia AI Technology: Research and Application

Liu Wei outlines Tencent's advertising multimedia AI ecosystem built on the Taiji platform. It is organized as a five-platform matrix: Jue for content understanding, Qiankun for automated video creation, Shenzhen for AI-driven review, Tianyin for hierarchical fingerprinting, and Hunyuan as a multimodal large model. Highlighted innovations include massive multimodal pre-training, logo retrieval, QA-style attribute extraction, spatiotemporal video analysis, advanced automated judgment, and high-performance hashing, with reported top results in cross-modal retrieval.

advertising technology · computer vision · content understanding

18 min read


Alibaba Cloud Developer

Jul 19, 2018 · Artificial Intelligence

Can Generative Models Boost Visual‑Text Retrieval? Introducing GXN

This paper presents GXN, a generative cross-modal feature learning framework that improves image-text retrieval by combining high-level semantic similarity with fine-grained local matching through a three-step Look-Imagine-Match process, achieving state-of-the-art results on MS-COCO and Flickr30K.

Artificial Intelligence · Deep Learning · Generative Models

6 min read
