Tagged articles

multimodal retrieval

28 articles · Page 1 of 1

Jun 12, 2026 · Artificial Intelligence

How Agentic Architectures Power Next‑Gen Recommendation and Search Systems

This article analyzes cutting‑edge AI search and recommendation technologies, covering Alibaba Cloud's Agentic RAG architecture, Huawei Noah's LLM‑enhanced recommender evolution, and Baidu's generative ranking model GRAB, each with detailed designs, performance metrics, and real‑world deployment insights.

AI agents · Generative Ranking · Recommendation Systems

0 likes · 6 min read

DataFunTalk

Jun 4, 2026 · Artificial Intelligence

Bridging the Speech Modality Gap with Domain Knowledge Enhancement

The article analyzes recent end‑to‑end speech models, compares four knowledge‑enhancement architectures in terms of their technical mechanisms and trade‑offs, and outlines how these approaches can be applied in the insurance and finance sectors to build real‑time, domain‑aware voice agents.

S2S architecture · domain fine‑tuning · insurance AI

0 likes · 12 min read

DataFunTalk

May 15, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article provides a comprehensive technical overview of multimodal GraphRAG, detailing document‑intelligence parsing pipelines, layout analysis, OCR‑pipeline vs OCR‑free approaches, knowledge‑graph integration for chunk relationships, multimodal indexing, retrieval‑generation workflows, and a comparative analysis of RAG, GraphRAG, and KG‑QA solutions.

GraphRAG · Knowledge Graph · Layout Analysis

0 likes · 23 min read

DataFunTalk

May 10, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Combining Document Intelligence, Knowledge Graphs, and Large Models

This article presents a detailed technical walkthrough of multimodal GraphRAG, covering document‑intelligence parsing pipelines, multimodal graph index construction, knowledge‑graph‑driven chunk linking, recent research progress, performance trade‑offs, and practical recommendations for deploying RAG solutions.

GraphRAG · Knowledge Graph · OCR

0 likes · 23 min read

DataFunTalk

May 5, 2026 · Artificial Intelligence

Agent Architecture in Action: Building Next‑Gen Recommendation and Search Systems

This article reviews cutting‑edge AI search and recommendation techniques—including Alibaba Cloud's Agentic RAG, Huawei Noah's LLM‑enhanced recommendation pipeline, and Baidu's generative ranking model GRAB—detailing their architectural evolution, multimodal retrieval strategies, GPU acceleration, and measured performance gains.

AI Search · Agentic RAG · GPU Acceleration

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

May 1, 2026 · Artificial Intelligence

Zero Deployment, Zero Ops: Alibaba Cloud Milvus Embedding Service Makes Vectorization Plug‑and‑Play

The article explains how Alibaba Cloud's Milvus Embedding Service eliminates the need for self‑hosted embedding models by integrating model inference, vector generation and Milvus indexing into a managed pipeline, dramatically reducing deployment complexity, operational overhead, and time‑to‑value for semantic search, RAG and multimodal retrieval use cases.

Alibaba Cloud · Embedding · Milvus

0 likes · 19 min read

DataFunTalk

Apr 24, 2026 · Artificial Intelligence

Exploring Multimodal GraphRAG: Document Intelligence, Knowledge Graphs, and Large‑Model Integration

This article presents a detailed technical walkthrough of multimodal GraphRAG, covering document‑intelligence parsing pipelines, layout‑analysis models, knowledge‑graph augmentation, multimodal indexing and retrieval, and a comparative analysis of RAG, GraphRAG, and KG‑QA approaches, with concrete examples, model sizes, benchmark scores, and research citations.

GraphRAG · Knowledge Graph · Layout Analysis

0 likes · 25 min read

AI Waka

Mar 26, 2026 · Artificial Intelligence

Building Production‑Ready AI Agents with NVIDIA Nemotron: A Full‑Stack Guide

This guide explains how to assemble NVIDIA's Nemotron Speech, RAG, and Safety models into a low‑latency, secure production AI agent stack, covering performance benchmarks, multimodal retrieval, safety data sets, integration code, and deployment options for cloud, on‑premise, and edge environments.

Content Safety · NVIDIA · Production Deployment

0 likes · 9 min read

Machine Learning Algorithms & Natural Language Processing

Mar 17, 2026 · Artificial Intelligence

DeepImageSearch Ushers in the Deep Search Era: Enabling AI to Understand Visual Histories

DeepImageSearch introduces a new paradigm that shifts image retrieval from isolated semantic matching to corpus‑level contextual reasoning, supported by the DISBench benchmark and the ImageSeeker framework, revealing that even state‑of‑the‑art multimodal models struggle with multi‑step visual‑history queries.

DISBench · DeepImageSearch · ImageSeeker

0 likes · 15 min read

DataFunSummit

Dec 19, 2025 · Artificial Intelligence

How Agentic RAG, LLM‑Powered Recommendations, and Generative Ranking Transform AI Search and Ads

This article surveys cutting‑edge AI techniques—including Alibaba Cloud's Agentic RAG for multimodal search, Huawei Noah's LLM‑enhanced recommendation evolution, and Baidu's generative ranking (GRAB) for ads—detailing their architectures, optimization tricks, performance gains, and real‑world deployment results.

AI Search · Agentic RAG · GPU Acceleration

0 likes · 9 min read

PaperAgent

Dec 12, 2025 · Artificial Intelligence

How BookRAG Redefines Long-Document Retrieval with Hierarchical Indexing

BookRAG introduces a hierarchical, structure‑aware indexing method that combines tree‑based document representation with graph‑based entity linking and an agent‑driven retrieval pipeline, achieving up to 71.2% recall improvement on multimodal long‑document benchmarks while cutting token usage and latency dramatically.

Agent Retrieval · LLM · Long Document QA

0 likes · 7 min read

Tencent Advertising Technology

Nov 28, 2025 · Artificial Intelligence

How Retrv-R1 Redefines Universal Multimodal Retrieval with Reasoning‑Driven MLLM

Retrv‑R1, a reasoning‑driven multimodal large language model framework, tackles the precision‑efficiency dilemma of universal multimodal retrieval by introducing a two‑stage coarse‑to‑fine pipeline, an information‑compression module, a detail‑inspection mechanism, and a three‑stage training strategy, achieving SOTA performance across accuracy, efficiency, and generalization benchmarks.

Efficiency · MLLM · detail inspection

0 likes · 21 min read

AI Large Model Application Practice

Sep 2, 2025 · Artificial Intelligence

Building a Multimodal Hybrid Retrieval Agent on an Integrated AI Data Layer

This article explores why many enterprise AI projects fail to deliver value, analyzes the complexity of real‑world AI use cases, and presents a step‑by‑step demo that combines vector, keyword, numeric, and spatial queries using OceanBase as a unified multimodal data store.

Enterprise AI · Hybrid Search · LLM

0 likes · 15 min read

Alibaba Cloud Native

Aug 25, 2025 · Artificial Intelligence

How 1688 AI App Redefines B2B E‑commerce with AI‑Powered Search and Multimodal Interfaces

The article examines the design shift from the traditional 1688 App to the AI‑native 1688 AI App, detailing how AI‑driven interfaces, system prompts, embedding‑based retrieval, multi‑agent routing, and AI gateways transform B2B product discovery, recommendation, and customization.

AI Search · B2B e-commerce · Large Language Model

0 likes · 20 min read

Alibaba Cloud Big Data AI Platform

Jul 8, 2025 · Artificial Intelligence

How Video Retrieval‑Augmented Generation Transforms Multimodal AI Search

This article explains the end‑to‑end implementation of Video RAG in OpenSearch LLM, covering offline parsing, key‑frame extraction, audio transcription, slice creation, multimodal vectorization, hybrid indexing, and online query processing while addressing challenges like recall performance and long‑video efficiency.

ASR · Key Frame Extraction · LLM

0 likes · 10 min read

Alibaba Cloud Developer

Jun 19, 2025 · Artificial Intelligence

Build Efficient Multimodal Text‑Image Search with Alibaba Cloud Milvus

This guide explains how to use Alibaba Cloud Milvus to create a scalable, high‑performance multimodal search system that supports text‑to‑image, image‑to‑image, and cross‑modal queries across various business scenarios, detailing architecture, deployment steps, validation, and resource cleanup.

AI · Milvus · Serverless

0 likes · 8 min read

Big Data Technology & Architecture

Jun 9, 2025 · Databases

Why Data Warebase Could Be the Next Game‑Changer for AI Workloads

The article examines how emerging data‑infrastructure trends, multi‑modal databases like Neon, Supabase, and ClickHouse, and the convergence of OLTP, OLAP, and vector search are reshaping AI workloads, introducing the Data Warebase concept that unifies warehouse and database capabilities to meet modern AI workflow demands.

AI · Data Infrastructure · Databases

0 likes · 32 min read

Meituan Technology Team

Oct 31, 2024 · Artificial Intelligence

Selected Meituan Papers from CIKM 2024: Summaries of Eight Research Works

This article highlights eight Meituan research papers accepted at CIKM 2024—spanning self‑supervised sequential recommendation, rating‑consistent explanation generation, CTR prediction via recommendation pre‑training, cross‑domain interest transfer, multimodal vector retrieval, design‑aware poster layout, order‑fulfillment cycle‑time forecasting, and delivery‑scope substitution—offering insights from both internal and university collaborations.

AI research · CTR Prediction · Cross‑Domain Recommendation

0 likes · 16 min read

Huolala Tech

Aug 22, 2024 · Artificial Intelligence

How Large Language Models Automate Order Cancellation Responsibility at HuoLala

This article explains how HuoLala leverages large language models, multimodal feature integration, and retrieval‑augmented generation to automatically determine responsibility for order cancellations, improving accuracy, explainability, and driver‑user experience.

AI · Order Cancellation · RAG

0 likes · 10 min read

Meituan Technology Team

Jul 4, 2024 · Artificial Intelligence

Meituan Search Advertising: Evolution of Recall Strategies and Generative Approaches

Meituan’s search advertising has progressed from rule‑based keyword mining to hierarchical recall that partitions traffic and supply, and now to generative recall using large language models, chain‑of‑thought generation, diffusion‑enhanced multimodal vectors, and knowledge distillation, expanding the decision space while tackling compute and ROI challenges.

Meituan · Recall Strategies · generative models

0 likes · 19 min read

Rare Earth Juejin Tech Community

Apr 8, 2024 · Artificial Intelligence

PreFLMR: Scaling Up Fine-Grained Late-Interaction Multi-modal Retrievers

The article introduces PreFLMR, an open‑source, general‑purpose pre‑trained multimodal retriever that uses fine‑grained late interaction to improve retrieval‑augmented generation on knowledge‑intensive visual tasks, and describes its M2KR benchmark, its training stages, and its strong experimental results across multiple tasks.

AI · FLMR · Pretrained Models

0 likes · 11 min read

Alibaba Cloud Big Data AI Platform

Jul 12, 2023 · Artificial Intelligence

How ConaCLIP Boosts Lightweight Text-Image Retrieval with Dual‑Encoder Distillation

ConaCLIP introduces a fully‑connected knowledge interaction graph to distill large dual‑encoder models into compact ones, enhancing text‑image retrieval accuracy and efficiency on edge devices, with extensive experiments and supervision strategies demonstrating significant gains over existing baselines.

AI · ConaCLIP · Dual Encoder

0 likes · 9 min read

Alibaba Cloud Big Data AI Platform

Jul 11, 2023 · Artificial Intelligence

How FashionKLIP Boosts E‑Commerce Image‑Text Retrieval with a Multimodal Knowledge Graph

The ACL 2023 paper introduces FashionKLIP, an e‑commerce visual‑language model enhanced by a multimodal concept knowledge graph, detailing its automated knowledge graph construction, dual‑stream training strategy, and superior performance on FashionGen retrieval benchmarks compared to state‑of‑the‑art methods.

FashionKLIP · Knowledge Graph · e-commerce

0 likes · 10 min read

Architect

May 18, 2021 · Big Data

Design and Optimization of Baidu's Image Processing and Ingestion Platform (Imazon) for Multimodal Retrieval

This article details Baidu's multimodal retrieval architecture, explaining the separation of online and offline services, the design of the Imazon image processing and ingestion platform, its technical indicators, large‑scale streaming and batch pipelines, optimization practices for high throughput, and the underlying content‑relationship engine.

DAG · Image processing · System Optimization

0 likes · 13 min read

High Availability Architecture

May 18, 2021 · Big Data

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

This article details Baidu's large‑scale image processing and multimodal retrieval system, describing its offline‑online architecture, massive data ingestion pipeline, ANN search techniques, performance metrics, infrastructure components, and a series of optimizations for throughput, cost, and reliability in a high‑volume streaming environment.

Baidu · Image processing · Imazon

0 likes · 12 min read

Baidu Geek Talk

May 17, 2021 · Artificial Intelligence

Design and Optimization of Baidu's Image Processing and Multimodal Retrieval Platform (Imazon)

The Imazon platform unifies Baidu’s image acquisition, feature extraction, and ANN‑based multimodal retrieval into a cloud‑native, real‑time pipeline that ingests billions of images daily, optimizes storage and GPU usage, reduces message‑queue costs, and ensures high‑throughput, low‑latency search across text, visual, and voice queries.

Cloud Native · DAG · Image processing

0 likes · 13 min read

Meituan Technology Team

Sep 24, 2020 · Artificial Intelligence

Multimodal Recall Solution for KDD Cup 2020: ImageBERT and LXMERT Based Approach

The second‑place team tackled KDD Cup 2020’s Multimodal Recall challenge by fine‑tuning ImageBERT and LXMERT on query‑image pairs, generating negatives, applying AMSoftmax and multi‑similarity losses, ensembling weighted predictions, and using score‑based post‑processing, boosting NDCG@5 to 0.8352 and powering Meituan’s multimodal search pipeline.

ImageBERT · KDD Cup 2020 · LXMERT

0 likes · 23 min read

iQIYI Technical Product Team

Jul 12, 2019 · Artificial Intelligence

Multimodal Video Retrieval Solution for iQIYI Challenge: Feature Fusion and Model Ensemble

The ‘One Name’ team from Nanjing University achieved a MAP of 0.8986 and third place in the iQIYI multimodal video retrieval challenge by fusing official face embeddings with scene features, using channel‑attention‑based video feature fusion, a multimodal SE‑ResNeXt module, and a carefully partitioned model ensemble.

feature fusion · iQIYI challenge · model ensemble

0 likes · 7 min read
