How AI Search & Recommendation Systems Beat Multi-Modal, High-Concurrency Hurdles

This article reviews cutting‑edge technical practices from Alibaba Cloud AI Search, Huawei Noah's recommendation platform, and Baidu's GRAB model, detailing how multi‑agent RAG architectures, large‑language‑model enhancements, and generative ranking overcome high‑concurrency, multi‑modal data, and feature‑engineering bottlenecks.


Alibaba Cloud AI Search: Agentic RAG in Practice

Based on a talk by Alibaba Cloud AI Search lead Xing Shaomin, this section outlines the challenges of handling high concurrency, multimodal data, and complex multi‑hop queries. The solution is an Agentic Retrieval‑Augmented Generation (RAG) architecture that evolves from a single‑agent to a multi‑agent system, with planning, retrieval, and generation modules coordinating to precisely interpret complex intents. A mixed‑recall strategy combines vector, text, database, and graph retrieval to boost coverage and accuracy. The section also details GPU‑accelerated indexing and query stages, presents quantitative gains (e.g., throughput improvements and latency reductions), and discusses extensions such as NL2SQL and multimodal search, with references to architecture diagrams and performance evaluation data.
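To make the mixed‑recall idea concrete, here is a minimal sketch of fusing several retrieval channels with reciprocal rank fusion (RRF). The retriever functions are hypothetical stand‑ins: in a real deployment they would call a vector index, a text/BM25 engine, and a knowledge graph, as the talk describes; the fusion step and document IDs are illustrative assumptions.

```python
def vector_retrieve(query):
    # Placeholder: pretend a vector index returned ranked doc IDs.
    return ["doc_a", "doc_b", "doc_c"]

def text_retrieve(query):
    # Placeholder for a keyword/BM25 text engine.
    return ["doc_b", "doc_d"]

def graph_retrieve(query):
    # Placeholder for multi-hop graph retrieval.
    return ["doc_c", "doc_e"]

def reciprocal_rank_fusion(result_lists, k=60):
    """Merge several ranked lists into one via RRF scoring:
    each doc gets sum of 1/(k + rank) over the lists it appears in."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

def mixed_recall(query):
    # Fan the query out to all channels, then fuse the ranked lists.
    lists = [vector_retrieve(query), text_retrieve(query), graph_retrieve(query)]
    return reciprocal_rank_fusion(lists)

if __name__ == "__main__":
    print(mixed_recall("multi-hop question"))
```

Documents recalled by multiple channels (here `doc_b` and `doc_c`) float to the top, which is the coverage‑plus‑precision effect the mixed‑recall strategy targets.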

Huawei Noah's Recommendation System: LLM‑Driven Evolution

This section reviews the transition from deep‑learning recommenders to large‑language‑model (LLM) and AI‑Agent‑enabled pipelines, using Huawei Noah's KAR project as a case study. The author identifies core problems: noisy implicit feedback, limited semantic understanding, and difficulty extracting user intent. By treating LLMs as feature enhancers and integrating them via factorized prompting and multi‑expert knowledge adapters, the system maps semantic knowledge into recommendation embeddings. Detailed analysis shows how the multi‑expert network balances text feature dimensionality with real‑time inference constraints. Experiments report a 1.5% AUC lift and online A/B test results, illustrating the practical impact of LLM integration.
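The adapter idea can be sketched briefly: a high‑dimensional LLM text embedding is projected into the recommender's low‑dimensional embedding space by several expert projections, mixed by a learned gate. This is an illustrative mixture‑of‑experts sketch in the spirit of KAR, not Huawei's implementation; all dimensions and weights below are random stand‑ins.

```python
import numpy as np

rng = np.random.default_rng(0)
TEXT_DIM, REC_DIM, N_EXPERTS = 1024, 32, 4  # assumed sizes, for illustration

# Each expert is a linear projection from LLM space to recommendation space.
experts = [rng.normal(0, 0.02, size=(TEXT_DIM, REC_DIM)) for _ in range(N_EXPERTS)]
gate_w = rng.normal(0, 0.02, size=(TEXT_DIM, N_EXPERTS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def adapt(text_emb):
    """Map an LLM-derived semantic embedding into the recommender's space
    as a gated mixture of expert projections."""
    gates = softmax(text_emb @ gate_w)                 # (N_EXPERTS,)
    outs = np.stack([text_emb @ W for W in experts])   # (N_EXPERTS, REC_DIM)
    return gates @ outs                                # (REC_DIM,)

rec_emb = adapt(rng.normal(size=TEXT_DIM))
print(rec_emb.shape)
```

The gating lets different experts specialize (e.g., by domain or feature type), while the small output dimension keeps the adapted embedding cheap enough for real‑time inference, the trade‑off the section analyzes.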

Baidu GRAB: Generative Ranking for Ads

Baidu's commercial tech team designed GRAB (Generative Ranking for Ads) to overcome the performance ceiling of traditional Deep Learning Recommendation Models (DLRM). Inspired by LLM scaling laws and the Transformer architecture, GRAB models user behavior and target ads as a unified sequence, replacing massive discrete features and manual feature engineering. The section introduces the Q‑Aware RAB causal attention mechanism, which adds a query‑aware relative bias for adaptive modeling of complex interactions and temporal signals. To address training efficiency and overfitting, a two‑stage STS training algorithm, heterogeneous token representations, and a dual‑loss stacking strategy are detailed. A KV‑Cache is employed to sustain high‑concurrency online inference. The section closes with quantified business gains after full deployment and a comparison of GRAB with alternative ranking approaches.
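The general mechanism Q‑Aware RAB builds on can be sketched as causal self‑attention with an additive relative‑position bias: behavior tokens and the target‑ad token form one sequence, and a learned bias indexed by relative distance modulates attention scores before the causal mask. The sketch below is a single‑head toy version with random weights, not Baidu's implementation; the query‑aware conditioning and projection layers are omitted.

```python
import numpy as np

rng = np.random.default_rng(1)
SEQ, DIM = 6, 16  # toy sequence length and model dimension (assumed)

def causal_attention_with_bias(x, rel_bias):
    """Single-head causal attention with an additive relative-position bias.
    rel_bias[d] is added to the score of attending d steps into the past."""
    q, k, v = x, x, x  # projections omitted for brevity
    scores = q @ k.T / np.sqrt(DIM)
    idx = np.arange(SEQ)
    dist = idx[:, None] - idx[None, :]          # i - j: how far back j lies
    scores = scores + rel_bias[np.clip(dist, 0, SEQ - 1)]
    # Causal mask: token i may only attend to positions j <= i.
    scores = np.where(dist >= 0, scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

x = rng.normal(size=(SEQ, DIM))
bias = rng.normal(size=SEQ)
out = causal_attention_with_bias(x, bias)
print(out.shape)
```

Because attention is strictly causal, incremental serving can reuse cached keys and values for the behavior prefix (the KV‑Cache role mentioned above) and score each candidate ad by appending only its token.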

Overall, the collection demonstrates how leading cloud and internet companies apply multi‑agent systems, LLM‑based feature augmentation, and generative models to solve high‑throughput, multimodal retrieval and recommendation challenges, offering concrete architectural diagrams, benchmark numbers, and deployment insights.

Tags: large language models · Recommendation Systems · AI search · industry insights · Multi-Modal Retrieval · Generative Ranking
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
