
Designing Next‑Gen Recommendation and Search with Agentic Architectures

The article analyzes cutting‑edge AI search and recommendation techniques—including Alibaba Cloud's Agentic RAG, Huawei Noah's LLM‑enhanced recommender, and Baidu's generative ranking model—detailing their architectures, multi‑modal retrieval strategies, performance gains, and practical deployment insights.

DataFunSummit

Jul 5, 2026


Drawing on a technical talk by Xing Shaomin, head of Alibaba Cloud AI Search, the article first outlines three challenges: high concurrency, multimodal data, and complex multi‑hop queries. It then traces the evolution of the Agentic RAG architecture from a single agent to a multi‑agent system that coordinates planning, retrieval, and generation modules to understand user intent precisely.

The multi‑path retrieval design combines vector, text, database, and graph recall to improve query coverage and accuracy. The article also covers GPU‑accelerated indexing and querying, a quantitative comparison of the benefits of quantization, and extensions such as NL2SQL and multimodal search.
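To make the multi‑path idea concrete, here is a minimal sketch of merging the results of several recall paths into one ranking. The summary does not say how Alibaba Cloud fuses its paths, so this example assumes reciprocal rank fusion (RRF), a common general‑purpose technique; the function name, the document IDs, and the constant k=60 are all illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores sum(1 / (k + rank)) over the lists that contain it,
    so documents returned by several recall paths rise to the top.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by descending score; break ties by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the four recall paths (assumed data).
vector_hits = ["d3", "d1", "d7"]
text_hits = ["d1", "d4", "d3"]
database_hits = ["d9"]
graph_hits = ["d1", "d9"]

fused = reciprocal_rank_fusion([vector_hits, text_hits, database_hits, graph_hits])
print(fused)
```

RRF needs only ranks, not comparable scores, which is why it is often a reasonable first choice when the paths (vector similarity, text relevance, SQL results, graph traversal) score on incompatible scales.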

Next, the article reviews the evolution of recommendation systems from deep learning to large language models (LLMs) and AI agents, focusing on core challenges such as noisy implicit feedback, limited semantic understanding, and the difficulty of mining user intent. Using Huawei Noah's KAR project as a case study, it describes how factorized prompting and a multi‑expert knowledge adapter map semantic knowledge into the recommendation embedding space while balancing feature dimensionality against real‑time latency. It reports an AUC lift of 1.5% along with online A/B test results.
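The role of a multi‑expert adapter can be sketched as a gated mixture of linear projections that compresses a wide semantic vector into a compact vector sized for the recommender. This is an assumption‑laden illustration, not KAR's implementation: the class name, the dimensions, the linear experts, and the random (untrained) weights are all hypothetical.

```python
import math
import random


def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiExpertAdapter:
    """Gated mixture of linear experts mapping semantic_dim -> rec_dim."""

    def __init__(self, semantic_dim, rec_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Random weights stand in for parameters that would be learned.
        self.experts = [rand_matrix(rec_dim, semantic_dim) for _ in range(num_experts)]
        self.gate = rand_matrix(num_experts, semantic_dim)

    def gate_weights(self, semantic_vec):
        """One weight per expert, summing to 1."""
        return softmax(matvec(self.gate, semantic_vec))

    def __call__(self, semantic_vec):
        weights = self.gate_weights(semantic_vec)
        outputs = [matvec(expert, semantic_vec) for expert in self.experts]
        rec_dim = len(outputs[0])
        # Weighted sum of the expert outputs, dimension by dimension.
        return [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(rec_dim)]


adapter = MultiExpertAdapter(semantic_dim=16, rec_dim=4, num_experts=3)
llm_embedding = [0.05 * i for i in range(16)]  # stand-in for encoded LLM knowledge
compact = adapter(llm_embedding)
print(len(compact))
```

Shrinking the vector before it reaches the ranking model is one way to trade feature dimensionality against serving latency, which is the balance the article highlights.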

The piece then examines Baidu's GRAB (Generative Ranking for Ads) model, which addresses the performance bottlenecks of traditional DLRM‑based ranking by adopting LLM scaling laws and a Transformer‑based end‑to‑end generative sequence model. It details the Q‑Aware RAB causal attention mechanism, a two‑stage STS training algorithm, heterogeneous token representations, dual‑loss stacking, and KV‑Cache optimizations for high‑concurrency inference, along with quantified business benefits after full deployment.
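The general idea behind a KV cache for causal attention can be sketched as follows. This is not Baidu's GRAB implementation, whose details the summary does not give; it is a single‑head toy that assumes identity projections (each token vector serves as its own query, key, and value), and every name in it is hypothetical.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over the given keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


class KVCache:
    """Stores keys and values of past tokens so they are never recomputed."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, query, key, value):
        """Append this token's key/value, then attend over the whole prefix."""
        self.keys.append(key)
        self.values.append(value)
        return attend(query, self.keys, self.values)


def full_causal_attention(queries, keys, values):
    """Reference path: rebuild the prefix for every position."""
    return [
        attend(queries[t], keys[: t + 1], values[: t + 1])
        for t in range(len(queries))
    ]


# Toy sequence; identity projections are assumed for brevity.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
incremental = [cache.step(tok, tok, tok) for tok in tokens]
reference = full_causal_attention(tokens, tokens, tokens)
print(incremental[-1])
```

Both paths produce the same outputs, but the cached path does per‑step work proportional to the prefix length without recomputing earlier keys and values, which is what makes the technique attractive under high‑concurrency inference.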

Finally, the article lists the ebook's table of contents, which covers multi‑agent interaction for AI for good, knowledge discovery with LLM agents, observability of OpenAI Swarm, Elasticsearch vector search and RAG applications, and other frontier work in search and recommendation spanning big data to large models.




