Agent Architecture in Action: Building Next‑Gen Recommendation and Search Systems
This article reviews cutting‑edge AI search and recommendation technologies, covering Alibaba Cloud's Agentic RAG architecture, Huawei Noah's LLM‑enhanced recommendation pipeline, and Baidu's generative ranking model GRAB, while detailing their design challenges, multi‑modal retrieval strategies, performance gains, and real‑world deployment results.
The piece begins with a deep dive into Alibaba Cloud AI Search’s Agentic RAG solution, originally presented by AI Search lead Xing Shaomin. It outlines the technical challenges of high‑concurrency, multimodal data, and multi‑hop queries, then describes the evolution from a single‑agent to a multi‑agent system that coordinates planning, retrieval, and generation modules. The author explains the multi‑path retrieval chain that mixes vector, text, database, and graph recalls to boost coverage and accuracy, and discusses GPU‑accelerated indexing and query quantization, citing concrete performance comparisons. Extensions such as NL2SQL and multimodal search are also covered, with references to full architecture diagrams and benchmark data.
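To make the multi-path retrieval idea concrete, here is a minimal sketch of how ranked lists from several recall channels could be merged; the article does not specify Alibaba Cloud's fusion method, so this uses Reciprocal Rank Fusion as a stand-in, and the channel names and constant `k` are illustrative only.

```python
from collections import defaultdict

def fuse_recalls(channel_results, k=60):
    """Merge ranked doc-ID lists from several recall channels with
    Reciprocal Rank Fusion: score(d) = sum over channels of 1/(k + rank)."""
    scores = defaultdict(float)
    for ranked_ids in channel_results.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from three of the recall paths mentioned above.
channels = {
    "vector": ["d3", "d1", "d7"],
    "text":   ["d1", "d2"],
    "graph":  ["d7", "d1"],
}
print(fuse_recalls(channels))  # d1 appears in all three channels, so it ranks first
```

Documents recalled by multiple paths accumulate score across channels, which is how mixed vector/text/database/graph recall can lift coverage without a learned re-ranker in the loop.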
The second section examines Huawei Noah’s recommendation system evolution, tracing the shift from deep‑learning‑based models to large‑language‑model (LLM) and AI‑Agent paradigms. Using the KAR project as a case study, the article details how factorized prompting and a multi‑expert knowledge adapter map semantic knowledge into the recommendation embedding space. It highlights the design of the multi‑expert network that balances textual feature dimensionality with real‑time inference constraints, and reports an AUC lift of 1.5% together with online A/B‑test results. Further discussion includes LLM prompt engineering, fine‑tuning strategies for conversational recommendation, and future directions for cross‑platform recommendation ecosystems.
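The multi-expert knowledge adapter can be sketched as a gated mixture of projections that compresses a high-dimensional LLM text embedding into the much smaller recommendation embedding space. This is only an illustration in the spirit of KAR, assuming random toy weights and dimensions; the real adapter is trained end to end with the recommendation model.

```python
import numpy as np

rng = np.random.default_rng(0)
llm_dim, rec_dim, n_experts = 64, 8, 4  # illustrative sizes

# Each expert is a linear projection from the LLM space to the rec space.
experts = [rng.normal(size=(llm_dim, rec_dim)) * 0.1 for _ in range(n_experts)]
gate_w = rng.normal(size=(llm_dim, n_experts)) * 0.1

def adapt(text_emb):
    """Map an LLM semantic embedding into the recommendation space
    via a softmax-gated mixture of expert projections."""
    logits = text_emb @ gate_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                                  # softmax over experts
    outs = np.stack([text_emb @ w for w in experts])    # (n_experts, rec_dim)
    return gate @ outs                                  # gated mixture

semantic = rng.normal(size=llm_dim)   # stand-in for an LLM-produced embedding
rec_emb = adapt(semantic)
print(rec_emb.shape)  # (8,) — small enough to feed the CTR model at serving time
```

The dimensionality reduction is the point: textual features stay rich offline, while the adapter output is compact enough to satisfy the real-time inference constraints the article highlights.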
The final technical case study focuses on Baidu’s GRAB (Generative Ranking for Ads) model. The author explains how GRAB adopts the LLM “Scaling Law” and Transformer architecture to perform end‑to‑end generative sequence modeling of user behavior and ad targets, replacing traditional feature‑heavy pipelines. A novel Q‑Aware RAB causal attention mechanism is described, which injects query‑aware relative bias for adaptive modeling of complex interactions and temporal signals. The article also details the STS two‑stage training algorithm for efficiency and overfitting mitigation, heterogeneous token representations, a dual‑loss stacking strategy, and KV‑Cache optimizations for high‑throughput online inference, concluding with quantified business benefits observed after full deployment.
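The attention mechanism described above can be approximated by a short sketch: causal attention whose logits receive an additive relative-position bias scaled per query position, standing in for the "query-aware relative bias" idea. Baidu's exact Q-Aware RAB formulation is not public in this summary, so every name and shape here is an assumption for illustration.

```python
import numpy as np

def qaware_causal_attention(q, k, v, rel_bias, q_scale):
    """q, k, v: (T, d); rel_bias: (T,) bias indexed by distance i-j;
    q_scale: (T,) per-query scaling standing in for query-awareness."""
    T, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    idx = np.arange(T)
    dist = idx[:, None] - idx[None, :]            # i - j (>= 0 when causal)
    # Additive relative bias, modulated by a per-query factor.
    logits = logits + q_scale[:, None] * rel_bias[np.clip(dist, 0, T - 1)]
    logits = np.where(dist >= 0, logits, -np.inf)  # causal mask: no future tokens
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

T, d = 5, 4
rng = np.random.default_rng(1)
out = qaware_causal_attention(
    rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d)),
    rel_bias=rng.normal(size=T), q_scale=np.ones(T))
print(out.shape)  # (5, 4)
```

Because the mask is strictly causal, the key/value rows for past positions never change across decoding steps, which is exactly what makes the KV-Cache optimization mentioned above effective for high-throughput online inference.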
Table of contents:
Multi‑agent interaction systems for AI‑for‑Good practices
LLM‑driven knowledge discovery and data‑science applications
Observability research in OpenAI Swarm at SF Tech
Huawei Noah: recommendation system evolution and LLM practice
GRAB: Baidu’s generative ranking model for ads
Vector search with Elasticsearch and RAG applications
Alibaba Cloud AI Search Agentic RAG practice
From big data to big models: frontier of search and recommendation
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
