Agent Architecture in Action: Building Next‑Gen Recommendation and Search Systems
This article reviews cutting‑edge AI search and recommendation technologies, covering Alibaba Cloud's Agentic RAG architecture, Huawei Noah's LLM‑enhanced recommendation pipeline, and Baidu's generative ranking model GRAB, while detailing their design challenges, multi‑modal retrieval strategies, performance gains, and real‑world deployment results.
The piece begins with a deep dive into Alibaba Cloud AI Search’s Agentic RAG solution, originally presented by AI Search lead Xing Shaomin. It outlines the technical challenges of high‑concurrency, multimodal data, and multi‑hop queries, then describes the evolution from a single‑agent to a multi‑agent system that coordinates planning, retrieval, and generation modules. The author explains the multi‑path retrieval chain that mixes vector, text, database, and graph recalls to boost coverage and accuracy, and discusses GPU‑accelerated indexing and query quantization, citing concrete performance comparisons. Extensions such as NL2SQL and multimodal search are also covered, with references to full architecture diagrams and benchmark data.
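To make the multi-path retrieval idea concrete, here is a minimal sketch of how ranked lists from several recall channels could be merged; the article does not specify Alibaba Cloud's fusion method, so this uses Reciprocal Rank Fusion as a stand-in, and the channel names and constant `k` are illustrative only.

```python
from collections import defaultdict

def fuse_recalls(channel_results, k=60):
    """Merge ranked doc-ID lists from several recall channels with
    Reciprocal Rank Fusion: score(d) = sum over channels of 1/(k + rank)."""
    scores = defaultdict(float)
    for ranked_ids in channel_results.values():
        for rank, doc_id in enumerate(ranked_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical results from three of the recall paths mentioned above.
channels = {
    "vector": ["d3", "d1", "d7"],
    "text":   ["d1", "d2"],
    "graph":  ["d7", "d1"],
}
print(fuse_recalls(channels))  # d1 appears in all three channels, so it ranks first
```

Documents recalled by multiple paths accumulate score across channels, which is how mixed vector/text/database/graph recall can lift coverage without a learned re-ranker in the loop.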
The second section examines Huawei Noah’s recommendation system evolution, tracing the shift from deep‑learning‑based models to large‑language‑model (LLM) and AI‑Agent paradigms. Using the KAR project as a case study, the article details how factorized prompting and a multi‑expert knowledge adapter map semantic knowledge into the recommendation embedding space. It highlights the design of the multi‑expert network that balances textual feature dimensionality with real‑time inference constraints, and reports an AUC lift of 1.5% together with online A/B‑test results. Further discussion includes LLM prompt engineering, fine‑tuning strategies for conversational recommendation, and future directions for cross‑platform recommendation ecosystems.
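The multi-expert knowledge adapter can be sketched as a gated mixture of projections that compresses a high-dimensional LLM text embedding into the much smaller recommendation embedding space. This is only an illustration in the spirit of KAR, assuming random toy weights and dimensions; the real adapter is trained end to end with the recommendation model.

```python
import numpy as np

rng = np.random.default_rng(0)
llm_dim, rec_dim, n_experts = 64, 8, 4  # illustrative sizes

# Each expert is a linear projection from the LLM space to the rec space.
experts = [rng.normal(size=(llm_dim, rec_dim)) * 0.1 for _ in range(n_experts)]
gate_w = rng.normal(size=(llm_dim, n_experts)) * 0.1

def adapt(text_emb):
    """Map an LLM semantic embedding into the recommendation space
    via a softmax-gated mixture of expert projections."""
    logits = text_emb @ gate_w
    gate = np.exp(logits - logits.max())
    gate /= gate.sum()                                  # softmax over experts
    outs = np.stack([text_emb @ w for w in experts])    # (n_experts, rec_dim)
    return gate @ outs                                  # gated mixture

semantic = rng.normal(size=llm_dim)   # stand-in for an LLM-produced embedding
rec_emb = adapt(semantic)
print(rec_emb.shape)  # (8,) — small enough to feed the CTR model at serving time
```

The dimensionality reduction is the point: textual features stay rich offline, while the adapter output is compact enough to satisfy the real-time inference constraints the article highlights.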
The final technical case study focuses on Baidu’s GRAB (Generative Ranking for Ads) model. The author explains how GRAB adopts the LLM “Scaling Law” and Transformer architecture to perform end‑to‑end generative sequence modeling of user behavior and ad targets, replacing traditional feature‑heavy pipelines. A novel Q‑Aware RAB causal attention mechanism is described, which injects query‑aware relative bias for adaptive modeling of complex interactions and temporal signals. The article also details the STS two‑stage training algorithm for efficiency and overfitting mitigation, heterogeneous token representations, a dual‑loss stacking strategy, and KV‑Cache optimizations for high‑throughput online inference, concluding with quantified business benefits observed after full deployment.
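The attention mechanism described above can be approximated by a short sketch: causal attention whose logits receive an additive relative-position bias scaled per query position, standing in for the "query-aware relative bias" idea. Baidu's exact Q-Aware RAB formulation is not public in this summary, so every name and shape here is an assumption for illustration.

```python
import numpy as np

def qaware_causal_attention(q, k, v, rel_bias, q_scale):
    """q, k, v: (T, d); rel_bias: (T,) bias indexed by distance i-j;
    q_scale: (T,) per-query scaling standing in for query-awareness."""
    T, d = q.shape
    logits = q @ k.T / np.sqrt(d)
    idx = np.arange(T)
    dist = idx[:, None] - idx[None, :]            # i - j (>= 0 when causal)
    # Additive relative bias, modulated by a per-query factor.
    logits = logits + q_scale[:, None] * rel_bias[np.clip(dist, 0, T - 1)]
    logits = np.where(dist >= 0, logits, -np.inf)  # causal mask: no future tokens
    w = np.exp(logits - logits.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ v

T, d = 5, 4
rng = np.random.default_rng(1)
out = qaware_causal_attention(
    rng.normal(size=(T, d)), rng.normal(size=(T, d)), rng.normal(size=(T, d)),
    rel_bias=rng.normal(size=T), q_scale=np.ones(T))
print(out.shape)  # (5, 4)
```

Because the mask is strictly causal, the key/value rows for past positions never change across decoding steps, which is exactly what makes the KV-Cache optimization mentioned above effective for high-throughput online inference.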
Table of contents:
Multi‑agent interaction systems for AI‑for‑Good practices
LLM‑driven knowledge discovery and data‑science applications
Observability research in OpenAI Swarm at SF Tech
Huawei Noah: recommendation system evolution and LLM practice
GRAB: Baidu’s generative ranking model for ads
Vector search with Elasticsearch and RAG applications
Alibaba Cloud AI Search Agentic RAG practice
From big data to big models: frontier of search and recommendation
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
