Agent Architecture in Action: Building Next‑Gen Recommendation and Search Systems

The article reviews cutting‑edge technical practices for next‑generation recommendation and search, covering Alibaba Cloud AI Search's Agentic RAG multi‑agent design, Huawei Noah's LLM‑enhanced recommendation evolution, Baidu's generative ranking (GRAB) for ads, and Elasticsearch‑based vector RAG implementations, with concrete architecture details and performance results.

DataFunSummit

Agentic RAG in Alibaba Cloud AI Search

This section summarizes a technical talk by Xing Shaomin of Alibaba Cloud AI Search. It outlines challenges such as high concurrency, multimodal data, and multi‑hop queries, then describes the evolution from a single Agent to a multi‑Agent system that coordinates planning, retrieval, and generation modules to understand complex intents precisely. The multi‑path retrieval chain combines vector, text, database, and graph recall to improve coverage and accuracy. Optimizations include GPU‑accelerated indexing and query quantization, with reported speed‑up figures, plus extensions such as NL2SQL and multimodal search. The original talk includes full architecture diagrams and performance evaluation data.
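The summary does not say how the recall paths are merged, so the following is only a minimal sketch of one common option, reciprocal rank fusion (RRF). The function name, the path names, the document IDs, and the constant `k=60` are all assumptions made for illustration, not details from the talk.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked ID lists into one ranking.

    ranked_lists maps a recall path name to its ranked document IDs.
    Each document earns 1 / (k + rank) from every path that returned it.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by document ID.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the four recall paths named in the text.
recall_paths = {
    "vector": ["d3", "d1", "d7"],
    "text": ["d1", "d4", "d3"],
    "database": ["d9"],
    "graph": ["d1", "d9"],
}

fused = reciprocal_rank_fusion(recall_paths)
print(fused)
```

A document returned by several paths (here `d1`) rises to the top even if no single path ranked it first everywhere, which is the coverage benefit that multi‑path recall is meant to provide.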

Huawei Noah's Recommendation System Evolution

This section reviews the transition from deep‑learning recommenders to the large‑language‑model (LLM) and AI‑Agent eras. It identifies three core challenges: noisy implicit feedback, weak semantic understanding, and difficulty mining user intent. Using the KAR project from Huawei Noah’s Ark Lab as a case study, the article explains how factorized prompting and a multi‑expert knowledge adapter map semantic knowledge into the recommendation embedding space. The multi‑expert network balances text feature dimensionality against real‑time constraints. Reported results include a 1.5% AUC lift along with online A/B‑test results, and detailed model diagrams are provided.
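To make the adapter idea concrete, here is a toy mixture‑of‑experts projection in plain Python. It is a simplified sketch, not KAR's actual architecture: the class name, the dimensions, the random initialization, and the single softmax gate are all assumptions. It only illustrates how several small experts can compress a high‑dimensional text embedding into a low‑dimensional vector suitable for a recommendation model.

```python
import math
import random


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiExpertAdapter:
    """Hypothetical adapter: gated sum of linear experts."""

    def __init__(self, text_dim, rec_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Each expert projects text_dim -> rec_dim (dimensionality reduction).
        self.experts = [init(rec_dim, text_dim) for _ in range(num_experts)]
        # The gate scores every expert for a given input.
        self.gate = init(num_experts, text_dim)

    def __call__(self, text_embedding):
        gate_weights = softmax(linear(self.gate, text_embedding))
        expert_outputs = [linear(e, text_embedding) for e in self.experts]
        rec_dim = len(expert_outputs[0])
        mixed = [
            sum(g * out[i] for g, out in zip(gate_weights, expert_outputs))
            for i in range(rec_dim)
        ]
        return mixed, gate_weights


adapter = MultiExpertAdapter(text_dim=16, rec_dim=4, num_experts=3)
rec_vector, gates = adapter([0.5] * 16)
print(len(rec_vector), gates)
```

In a real system the text embeddings would typically be computed offline and only the small adapter would run at serving time, which is one way to reconcile large text features with real‑time constraints.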

Baidu GRAB: Generative Ranking for Ads

The Baidu commercial tech team’s GRAB model replaces traditional DLRM pipelines with an end‑to‑end generative sequence model that jointly encodes user behavior and target ads in a unified representation space, inspired by LLM scaling laws and the Transformer architecture. A novel Q‑Aware RAB causal attention mechanism introduces a query‑aware relative bias for adaptive modeling of complex interactions and temporal signals. The paper details a two‑stage STS training algorithm to improve efficiency and mitigate over‑fitting, heterogeneous token representations for hot‑start, a dual‑loss stacking strategy, and KV‑Cache optimizations for high‑concurrency inference. Quantitative business gains after full deployment are reported.
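The summary gives no formula for Q‑Aware RAB, so the sketch below shows only one plausible reading: causal attention in which each logit adds a bias computed from the query and an embedding of the relative distance between positions. The function name, the bias form, and the toy vectors are assumptions and should not be taken as Baidu's implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def softmax(logits):
    peak = max(logits)
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [e / total for e in exps]


def causal_attention_with_query_bias(queries, keys, values, rel_embeddings):
    """Single-head causal attention with a query-dependent relative bias.

    rel_embeddings[d] is a vector for relative distance d = i - j,
    so the bias for a pair (i, j) changes with the query at position i.
    """
    scale = 1.0 / math.sqrt(len(queries[0]))
    outputs = []
    for i, query in enumerate(queries):
        logits = []
        for j in range(i + 1):  # causal mask: position i sees only j <= i
            content_score = dot(query, keys[j]) * scale
            relative_bias = dot(query, rel_embeddings[i - j]) * scale
            logits.append(content_score + relative_bias)
        weights = softmax(logits)
        outputs.append([
            sum(w * values[j][c] for j, w in enumerate(weights))
            for c in range(len(values[0]))
        ])
    return outputs


# Toy sequence of three tokens with two-dimensional vectors.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rel_embeddings = [[0.2, 0.2], [0.1, 0.0], [0.0, -0.1]]

outputs = causal_attention_with_query_bias(queries, keys, values, rel_embeddings)
print(outputs)
```

Because the bias is a function of the query rather than a fixed table lookup, two tokens at the same distance can be weighted differently depending on what the current position is asking for.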

Elasticsearch Vector Search and RAG Applications

The final part outlines how Elasticsearch can be leveraged for vector search and to build Retrieval‑Augmented Generation (RAG) applications, describing the indexing pipeline, query workflow, and integration points for large‑scale semantic retrieval.
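As a rough illustration of those integration points, the sketch below builds the JSON bodies for a `dense_vector` mapping and an approximate kNN search, then assembles retrieved passages into a prompt. The index name, field names, toy dimension, and prompt wording are assumptions; the code only constructs request bodies and does not contact a cluster or call an embedding model.

```python
import json

INDEX_NAME = "docs-demo"   # hypothetical index name
EMBEDDING_DIMS = 4         # toy size; real embedding models use far more


def build_index_mapping(dims):
    """Mapping with a text field and an indexed dense_vector field."""
    return {
        "mappings": {
            "properties": {
                "content": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def build_knn_search(query_vector, k=3, num_candidates=50):
    """Search body for approximate kNN over the embedding field."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["content"],
    }


def build_prompt(question, hits):
    """Turn search hits into a grounded prompt for the generation step."""
    context = "\n".join(
        f"[{i}] {hit['_source']['content']}"
        for i, hit in enumerate(hits, start=1)
    )
    return (
        "Answer using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}"
    )


mapping_body = build_index_mapping(EMBEDDING_DIMS)
search_body = build_knn_search([0.1, 0.2, 0.3, 0.4])
# Hand-written stand-in for the hits a cluster would return.
sample_hits = [{"_source": {"content": "Elasticsearch supports kNN search."}}]
prompt = build_prompt("Does Elasticsearch support vector search?", sample_hits)

print(json.dumps(search_body, indent=2))
print(prompt)
```

In practice `mapping_body` would be sent when creating the index and `search_body` to the search endpoint, with the query vector produced by the same embedding model used at indexing time.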

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
