Designing Next‑Gen Recommendation and Search Systems with Agentic Architectures
This article reviews cutting‑edge AI search and recommendation techniques—including Alibaba Cloud’s Agentic RAG, Huawei’s LLM‑enhanced recommendation pipeline, and Baidu’s generative ranking model GRAB—detailing their architectural evolution, multimodal retrieval strategies, performance benchmarks, and practical deployment insights.
The piece is a curated overview from the ebook Intelligent Agent Architecture and Practice: Building the Next‑Generation Recommendation and Search Systems, summarizing three technical case studies that illustrate how modern AI agents are reshaping high‑concurrency, multimodal search and recommendation workloads.
Alibaba Cloud AI Search: Agentic RAG
Based on a talk by Xing Shaomin, the article explains the challenges of handling massive concurrent queries, multimodal data, and multi‑hop reasoning. It describes the evolution from a single‑agent to a multi‑agent architecture that coordinates planning, retrieval, and generation modules. A multi‑path retrieval layer mixes vector, text, database, and graph recall to boost coverage and accuracy. The author also details GPU‑accelerated indexing and query quantization, and mentions extensions such as NL2SQL and multimodal search, with performance figures provided in the original material.
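To make the multi‑path retrieval idea concrete, here is a minimal sketch of merging the ranked results of several recall paths into one list. The summary does not say how Alibaba Cloud fuses the paths, so reciprocal rank fusion (RRF) is swapped in here purely as an illustrative assumption; the document IDs, the path names used as dictionary keys, and the constant k=60 are likewise hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    ranked_lists maps a recall-path name to its ranked document IDs.
    RRF is an assumed fusion rule; the source does not name one.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, doc_id in enumerate(ranking, start=1):
            # A document gains more score the higher it ranks on a path.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the four recall paths named in the text.
recall_paths = {
    "vector": ["d3", "d1", "d7"],
    "text": ["d1", "d4", "d3"],
    "database": ["d9"],
    "graph": ["d1", "d9"],
}

fused = reciprocal_rank_fusion(recall_paths)
print(fused)
```

A document that appears on several paths (d1 here) rises to the top even when no single path ranks it first, which is one way mixing paths can improve both coverage and accuracy.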
Huawei Noah’s Ark Lab: LLM‑Enhanced Recommendation
The author reviews the transition from deep‑learning‑based recommenders to large‑language‑model (LLM) and AI‑Agent approaches. Core challenges include noisy implicit feedback, limited semantic understanding, and difficulty extracting user intent. Using the KAR project as an example, the article outlines how factorized prompting and a multi‑expert knowledge adapter map semantic knowledge into the recommendation embedding space. Design trade‑offs for the multi‑expert network balance text feature dimensionality with real‑time constraints. The reported experiments show an AUC lift of 1.5%, and the gains were further validated in online A/B tests.
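The adapter idea can be sketched as a small gated mixture of experts that projects a high‑dimensional text vector down to a compact recommendation‑side vector. This is a simplified illustration, not the KAR implementation: each expert is reduced to a single linear projection, the weights are random and untrained, and the class name, dimensions, and input vector are all hypothetical.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    """Multiply a row-major matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class MultiExpertAdapter:
    """Hypothetical gated mixture of linear experts (untrained weights)."""

    def __init__(self, text_dim, rec_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        # Each expert projects text_dim -> rec_dim.
        self.experts = [init(rec_dim, text_dim) for _ in range(num_experts)]
        # The gate scores every expert from the same text vector.
        self.gate = init(num_experts, text_dim)

    def forward(self, text_vec):
        weights = softmax(matvec(self.gate, text_vec))
        outputs = [matvec(expert, text_vec) for expert in self.experts]
        rec_dim = len(outputs[0])
        # Gate-weighted sum of expert outputs, one value per output dimension.
        mixed = [sum(w * out[i] for w, out in zip(weights, outputs)) for i in range(rec_dim)]
        return mixed, weights


adapter = MultiExpertAdapter(text_dim=8, rec_dim=3, num_experts=4)
text_vec = [0.5, -1.0, 0.25, 0.0, 1.5, -0.5, 0.75, 1.0]  # stand-in for an encoded LLM output
rec_vec, gate_weights = adapter.forward(text_vec)
print(len(rec_vec), len(gate_weights))
```

The trade‑off mentioned above shows up in the shapes: a smaller rec_dim and fewer experts mean fewer multiplications per request, at the cost of keeping less of the text representation.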
Baidu GRAB: Generative Ranking for Ads
The Baidu commercial tech team’s GRAB (Generative Ranking for Ads) replaces traditional feature‑engineered DLRM pipelines with an end‑to‑end generative sequence model that embeds user behavior and target ads in a unified space. Inspired by LLM scaling laws and Transformer architecture, the model introduces a Q‑Aware RAB causal attention mechanism to adaptively capture complex interactions and temporal signals. To address training efficiency and over‑fitting, a two‑stage STS training algorithm, heterogeneous token representations, and a dual‑loss stacking strategy are employed. KV‑Cache is used to sustain high‑throughput online inference, and the article reports quantified business gains after full deployment.
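The role of the KV‑Cache can be shown with a small sketch: under causal attention, each new behavior token only needs the keys and values of earlier tokens, so those can be stored and reused instead of recomputed at every step. The code below uses plain single‑head scaled dot‑product attention with identity projections as a stand‑in; it does not reproduce GRAB's Q‑Aware RAB mechanism, and the class name and token vectors are hypothetical.

```python
import math


def attend(query, keys, values):
    """Scaled dot-product attention of one query over cached keys/values."""
    scale = 1.0 / math.sqrt(len(query))
    logits = [scale * sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(values[0])
    return [sum(w * v[i] for w, v in zip(weights, values)) for i in range(dim)]


def full_causal_attention(tokens):
    """Baseline: recompute from scratch, position t sees positions 0..t."""
    return [attend(tokens[t], tokens[: t + 1], tokens[: t + 1]) for t in range(len(tokens))]


class KVCache:
    """Hypothetical cache that stores keys and values of past tokens."""

    def __init__(self):
        self.keys = []
        self.values = []

    def step(self, token):
        # Identity projections (an assumption): the token serves as q, k, and v.
        self.keys.append(token)
        self.values.append(token)
        return attend(token, self.keys, self.values)


tokens = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0], [1.0, 1.0]]
cache = KVCache()
incremental = [cache.step(tok) for tok in tokens]
reference = full_causal_attention(tokens)
print(incremental == reference)
```

In a real model the cache stores the projected keys and values of each layer, so a new token only has to compute its own projections rather than re‑encoding the whole behavior sequence.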
The ebook’s table of contents lists eight chapters covering multi‑agent interaction for AI‑for‑good, knowledge discovery with LLM agents, observability of OpenAI Swarm‑style systems, and practical guides for Elasticsearch‑based vector search and RAG applications, providing a comprehensive roadmap for building next‑generation AI‑driven recommendation and search platforms.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
