Designing Next‑Gen Recommendation and Search with Intelligent Agent Architecture
The article reviews a collection of technical chapters that analyze how multi‑agent AI architectures, large‑language‑model‑enhanced recommendation pipelines, generative ranking for ads, and Elasticsearch‑based vector RAG are applied to build next‑generation recommendation and search systems, citing concrete designs, performance numbers and real‑world deployments.
This piece introduces an ebook that gathers eight technical chapters on intelligent‑agent architectures and their application to modern recommendation and search systems.
Alibaba Cloud AI Search – Agentic RAG
The author summarizes a talk by Alibaba Cloud AI Search lead Xing Shaomin, which describes challenges such as high concurrency, multimodal data, and multi‑hop queries. The solution evolves from a single‑agent design to a multi‑agent framework that coordinates planning, retrieval, and generation modules to understand complex intents. A multi‑path retrieval layer mixes vector, text, database, and graph recall to improve coverage and accuracy. The talk also compares GPU‑accelerated indexing and query quantization, reporting measurable speed‑ups.
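The summary does not say how results from the four recall paths are merged. One common option is Reciprocal Rank Fusion (RRF), sketched below as an assumption rather than as Alibaba's method: each document earns 1 / (k + rank) from every path that returns it, so documents found by several paths rise to the top. The path names, document IDs, and k = 60 are illustrative.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked lists of document IDs with Reciprocal Rank Fusion.

    Each document scores sum(1 / (k + rank)) over every list it appears in,
    with rank starting at 1. Documents recalled by multiple paths score higher.
    """
    scores = defaultdict(float)
    for ranking in ranked_lists.values():
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by doc_id so ties are deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical candidate lists from the four recall paths.
recalls = {
    "vector": ["d1", "d2", "d3"],
    "text": ["d2", "d4", "d1"],
    "database": ["d5", "d2"],
    "graph": ["d3", "d2"],
}
fused = reciprocal_rank_fusion(recalls)
print([doc for doc, _ in fused])
```

RRF only needs ranks, not comparable scores, which suits heterogeneous paths where a BM25 score and a cosine similarity live on different scales.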
Huawei Noah Recommendation – LLM Integration
The chapter reviews the transition from deep‑learning recommenders to the large‑language‑model (LLM) and AI‑agent eras, highlighting the problems of noisy implicit feedback, limited semantic understanding, and intent mining. It contrasts list‑based and conversational recommendation flows, and details the KAR project, in which factorized prompting and a multi‑expert knowledge adapter map semantic knowledge into the recommendation embedding space. The design balances text feature dimensionality against real‑time constraints, and an online A/B test reports a 1.5 % AUC lift.
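A multi‑expert adapter of the kind described can be sketched as a small mixture of experts: several linear experts each project a high‑dimensional text‑knowledge embedding down to a compact recommendation vector, and a softmax gate mixes their outputs. This is a hypothetical minimal version, not KAR's actual architecture; the class name, dimensions, and random initialization are assumptions. Projecting to a small d_rec is one way the dimensionality/latency trade‑off could be handled.

```python
import math
import random


def linear(weights, bias, x):
    """Compute W @ x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


class MultiExpertAdapter:
    """Hypothetical adapter mapping d_text-dim knowledge embeddings to d_rec dims.

    Each expert is a linear projection d_text -> d_rec; a gating layer gives
    softmax weights over experts, and the expert outputs are mixed by them.
    """

    def __init__(self, d_text, d_rec, num_experts, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.d_rec = d_rec
        self.experts = [(init(d_rec, d_text), [0.0] * d_rec) for _ in range(num_experts)]
        self.gate = (init(num_experts, d_text), [0.0] * num_experts)

    def __call__(self, text_emb):
        gate_weights = softmax(linear(*self.gate, text_emb))
        out = [0.0] * self.d_rec
        for g, (w, b) in zip(gate_weights, self.experts):
            for i, v in enumerate(linear(w, b, text_emb)):
                out[i] += g * v
        return out


adapter = MultiExpertAdapter(d_text=16, d_rec=4, num_experts=3)
rec_vec = adapter([0.1] * 16)  # compact vector to feed the recommender
print(len(rec_vec))
```

Because the adapter output is small, it can be concatenated with ordinary ID embeddings without inflating the recommender's input much.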
Baidu GRAB – Generative Ranking for Ads
The Baidu commercial tech team’s GRAB model replaces traditional DLRM (deep learning recommendation model) pipelines with end‑to‑end generative sequence modeling of user behavior and target ads, leveraging LLM scaling laws and the Transformer architecture. A Q‑Aware RAB causal attention mechanism adds a query‑aware bias so the model can adaptively capture complex interactions and temporal signals. The chapter also explains the STS two‑stage training algorithm, heterogeneous token representations, a dual‑loss stacking strategy, and KV‑Cache optimizations for high‑concurrency inference, and it quantifies the business gains after full deployment.
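The exact Q‑Aware RAB formulation is not given here, so the sketch below is a hypothetical illustration of the general idea: single‑head causal attention over a behavior sequence where each logit receives a relative attention bias (RAB) looked up by time gap, and a query‑dependent sigmoid gate decides how strongly that bias applies. The bucket scheme, `bias_table`, and `gate_vec` are assumptions; timestamps are assumed to be sorted in ascending order.

```python
import math


def softmax(xs):
    m = max(xs)
    e = [math.exp(x - m) for x in xs]
    s = sum(e)
    return [v / s for v in e]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def causal_attention_with_query_bias(q, k, v, timestamps, bias_table, gate_vec):
    """Single-head causal attention with a hypothetical query-aware time bias.

    q, k, v: lists of length-d vectors, one per token (behaviors, then the ad).
    timestamps: ascending event times; the bias is looked up by log2 time-gap bucket.
    bias_table: hypothetical learned bias per gap bucket.
    gate_vec: hypothetical vector; sigmoid(q_i . gate_vec) scales the bias,
        so each query controls how much temporal bias it receives.
    """
    n, d = len(q), len(q[0])
    outputs = []
    for i in range(n):
        gate = 1.0 / (1.0 + math.exp(-dot(q[i], gate_vec)))
        logits = []
        for j in range(i + 1):  # causal mask: token i attends only to j <= i
            gap = timestamps[i] - timestamps[j]
            bucket = min(int(math.log2(gap + 1)), len(bias_table) - 1)
            logits.append(dot(q[i], k[j]) / math.sqrt(d) + gate * bias_table[bucket])
        weights = softmax(logits)
        outputs.append([sum(w * v[j][c] for j, w in enumerate(weights)) for c in range(d)])
    return outputs


q = k = v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
out = causal_attention_with_query_bias(
    q, k, v, timestamps=[0, 5, 9], bias_table=[0.5, 0.2, 0.0, -0.3], gate_vec=[0.1, 0.1]
)
print(out)
```

Because token i only reads keys and values at positions j ≤ i, those keys and values never change once computed and can be cached and reused across requests, which is the property KV‑Cache optimizations exploit.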
Elasticsearch Vector Search and RAG
One chapter demonstrates how to use Elasticsearch for vector search and to build Retrieval‑Augmented Generation (RAG) applications, detailing index construction, query pipelines, and integration points.
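As a rough illustration of the index‑construction and query steps, the sketch below builds an Elasticsearch 8.x mapping with a `dense_vector` field, an approximate kNN search body, and a RAG prompt assembled from the returned hits. The field names `text` and `embedding`, the 384 dimensions, and the prompt wording are assumptions, not the chapter's configuration. The bodies would be sent to the index‑creation and `_search` endpoints; here a mocked response stands in for a live cluster.

```python
import json

EMBED_DIMS = 384  # hypothetical: must match the embedding model in use


def index_mapping(dims=EMBED_DIMS):
    """Mapping for an index holding text chunks plus their dense embeddings."""
    return {
        "mappings": {
            "properties": {
                "text": {"type": "text"},
                "embedding": {
                    "type": "dense_vector",
                    "dims": dims,
                    "index": True,
                    "similarity": "cosine",
                },
            }
        }
    }


def knn_query(query_vector, k=5, num_candidates=50):
    """Approximate kNN search body for the _search endpoint."""
    return {
        "knn": {
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": num_candidates,
        },
        "_source": ["text"],
    }


def build_prompt(question, search_response):
    """Turn retrieved hits into a grounded prompt for the generation step."""
    chunks = [hit["_source"]["text"] for hit in search_response["hits"]["hits"]]
    context = "\n".join(f"[{i}] {c}" for i, c in enumerate(chunks, start=1))
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {question}"


# Mocked response so the sketch runs without a cluster.
fake_response = {"hits": {"hits": [{"_source": {"text": "ES supports dense_vector fields."}}]}}
print(json.dumps(knn_query([0.1] * 3, k=2), indent=2))
print(build_prompt("Does ES support vectors?", fake_response))
```

Raising `num_candidates` relative to `k` generally improves recall of the approximate search at the cost of latency.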
Table of Contents
1. Multi‑agent interaction systems for AI‑for‑Good
2. Knowledge discovery and data‑science with LLM agents
3. Observability of OpenAI Swarm at SF Tech
4. Huawei Noah: recommendation evolution and LLM practice
5. GRAB: Baidu’s generative ad ranking model
6. Elasticsearch vector search and RAG
7. Alibaba Cloud AI Search Agentic RAG practice
8. From big data to big models: frontier of search‑recommendation
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
