Designing Next‑Gen Recommendation and Search with Agentic Architectures
This article reviews cutting‑edge AI search and recommendation techniques—including Alibaba Cloud's Agentic RAG, Huawei Noah's LLM‑enhanced recommender, and Baidu's generative ranking model GRAB—detailing their architectures, multi‑modal retrieval strategies, performance gains, and practical deployment insights.
The piece begins with an in‑depth look at Alibaba Cloud AI Search’s Agentic RAG implementation. To handle high concurrency, multimodal data, and multi‑hop queries, the system evolves from a single‑agent to a multi‑agent architecture that coordinates planning, retrieval, and generation modules. A mixed‑recall pipeline combines vector, text, database, and graph sources to improve coverage and accuracy, and GPU‑accelerated indexing is compared with query quantization as a way to speed up retrieval. Extensions such as NL2SQL and multimodal search are also described, with references to the full architecture diagrams and performance data.
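The summary does not say how the mixed‑recall results are merged, so the sketch below uses reciprocal rank fusion (RRF) as a stand‑in. This is a common fusion technique and not necessarily what Alibaba Cloud uses. The source names, document ids, and the constant k=60 are illustrative assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge per-source rankings into one list of document ids.

    ranked_lists maps a source name to its ranked document ids (best first).
    Each document earns 1 / (k + rank) from every source that returned it.
    """
    scores = defaultdict(float)
    for doc_ids in ranked_lists.values():
        for rank, doc_id in enumerate(doc_ids, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties are broken by id for determinism.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical recall results from the four source types named in the text.
recall = {
    "vector": ["d3", "d1", "d7"],
    "text": ["d1", "d4", "d3"],
    "database": ["d9"],
    "graph": ["d1", "d9"],
}
fused = reciprocal_rank_fusion(recall)
print(fused)
```

A document returned by several sources, such as d1 here, rises to the top even if no single source ranks it first, which is how a mixed recall can improve coverage without trusting any one retriever's scores.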
Next, the article examines Huawei Noah’s recommendation‑system evolution from deep learning to large language models (LLMs) and AI agents. It identifies core challenges—noisy implicit feedback, weak semantic understanding, and difficulty extracting user intent—and contrasts list‑based and conversational recommendation flows. The KAR project demonstrates how factorized prompting and a multi‑expert knowledge adapter embed semantic knowledge into the recommendation embedding space. Design details show how the adapter balances text‑feature dimensionality with real‑time constraints, and experimental results report a 1.5% AUC lift and supporting online A/B‑test figures.
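One way to picture a multi‑expert adapter is as a small mixture of experts: each expert projects a high‑dimensional text embedding down to a compact vector, and a softmax gate decides how to blend them. The sketch below follows that reading. It is an assumption about the general shape, not KAR's actual design, and the class name, dimensions, and random initialization are all hypothetical.

```python
import math
import random


def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def matvec(matrix, vec):
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class MultiExpertAdapter:
    """Hypothetical adapter: gated blend of linear experts (untrained weights)."""

    def __init__(self, in_dim, out_dim, num_experts, seed=0):
        rng = random.Random(seed)
        scale = 1.0 / math.sqrt(in_dim)

        def init(rows):
            return [[rng.gauss(0.0, scale) for _ in range(in_dim)]
                    for _ in range(rows)]

        self.experts = [init(out_dim) for _ in range(num_experts)]
        self.gate = init(num_experts)

    def gate_weights(self, text_emb):
        # One non-negative weight per expert, summing to 1.
        return softmax(matvec(self.gate, text_emb))

    def __call__(self, text_emb):
        weights = self.gate_weights(text_emb)
        outputs = [matvec(expert, text_emb) for expert in self.experts]
        out_dim = len(outputs[0])
        return [sum(w * out[j] for w, out in zip(weights, outputs))
                for j in range(out_dim)]


# A 32-dimensional stand-in for an LLM text embedding, reduced to 4 dimensions.
rng = random.Random(1)
text_emb = [rng.uniform(-1.0, 1.0) for _ in range(32)]
adapter = MultiExpertAdapter(in_dim=32, out_dim=4, num_experts=3)
dense = adapter(text_emb)
print(len(dense))
```

Because the adapter only depends on the text embedding, its compact output could be computed offline and looked up at serving time, which is one plausible way to reconcile large text features with real‑time constraints.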
The third section details Baidu’s GRAB (Generative Ranking for Ads) model, which replaces traditional DLRM pipelines that rely on massive discrete features and manual feature engineering. Inspired by LLM scaling laws and the Transformer architecture, GRAB models user behavior and target ads in a unified representation space via end‑to‑end sequence generation. The Q‑Aware RAB causal attention mechanism introduces a query‑aware relative bias for adaptive modeling of complex interactions and temporal signals. Additional innovations include a two‑stage STS training algorithm to improve efficiency and mitigate overfitting, heterogeneous token representations with a dual‑loss stacking strategy, and KV‑Cache optimizations for high‑throughput online inference, all backed by quantified business gains.
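The summary does not give the formula for Q‑Aware RAB, so the following is only one plausible reading: causal attention in which each score adds a bias computed from the query and an embedding of the distance between positions, in the style of well‑known relative‑position attention variants. The function name, the rel_emb table, and the toy inputs are assumptions for illustration.

```python
import math


def causal_attention_with_query_bias(q, k, v, rel_emb):
    """Single-head causal attention with a query-dependent relative bias.

    score(i, j) = q_i . k_j / sqrt(d) + q_i . rel_emb[i - j], for j <= i.
    rel_emb[t] is a d-dimensional vector for relative distance t.
    """
    n, d = len(q), len(q[0])
    out = []
    for i in range(n):
        scores = []
        for j in range(i + 1):  # causal mask: position i sees only j <= i
            content = sum(a * b for a, b in zip(q[i], k[j])) / math.sqrt(d)
            bias = sum(a * b for a, b in zip(q[i], rel_emb[i - j]))
            scores.append(content + bias)
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        out.append([sum(w * v[j][c] for j, w in enumerate(weights))
                    for c in range(len(v[0]))])
    return out


# Toy sequence of three behavior tokens with 2-dimensional vectors.
q = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
rel_emb = [[0.1, 0.0], [0.0, 0.2], [-0.1, 0.1]]
attended = causal_attention_with_query_bias(q, k, v, rel_emb)
print(attended[0])
```

Because the bias is a function of the query as well as the distance, two queries at the same position can weight recent and older behaviors differently, which matches the "adaptive" behavior the text describes. The causal mask is also what makes a KV cache possible at inference time, since earlier keys and values never depend on later tokens.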
Finally, the article lists the eight chapters of the ebook "Agent Architecture and Practice: Building the Next‑Generation Recommendation and Search Systems," covering topics such as multi‑agent interaction for AI‑for‑good, LLM‑driven knowledge discovery, observability of OpenAI Swarm, Huawei Noah’s recommendation evolution, Baidu GRAB, Elasticsearch vector search with RAG, Alibaba Cloud Agentic RAG, and frontier explorations from big data to large models.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
