
Building Large‑Scale Recommendation Systems with Big Data and Large Language Models on Alibaba Cloud AI Platform

This presentation details how Alibaba Cloud's AI platform integrates big‑data pipelines, feature‑store services, and large language model capabilities to construct high‑performance search‑recommendation architectures, covering system design, training and inference optimizations, LLM‑driven use cases, and open‑source RAG tooling.

DataFunSummit

Shi Xing, product architecture lead of Alibaba Cloud Machine Learning Platform PAI, introduces the team’s work on search‑recommendation, multimodal image/video processing, large‑model platform optimization, and RAG engineering.

The talk outlines the mature search, recommendation, and advertising architecture used by major e‑commerce apps, describing how user requests flow through business engines, A/B testing systems, and the recall, coarse‑ranking, and fine‑ranking stages, and how models such as DeepFM draw on feature platforms and real‑time data streams.
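The recall → coarse‑ranking → fine‑ranking funnel can be sketched as a chain of progressively more expensive scorers over a shrinking candidate set. The code below is a toy illustration, not the production system: each scoring function is a stand‑in for a trained model such as DeepFM, and the field names (`popularity`, `affinity`, `quality`) are invented for the example.

```python
# Toy multi-stage ranking funnel: cheap recall, lightweight coarse ranking,
# expensive fine ranking. Real systems replace each scorer with a model
# served behind a feature platform.

def recall(user, catalog, k=100):
    """Cheap candidate generation: top-k items by a trivial popularity score."""
    return sorted(catalog, key=lambda item: item["popularity"], reverse=True)[:k]

def coarse_rank(user, candidates, k=20):
    """Lightweight pass that prunes candidates before the expensive stage."""
    def score(item):
        return item["popularity"] * user["affinity"].get(item["category"], 0.1)
    return sorted(candidates, key=score, reverse=True)[:k]

def fine_rank(user, candidates, k=5):
    """Expensive pass (stand-in for a DeepFM-style scorer) over few items."""
    def score(item):
        affinity = user["affinity"].get(item["category"], 0.1)
        return item["popularity"] * affinity + item.get("quality", 0.0)
    return sorted(candidates, key=score, reverse=True)[:k]

def recommend(user, catalog):
    """Run the full funnel: ~all items -> 100 -> 20 -> 5."""
    return fine_rank(user, coarse_rank(user, recall(user, catalog)))
```

The point of the funnel shape is cost control: the expensive model only ever sees the handful of candidates that survived the cheaper stages.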

At the resource layer, Alibaba Cloud provides heterogeneous compute (CPU and GPU), high‑bandwidth RDMA networking, and high‑performance storage via clusters such as ODPS Feitian, container services, and the Lingjun intelligent compute cluster, forming the foundation for a unified "big data + AI" PaaS platform.

The big‑data platform comprises MaxCompute (Hadoop‑like), Hologres (real‑time OLAP), Flink for streaming, and EMR as the open‑source counterpart, enabling both real‑time and offline data processing.

The AI platform offers data labeling (PAI‑iTAG), data cleaning, a FeatureStore, interactive development (PAI‑DSW), visual development (PAI‑Designer), distributed training (PAI‑DLC), dataset acceleration, network and operator optimizations, and model serving via PAI‑EAS.

FeatureStore serves as a structured big‑data management layer for ML features, providing offline‑to‑online synchronization, feature lineage, multi‑level caching, and low‑latency access, crucial for serving embeddings that can reach terabyte scale.
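A minimal sketch of the read path such a feature store optimizes: a small local cache (one tier of the multi‑level cache) sits in front of the remote online store so that hot entities avoid a network round trip. The class and method names here are hypothetical illustrations, not the FeatureStore SDK's actual API.

```python
# Hypothetical feature-store read path: local LRU cache in front of a
# remote online store (stand-in for e.g. a Hologres/Redis-backed tier).
from collections import OrderedDict

class FeatureCache:
    """Tiny LRU cache modelling the local tier of a multi-level cache."""
    def __init__(self, capacity=1024):
        self.capacity = capacity
        self.data = OrderedDict()

    def get(self, key):
        if key in self.data:
            self.data.move_to_end(key)  # mark as recently used
            return self.data[key]
        return None

    def put(self, key, value):
        self.data[key] = value
        self.data.move_to_end(key)
        if len(self.data) > self.capacity:
            self.data.popitem(last=False)  # evict least recently used

class FeatureStoreClient:
    def __init__(self, online_store, cache):
        self.online_store = online_store  # dict stand-in for the remote store
        self.cache = cache
        self.remote_reads = 0  # instrumentation: count cache misses

    def get_features(self, entity_id):
        hit = self.cache.get(entity_id)
        if hit is not None:
            return hit
        self.remote_reads += 1
        features = self.online_store[entity_id]
        self.cache.put(entity_id, features)
        return features
```

In production the remote tier would itself be fed by offline‑to‑online synchronization jobs; the cache merely hides its latency for repeated lookups.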

EasyRec, an open‑source recommendation algorithm library, unifies training and inference across MaxCompute, Hadoop, Spark, and local environments, supporting auto‑ML, feature generation, model distillation, and early‑stop to reduce development effort.
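Early stopping, one of the training aids listed above, has a simple generic mechanism: halt when the validation metric has not improved for a fixed number of evaluations. The sketch below shows that mechanism only; it is not EasyRec's actual API.

```python
# Generic early-stopping monitor (illustrative; not EasyRec's interface).

class EarlyStopper:
    """Signal a stop when the validation metric fails to improve
    for `patience` consecutive evaluations."""
    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("-inf")
        self.bad_rounds = 0

    def should_stop(self, metric):
        if metric > self.best:
            self.best = metric
            self.bad_rounds = 0
        else:
            self.bad_rounds += 1
        return self.bad_rounds >= self.patience
```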

Training optimizations include multi‑level caching, feature auto‑eviction, WorkQueue data pipelines, feature selection, knowledge distillation, communication reduction, and hardware acceleration (AVX/AMX, AllReduce, SOK). Inference optimizations comprise AVX/AMX acceleration, bf16/int8 quantization, AutoPlacement, SessionGroup multi‑stream GPU usage, and feature caching, achieving up to four‑fold QPS improvement over native TF‑Serving.
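Symmetric per‑tensor int8 quantization, one of the inference tricks mentioned, maps float weights onto integer levels through a single scale factor, trading a bounded rounding error for smaller memory footprint and faster integer arithmetic. A minimal NumPy sketch, illustrative only (the platform applies this inside the serving runtime, not in user code):

```python
# Symmetric per-tensor int8 post-training quantization sketch.
import numpy as np

def quantize_int8(w):
    """Map float weights to int8 with a single per-tensor scale."""
    scale = max(float(np.abs(w).max()) / 127.0, 1e-12)  # avoid divide-by-zero
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover approximate float weights from int8 values."""
    return q.astype(np.float32) * scale
```

The per‑element reconstruction error is bounded by half a quantization step (`scale / 2`), which is why the technique works well for weight tensors whose values cluster within a modest range.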

LLM‑driven scenarios such as category recommendation, query rewriting, and prompt engineering are demonstrated, showing how large language models can generate related product categories, rewrite ambiguous queries, and produce domain‑specific synonyms.
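LLM‑based query rewriting reduces, in essence, to a prompt template plus a model call. The template and the `call_llm` callable below are illustrative placeholders, not the platform's actual prompts or endpoints (in practice the model would sit behind a deployed serving endpoint).

```python
# Illustrative LLM query-rewriting wrapper; `call_llm` is a placeholder
# for a real model endpoint.

QUERY_REWRITE_PROMPT = """You are a search assistant for an e-commerce site.
Rewrite the ambiguous user query into a precise product search query.
Return only the rewritten query.

User query: {query}
Rewritten query:"""

def rewrite_query(query, call_llm):
    """Fill the template, call the model, and normalize the output."""
    prompt = QUERY_REWRITE_PROMPT.format(query=query)
    return call_llm(prompt).strip()
```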

The RAG workflow (PAI‑RAG) modularizes document extraction, indexing, pre‑retrieval query rewriting, retrieval, post‑retrieval processing, generation, and evaluation, supporting multimodal inputs, OCR, and vector databases (ElasticSearch, Hologres, Milvus), with tools for building evaluation sets via RefGPT.
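The modular design described above can be pictured as a pipeline of pluggable callables, one per stage. This sketch mirrors the rewrite → retrieve → post‑process → generate flow with toy components; it is not PAI‑RAG's actual interface.

```python
# Schematic modular RAG pipeline: each stage is an injectable callable,
# so rewriters, retrievers, and generators can be swapped independently.

class RAGPipeline:
    def __init__(self, rewriter, retriever, postprocessor, generator):
        self.rewriter = rewriter            # pre-retrieval query rewriting
        self.retriever = retriever          # e.g. vector-database lookup
        self.postprocessor = postprocessor  # e.g. rerank / truncate context
        self.generator = generator          # LLM call over query + context

    def answer(self, query):
        q = self.rewriter(query)
        docs = self.postprocessor(self.retriever(q))
        return self.generator(q, docs)
```

Because every stage shares the same call shape, an evaluation harness can replay a fixed query set while substituting one component at a time, which is how a modular pipeline supports stage‑level evaluation.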

Open‑source projects like EasyRec, EasyPhoto, EasyAnimate, and PAI‑RAG are highlighted, inviting the community to contribute and use these tools for large‑scale AI applications.

Tags: Big Data, Large Language Models, RAG, Recommendation Systems, Distributed Training, AI Platform, Feature Store
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
