EasyRec Recommendation Algorithm Training and Inference Optimization
EasyRec, Alibaba Cloud’s modular recommendation framework, unifies configurable data, embedding, dense, and output layers and runs on MaxCompute, EMR, and DLC. Training is accelerated by sequence-feature deduplication, EmbeddingParallel sharding, lock-free hash tables, GPU embeddings, and AMX BF16 kernels; inference benefits from operator fusion, low-precision AVX/AMX kernels, compact caches, batch merging, and network compression. Together, these enable real-time online learning and deliver higher recommendation quality at lower cost in e-commerce.
This article presents EasyRec, Alibaba Cloud's recommendation algorithm framework, detailing its training and inference architecture and a series of performance optimizations.
EasyRec's overall framework consists of data, embedding, dense, and output layers, and supports deployment on MaxCompute, EMR, and DLC platforms. Its features include configurability, componentization, distributed training, ODL (online deep learning), automatic hyperparameter tuning via NNI, and fault-tolerant training recovery using Work Queue.
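The four-layer flow described above can be sketched as a toy forward pass. This is an illustrative numpy sketch of the data → embedding → dense → output pipeline, not EasyRec's actual API; all shapes and layer sizes are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Data layer: a batch of two categorical feature IDs per example.
batch_ids = np.array([[3, 7], [1, 7], [3, 2]])          # (batch=3, features=2)

# Embedding layer: one shared table for simplicity.
emb_table = rng.standard_normal((10, 4))                 # (vocab=10, dim=4)
emb = emb_table[batch_ids].reshape(3, -1)                # (3, 8) concatenated

# Dense layer: one hidden layer with ReLU.
w1, b1 = rng.standard_normal((8, 16)), np.zeros(16)
hidden = np.maximum(emb @ w1 + b1, 0.0)

# Output layer: sigmoid click-probability head.
w2, b2 = rng.standard_normal((16, 1)), np.zeros(1)
scores = 1.0 / (1.0 + np.exp(-(hidden @ w2 + b2)))       # (3, 1), each in (0, 1)
```

In the real framework each of these stages is a configurable, swappable component rather than inline code.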
Training optimizations discussed include sequence-feature deduplication to shrink the effective data per batch, EmbeddingParallel sharding that synchronizes dense parameters via All-Reduce and exchanges sharded sparse embeddings via AllToAll, lock-free hash tables on CPU, HugeCTR SOK embedding on GPU, AMX-based BF16 matrix acceleration for dense layers, and various CPU-side MatMul optimizations.
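The sequence-feature deduplication idea can be illustrated with a small sketch: when many rows in a batch share the same behavior sequence (e.g. one user scored against many candidate items), the sequence is embedded once per unique value and the result is scattered back via inverse indices. The function and `embed_fn` here are hypothetical stand-ins, not EasyRec code.

```python
import numpy as np

def dedup_sequence_embedding(seq_ids, embed_fn):
    # seq_ids: (batch, seq_len) int array; embed_fn maps unique rows -> vectors.
    uniq, inverse = np.unique(seq_ids, axis=0, return_inverse=True)
    uniq_emb = embed_fn(uniq)            # computed only for unique sequences
    return uniq_emb[inverse], len(uniq)  # restored to full batch order

batch = np.array([[5, 9, 2],
                  [5, 9, 2],
                  [1, 4, 0],
                  [5, 9, 2]])            # 4 rows, only 2 distinct sequences

calls = []
def embed_fn(rows):
    calls.append(len(rows))              # record how many sequences we embed
    return rows.mean(axis=1, keepdims=True).astype(float)  # toy "embedding"

emb, n_uniq = dedup_sequence_embedding(batch, embed_fn)
# The embedding function runs over 2 unique sequences instead of 4 batch rows.
```

The saving grows with the ratio of batch size to distinct sequences, which is large when one user's sequence is repeated across many candidates.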
Inference optimizations cover operator fusion (e.g., merging unique and SparseSegmentMean via AVX), BF16/FP16 precision acceleration with AVX and AMX, AVX-accelerated StringSplit and hash functions, a compact item-feature cache that reduces memory by over 80%, TensorFlow op encapsulation to overlap feature generation with embedding lookup, user-feature tiling to avoid redundant computation, GPU placement strategies that use Min-Cut to split embedding lookup from dense computation, XLA-based operator fusion to cut kernel-launch overhead, TensorRT for dense-layer optimization, batch-size merging for small-batch scenarios, and network-level improvements such as direct pod-IP connection and request compression with Snappy/Zstd.
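Among these, batch-size merging is easy to sketch: several small inference requests are concatenated, scored in a single forward pass, and split back per request. `model_fn` is a hypothetical stand-in for the real serving graph; the shapes are illustrative.

```python
import numpy as np

def merge_and_score(requests, model_fn):
    # requests: list of (n_i, feat) arrays from separate small requests.
    sizes = [len(r) for r in requests]
    merged = np.concatenate(requests, axis=0)     # one big batch
    scores = model_fn(merged)                     # single model invocation
    bounds = np.cumsum(sizes)[:-1]
    return np.split(scores, bounds)               # per-request results

model_calls = []
def model_fn(x):
    model_calls.append(x.shape[0])                # record invocation batch size
    return x.sum(axis=1)                          # toy scoring function

reqs = [np.ones((2, 4)), np.ones((5, 4)), np.ones((1, 4))]
outs = merge_and_score(reqs, model_fn)
# Three requests served by a single forward pass over 8 rows.
```

Merging amortizes per-call overhead (kernel launches, graph dispatch) that dominates at small batch sizes, at the cost of a short queuing delay while requests accumulate.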
The article also describes online learning capabilities: streaming logs back in real time via PAI-REC to SLS and DataHub, Flink-based sample aggregation and label generation, incremental parameter saving to OSS, feature consistency via serving-time instrumentation (logging the features actually used at inference), Lz4 compression, and handling of delayed or abnormal data.
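The incremental-saving idea can be sketched as follows: between checkpoints, only the embedding rows touched by training are written out, not the full table. The class and method names here are illustrative assumptions, not EasyRec's API.

```python
import numpy as np

class IncrementalEmbedding:
    def __init__(self, vocab, dim):
        self.table = np.zeros((vocab, dim))
        self.dirty = set()                        # rows updated since last save

    def apply_update(self, row, grad, lr=0.1):
        # SGD-style update that also marks the row as dirty.
        self.table[row] -= lr * grad
        self.dirty.add(row)

    def save_increment(self):
        # Emit only the dirty rows (e.g. to be shipped to OSS), then reset.
        delta = {r: self.table[r].copy() for r in sorted(self.dirty)}
        self.dirty.clear()                        # next interval starts clean
        return delta

emb = IncrementalEmbedding(vocab=1000, dim=4)
emb.apply_update(42, np.ones(4))
emb.apply_update(7, np.ones(4))
delta = emb.save_increment()
# Only rows 7 and 42 appear in the increment, not all 1000 rows.
```

Since sparse embedding tables dominate model size and only a tiny fraction of rows change per interval, shipping deltas keeps online model refreshes cheap and frequent.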
Finally, a case study in e‑commerce shows that these optimizations jointly improve recommendation effectiveness while significantly lowering cost.
Sohu Tech Products
A knowledge-sharing platform for Sohu's technology products. As a leading Chinese internet brand with media, video, search, and gaming services and over 700 million users, Sohu continuously drives tech innovation and practice. We’ll share practical insights and tech news here.