
Angel Machine Learning Platform: Architecture, Deep Learning Extensions, and Applications in Tencent Advertising Recommendation System

This article introduces Tencent's self‑built Angel distributed machine‑learning platform, describes its architecture and deep‑learning extensions (Parameter Server and AllReduce), explains how it powers the advertising recommendation pipeline with models such as DSSM, VLAD and YOLO, and presents extensive training‑level optimizations that yield multi‑fold performance improvements.

DataFunTalk

Oct 14, 2020


Angel is Tencent's self‑developed high‑performance distributed machine‑learning platform that supports traditional machine learning, deep learning, graph computing, and federated learning scenarios. It follows a Parameter Server architecture and provides a full‑stack solution covering feature engineering, model training, serving, and hyper‑parameter tuning.

The platform consists of four layers: the core layer (Angel‑Core), containing PSAgent, PSServer, Worker, Network and Storage; the machine‑learning layer (Angel‑ML), which offers basic data types and user‑extendable functions; the client layer (Angel‑Client), which supports plug‑in extensions such as TensorFlow and PyTorch; and the algorithm layer (Angel‑MLLib), which ships ready‑made algorithms such as GBDT and SVM.

For deep‑learning workloads, Angel implements the two common distributed training paradigms: Parameter Server (PS) and MPI AllReduce. In PS mode, Angel launches a PS process that holds the model parameters, while workers invoke TensorFlow or PyTorch operators through their native C++ APIs. In AllReduce mode, Angel starts a controller process that spawns a training process on each worker, so users can run existing deep‑learning code with minimal changes.
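The following toy, single-process Python sketch makes the difference between the two modes concrete. It shows synchronous data-parallel training of a linear model. It is not Angel's API. `ParameterServer`, `train_ps`, `train_allreduce` and the data are hypothetical, and the "AllReduce" is simulated by averaging in memory rather than through MPI. In PS mode, one server object owns the parameters and workers pull and push. In AllReduce mode, every worker keeps a full replica and applies the same averaged gradient.

```python
"""Toy synchronous data parallelism: Parameter Server mode vs AllReduce mode.

Hypothetical names throughout; this is not Angel's API.
"""
from typing import List, Sequence, Tuple

Shard = Tuple[List[List[float]], List[float]]  # (feature rows, targets) held by one worker


def local_gradient(w: Sequence[float], shard: Shard) -> List[float]:
    """Mean-squared-error gradient of a linear model y ~ w . x over one worker's shard."""
    xs, ys = shard
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for j, xj in enumerate(x):
            grad[j] += 2.0 * err * xj / len(xs)
    return grad


def mean_of(grads: List[List[float]]) -> List[float]:
    """Element-wise mean; an AllReduce(SUM) followed by division by n gives each rank this."""
    return [sum(col) / len(grads) for col in zip(*grads)]


class ParameterServer:
    """PS mode: the server owns the only authoritative copy of the parameters."""

    def __init__(self, dim: int, lr: float) -> None:
        self.w = [0.0] * dim
        self.lr = lr

    def pull(self) -> List[float]:
        return list(self.w)

    def push(self, grads: List[List[float]]) -> None:
        # Synchronous update: wait for all workers, average, apply one step.
        for j, g in enumerate(mean_of(grads)):
            self.w[j] -= self.lr * g


def train_ps(shards: List[Shard], dim: int, lr: float, steps: int) -> List[float]:
    ps = ParameterServer(dim, lr)
    for _ in range(steps):
        w = ps.pull()  # every worker pulls the same snapshot
        ps.push([local_gradient(w, s) for s in shards])
    return ps.w


def train_allreduce(shards: List[Shard], dim: int, lr: float, steps: int) -> List[List[float]]:
    replicas = [[0.0] * dim for _ in shards]  # one full model copy per worker
    for _ in range(steps):
        grads = [local_gradient(r, s) for r, s in zip(replicas, shards)]
        avg = mean_of(grads)  # simulated AllReduce
        for r in replicas:
            for j in range(dim):
                r[j] -= lr * avg[j]
    return replicas


# Two workers, each holding half of the data for y = 3*x0 - x1.
SHARDS: List[Shard] = [
    ([[1.0, 0.0], [0.0, 1.0]], [3.0, -1.0]),
    ([[1.0, 1.0], [2.0, 1.0]], [2.0, 5.0]),
]

if __name__ == "__main__":
    print(train_ps(SHARDS, dim=2, lr=0.1, steps=200))
    print(train_allreduce(SHARDS, dim=2, lr=0.1, steps=200)[0])
```

Both modes apply the same averaged gradient at each step, so they converge to the same parameters. What differs in practice is where the parameters live and how traffic flows: to a central server in PS mode, or among peers in AllReduce mode.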

Tencent's advertising recommendation system processes massive real‑time and offline data streams, extracts user and item features, performs coarse and fine ranking, and serves the top ads to users. The system faces challenges such as billions of ID features, rapid data turnover, and diverse model types (ranking, image detection, OCR).

Key models used include:

DSSM (Deep Structured Semantic Model) that maps queries and items into low‑dimensional vectors and computes relevance via cosine similarity.

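VLAD/NetVLAD/NeXtVLAD for measuring similarity between ad images. VLAD aggregates local image descriptors as residuals against a set of cluster centers. NetVLAD replaces VLAD's hard cluster assignment with a differentiable soft assignment, so the layer can be trained end to end. NeXtVLAD reduces NetVLAD's parameter count by splitting features into groups before aggregation.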

YOLO‑V3 for OCR front‑end detection, processing large‑size images (e.g., 608×608) and incurring significant loss‑computation cost.
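The DSSM scoring described above fits in a few lines of Python. This is a minimal illustration that assumes tiny hand-picked weights and dense toy inputs. The original DSSM feeds letter-trigram hashed bag-of-words vectors into deeper towers, and `tower` and `dssm_rank` are hypothetical names. Each tower maps its input to a semantic vector, and relevance is the cosine between the query vector and each item vector. A smoothing factor `gamma` then turns the relevances into a softmax over the candidates, as in DSSM's training objective.

```python
import math
from typing import List, Sequence, Tuple

Layer = Tuple[List[List[float]], List[float]]  # (weight rows, bias)


def tower(x: Sequence[float], layers: List[Layer]) -> List[float]:
    """Stack of fully connected tanh layers mapping an input to a dense semantic vector."""
    h = list(x)
    for weights, bias in layers:
        h = [math.tanh(sum(w * v for w, v in zip(row, h)) + b) for row, b in zip(weights, bias)]
    return h


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def dssm_rank(query, items, query_layers, item_layers, gamma: float = 10.0):
    """Cosine relevance R(Q, D) for each item, plus softmax(gamma * R) over the candidates."""
    q = tower(query, query_layers)
    sims = [cosine(q, tower(d, item_layers)) for d in items]
    exps = [math.exp(gamma * s) for s in sims]
    total = sum(exps)
    return sims, [e / total for e in exps]


# Hypothetical weights; both towers share them here only to keep the example small.
LAYERS: List[Layer] = [([[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]], [0.0, 0.1])]
QUERY = [1.0, 0.0, 1.0]
ITEMS = [[1.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

if __name__ == "__main__":
    sims, probs = dssm_rank(QUERY, ITEMS, LAYERS, LAYERS)
    print(sims, probs)
```

In a real two-tower setup, the query and item towers usually have separate weights. The item vectors can be precomputed offline, so online scoring reduces to a cosine per candidate.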
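VLAD and NetVLAD differ in how each local descriptor is assigned to clusters. The sketch below assumes small hypothetical descriptors and fixed centers. With `alpha=None` it uses VLAD's hard nearest-center assignment. With a numeric `alpha` it uses a NetVLAD-style soft assignment, softmax(-alpha * ||x - c_k||^2), which is differentiable. NeXtVLAD's feature grouping is not shown. Two images are compared by the dot product of their normalized aggregate vectors.

```python
import math
from typing import List, Optional, Sequence

Vector = List[float]


def l2_normalize(v: Sequence[float]) -> Vector:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)


def assignment(x: Sequence[float], centers: List[Vector], alpha: Optional[float]) -> Vector:
    """Weights of descriptor x over the K clusters."""
    d2 = [sum((a - c) ** 2 for a, c in zip(x, ctr)) for ctr in centers]
    if alpha is None:  # classic VLAD: hard, non-differentiable nearest-center assignment
        best = min(range(len(centers)), key=d2.__getitem__)
        return [1.0 if k == best else 0.0 for k in range(len(centers))]
    # NetVLAD-style soft assignment: softmax over -alpha * squared distance.
    m = min(d2)  # shift so the largest exponent is 0 (numerical stability)
    e = [math.exp(-alpha * (dk - m)) for dk in d2]
    s = sum(e)
    return [ek / s for ek in e]


def vlad(descriptors: List[Vector], centers: List[Vector], alpha: Optional[float] = None) -> Vector:
    """Sum of weighted residuals x - c_k, intra-normalized per cluster, then L2-normalized."""
    dim = len(centers[0])
    blocks = [[0.0] * dim for _ in centers]
    for x in descriptors:
        for k, w in enumerate(assignment(x, centers, alpha)):
            for t in range(dim):
                blocks[k][t] += w * (x[t] - centers[k][t])
    flat = [v for block in blocks for v in l2_normalize(block)]
    return l2_normalize(flat)


def similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Dot product of two L2-normalized VLAD vectors, i.e. their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


# Hypothetical 2-D local descriptors and K=2 centers (in practice learned, e.g. by k-means).
CENTERS = [[0.0, 0.0], [1.0, 1.0]]
IMAGE_A = [[0.1, 0.2], [0.9, 1.0], [0.2, 0.0]]
IMAGE_B = [[0.0, 0.3], [1.2, 0.8]]

if __name__ == "__main__":
    print(similarity(vlad(IMAGE_A, CENTERS, alpha=2.0), vlad(IMAGE_B, CENTERS, alpha=2.0)))
```

As `alpha` grows, the soft assignment approaches the hard one. Because the soft version has gradients, the centers and the feature extractor can be trained jointly.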

Training and optimization efforts focus on three aspects:

Data‑flow optimization: The original single‑pipeline "ZhiLing" framework was refactored into a multi‑pipeline design that assigns a dedicated DataQueue to each GPU worker and coordinates the workers through Angel in AllReduce mode, removing the I/O bottleneck.

Embedding lookup optimization: Hashing is moved before SparseFillEmptyRows, so the fill step operates on hashed integer IDs instead of performing costly string operations on millions of rows. This yields a ~6% single‑GPU speedup.

Model‑level loss optimization: For YOLO‑V3, loss calculation is restricted to diagonal‑symmetric blocks of the feature map, which drastically cuts the number of IoU computations and delivers a ~10% single‑GPU improvement.
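The per-worker DataQueue idea can be illustrated with Python threads. In this sketch the names are hypothetical, gradient synchronization between workers is omitted, and integers stand in for parsed batches. Each worker gets its own reader thread and its own bounded queue, so no worker waits on a single shared input pipeline.

```python
import queue
import threading
from typing import Callable, Iterable, List, Tuple

_END = object()  # marks the end of a shard


def pipeline(shard: Iterable[str], data_queue: "queue.Queue", parse: Callable[[str], int]) -> None:
    """One input pipeline: read and parse this worker's shard into its own DataQueue."""
    for record in shard:
        data_queue.put(parse(record))  # blocks while the bounded queue is full
    data_queue.put(_END)


def gpu_worker(rank: int, data_queue: "queue.Queue", out: List[Tuple[int, int]], lock: threading.Lock) -> None:
    """Stand-in for a GPU training loop that only ever reads from its own queue."""
    total = 0
    while True:
        batch = data_queue.get()
        if batch is _END:
            break
        total += batch  # placeholder for a forward/backward step
    with lock:
        out.append((rank, total))


def run_multi_pipeline(shards: List[List[str]], capacity: int = 8) -> List[Tuple[int, int]]:
    queues = [queue.Queue(maxsize=capacity) for _ in shards]  # one DataQueue per worker
    out: List[Tuple[int, int]] = []
    lock = threading.Lock()
    threads = []
    for rank, (shard, q) in enumerate(zip(shards, queues)):
        threads.append(threading.Thread(target=pipeline, args=(shard, q, int)))
        threads.append(threading.Thread(target=gpu_worker, args=(rank, q, out, lock)))
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(out)


if __name__ == "__main__":
    print(run_multi_pipeline([["1", "2"], ["3", "4", "5"]]))
```

The bounded queue gives back-pressure. A reader can run ahead of its worker by at most `capacity` batches, and a slow reader delays only its own worker rather than every worker.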
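A small Python model of a sparse string feature shows why hashing before or after filling empty rows gives the same result, and where the saving comes from. The helpers are hypothetical. `fill_empty_rows` only mimics the semantics of TensorFlow's SparseFillEmptyRows, and CRC32 stands in for the real hash function. If the fill runs first, the default value is inserted into a string tensor and then hashed along with everything else. If the hash runs first, the fill operates on integer IDs, provided the default is hashed once up front.

```python
import zlib
from typing import Any, List, Tuple

Sparse = Tuple[List[int], List[Any]]  # (row id of each entry, value of each entry)


def bucket(s: str, num_buckets: int) -> int:
    """Stand-in string hash (CRC32); TensorFlow's hashing ops use their own fingerprint function."""
    return zlib.crc32(s.encode("utf-8")) % num_buckets


def fill_empty_rows(sp: Sparse, num_rows: int, default: Any) -> Sparse:
    """Rough analogue of SparseFillEmptyRows: every row without entries gets one default entry."""
    rows, values = sp
    present = set(rows)
    entries = list(zip(rows, values)) + [(r, default) for r in range(num_rows) if r not in present]
    entries.sort(key=lambda e: e[0])  # stable sort keeps the original order within a row
    return [r for r, _ in entries], [v for _, v in entries]


def fill_then_hash(sp: Sparse, num_rows: int, num_buckets: int, default: str = "") -> Sparse:
    rows, values = fill_empty_rows(sp, num_rows, default)  # fill runs on strings
    return rows, [bucket(v, num_buckets) for v in values]


def hash_then_fill(sp: Sparse, num_rows: int, num_buckets: int, default: str = "") -> Sparse:
    rows, values = sp
    hashed = [bucket(v, num_buckets) for v in values]  # only real entries are hashed
    # The default is pre-hashed so both orders yield identical IDs; the fill runs on ints.
    return fill_empty_rows((rows, hashed), num_rows, bucket(default, num_buckets))


# A batch of 4 examples; rows 1 and 3 have no ad IDs (hypothetical values).
BATCH: Sparse = ([0, 0, 2], ["ad_1", "ad_7", "ad_3"])

if __name__ == "__main__":
    print(hash_then_fill(BATCH, num_rows=4, num_buckets=1000))
```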
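The source does not say exactly how the diagonal-symmetric blocks are chosen. Under that assumption, the sketch below only illustrates the property that would make such a scheme pay off, and it is not the YOLO-V3 loss itself. IoU is symmetric, IoU(a, b) = IoU(b, a), and IoU(a, a) = 1 for any box with positive area. A pairwise IoU matrix over one set of boxes therefore needs only its strict upper triangle: n(n-1)/2 evaluations instead of n^2. The function names and boxes are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2) with x1 < x2 and y1 < y2


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def pairwise_iou_full(boxes: List[Box]) -> Tuple[List[List[float]], int]:
    """Naive version: evaluates all n * n pairs."""
    n = len(boxes)
    m = [[0.0] * n for _ in range(n)]
    calls = 0
    for i in range(n):
        for j in range(n):
            m[i][j] = iou(boxes[i], boxes[j])
            calls += 1
    return m, calls


def pairwise_iou_symmetric(boxes: List[Box]) -> Tuple[List[List[float]], int]:
    """Uses IoU(a, b) == IoU(b, a) and IoU(a, a) == 1: only n * (n - 1) / 2 evaluations."""
    n = len(boxes)
    m = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    calls = 0
    for i in range(n):
        for j in range(i + 1, n):
            m[i][j] = m[j][i] = iou(boxes[i], boxes[j])  # mirror across the diagonal
            calls += 1
    return m, calls


BOXES: List[Box] = [(0.0, 0.0, 2.0, 2.0), (1.0, 1.0, 3.0, 3.0), (5.0, 5.0, 6.0, 6.0), (0.0, 0.0, 1.0, 1.0)]

if __name__ == "__main__":
    print(pairwise_iou_symmetric(BOXES))
```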

These optimizations lead to substantial performance gains: DSSM achieves a 33× speedup, VLAD a 22× speedup, and YOLO‑V3 a 2.5× speedup on a single GPU, with near‑linear scaling when multiple GPUs or machines are added.

In summary, the talk covered (1) the Angel platform and its deep‑learning extensions, (2) the characteristics and models of Tencent's advertising recommendation system, and (3) concrete training‑level optimizations that dramatically improve throughput and latency.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
