
Scaling Huge Embedding Model Training with Cache-Enabled Distributed Framework (HET): VLDB 2022 Best Paper and Its Industrial Deployment

The award‑winning VLDB 2022 paper introduces HET, a cache‑enabled distributed framework that dramatically reduces communication overhead when training sparse, trillion‑parameter embedding models. Tencent Ads has industrialized this technology to train 10 TB‑scale models with round‑the‑clock (24/7) online deep learning.

Tencent Advertising Technology

At VLDB 2022, a joint Peking University and Tencent team received the Best Scalable Data Science Paper award for "Scaling out Huge Embedding Model Training via Cache‑enabled Distributed Framework (HET)". The paper proposes a novel embedding‑cache training method that cuts communication cost and boosts overall training efficiency for sparse large‑scale models.

Inspired by HET, Tencent Ads co‑developed the AngelPS component on the Angel 4.0 platform and integrated it into the Taiji Machine Learning platform. This raises the single‑model size limit to the 10 TB level and supports 24/7 online deep learning and inference for massive advertising models.

HET’s core ideas and key experimental results include:

Embedding‑cache‑enabled hybrid communication architecture combining Parameter Server (PS) for sparse parameters and AllReduce for dense parameters.
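As an illustration of this hybrid scheme, the toy sketch below (plain Python; the class and function names are assumptions, not HET's actual API) routes sparse embedding rows through a parameter-server pull/push path, while dense gradients are combined with AllReduce-style averaging:

```python
# Toy sketch of HET-style hybrid communication (illustrative only; names are
# assumptions, not HET's real API). Sparse embedding rows travel through a
# parameter server, so each worker transfers only the few rows its mini-batch
# touches; dense parameters are synchronized with AllReduce-style averaging.

class ToyParameterServer:
    """Authoritative store for the sparse embedding table."""

    def __init__(self, num_rows, dim, lr=0.1):
        self.table = [[0.0] * dim for _ in range(num_rows)]
        self.lr = lr

    def pull(self, row_ids):
        # Workers fetch only the rows their mini-batch references.
        return [list(self.table[r]) for r in row_ids]

    def push(self, row_ids, grads):
        # SGD update on just the touched rows: traffic is O(rows used),
        # not O(table size).
        for r, g in zip(row_ids, grads):
            self.table[r] = [w - self.lr * gi
                             for w, gi in zip(self.table[r], g)]


def allreduce_mean(worker_grads):
    """AllReduce-style averaging of a dense gradient across workers."""
    n = len(worker_grads)
    return [sum(vals) / n for vals in zip(*worker_grads)]
```

In a real trillion-parameter table, each training step would pull and push only the batch's row ids through the PS path, while every dense layer's gradient goes through the AllReduce path, which is bandwidth-optimal for parameters all workers share.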

A fine‑grained embedding clock and limited asynchronous protocol to keep cached replicas consistent without sacrificing convergence.
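A minimal sketch of how such a per-row clock could gate cache reads (illustrative only; the class and attribute names are assumptions, and HET's real protocol also tracks fine-grained clocks on the PS side):

```python
# Toy sketch of a bounded-staleness embedding cache (illustrative; not HET's
# real implementation). Each cached row remembers the clock at which it was
# fetched; a worker may keep reading the cached copy only while it lags the
# global clock by at most `staleness_bound` steps, otherwise it re-pulls the
# row from the parameter server. This bounds staleness, which is what lets
# the protocol preserve convergence while skipping most communication.

class ClockedEmbeddingCache:
    def __init__(self, ps_table, staleness_bound=4):
        self.ps_table = ps_table          # authoritative rows on the PS
        self.staleness_bound = staleness_bound
        self.rows = {}                    # row_id -> cached value
        self.clocks = {}                  # row_id -> clock when cached
        self.misses = 0                   # PS round-trips actually paid

    def read(self, row_id, global_clock):
        fresh = (row_id in self.rows and
                 global_clock - self.clocks[row_id] <= self.staleness_bound)
        if not fresh:
            # Stale or absent: pull the latest row from the parameter server.
            self.rows[row_id] = self.ps_table[row_id]
            self.clocks[row_id] = global_clock
            self.misses += 1
        return self.rows[row_id]
```

For example, with `staleness_bound=2`, a row fetched at clock 0 can be served from cache at clocks 1 and 2, and is re-pulled only at clock 3 and beyond.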

Experiments comparing HET with TensorFlow (PS‑only) and Parallax (PS + AllReduce) on datasets such as Criteo, Reddit, Amazon, and ogbn‑mag, with workloads including GraphSAGE, show 6.37–20.68× speedup and up to 88% reduction in sparse‑parameter communication, while maintaining comparable convergence.

Cache effectiveness tests reveal that allocating as little as 15% of total parameter memory achieves ~97% cache hit rate, with LFU outperforming LRU in miss reduction.
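To make the LFU-vs-LRU comparison concrete, here is a small self-contained simulation (not from the paper; the trace is a contrived example) that replays a skewed access pattern through both eviction policies:

```python
from collections import OrderedDict

# Small simulation (illustrative, not from the paper) comparing LRU and LFU
# eviction on a skewed access trace. Hot embedding rows dominate real
# workloads, which is why a tiny cache can reach a high hit rate and why
# frequency-aware eviction (LFU) keeps hot rows that LRU would evict.

def lru_hits(trace, capacity):
    cache = OrderedDict()  # key -> None, ordered by recency
    hits = 0
    for key in trace:
        if key in cache:
            hits += 1
            cache.move_to_end(key)         # mark as most recently used
        else:
            cache[key] = None
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits

def lfu_hits(trace, capacity):
    freq = {}  # key -> access count, for cached keys only
    hits = 0
    for key in trace:
        if key in freq:
            hits += 1
            freq[key] += 1
        else:
            if len(freq) >= capacity:
                coldest = min(freq, key=freq.get)  # evict least frequent
                del freq[coldest]
            freq[key] = 1
    return hits

# One hot key 'a' interleaved with one-off keys: at capacity 2, LFU keeps
# the hot key resident while LRU repeatedly evicts it.
trace = ["a", "a", "a"] + ["b", "c", "a"] * 5
```

On this trace with `capacity=2`, LFU scores far more hits than LRU because the one-off keys never accumulate enough frequency to displace the hot key, mirroring the paper's finding that LFU reduces misses relative to LRU.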

Scalability tests up to 32 nodes and 1 trillion‑parameter models (embedding dimension 4096) confirm HET’s superior performance over baseline systems.

The HET framework is open‑source (https://github.com/PKU-DAIR/Hetu), and the award‑winning research has been applied in production to train two trillion‑dimension models, the Mix‑yuan AI model and an advertising model, enhancing recommendation efficiency and matching precision.

Tags: Cache, Deep Learning, embedding, distributed training, parameter server, Large-Scale Models
Written by

Tencent Advertising Technology

Official hub of Tencent Advertising Technology, sharing the team's latest cutting-edge achievements and advertising technology applications.
