Tagged articles
18 articles
JD Retail Technology
Jan 30, 2024 · Artificial Intelligence

Next-Generation Multi‑GPU Synchronous Training Architecture for Large‑Scale Sparse Recommendation Models

The article details JD Retail's evolution from TensorFlow‑based sparse training to a custom high‑performance parameter server and a fully GPU‑accelerated, multi‑node, multi‑card synchronous training framework that leverages GPU‑RDMA, two‑level CPU‑DRAM/GPU‑HBM caching, and pipeline parallelism to overcome storage, I/O, and compute challenges of trillion‑parameter recommendation systems.

AI Infrastructure · GPU Acceleration · Parameter Server
0 likes · 12 min read
DataFunSummit
Nov 19, 2023 · Artificial Intelligence

Overview of NVIDIA Merlin for Recommendation Systems

This article introduces NVIDIA's Merlin suite, covering product overview, Merlin Models & Systems, the TensorFlow Distributed Embedding (TFDE) plugin, the Hierarchical‑KV library, and the Hierarchical Parameter Server (HPS), while highlighting their architecture, performance benefits, and ease of integration for large‑scale recommendation workloads.

Distributed Embedding · GPU Acceleration · Hierarchical KV
0 likes · 13 min read
Kuaishou Tech
Aug 26, 2023 · Artificial Intelligence

PetPS: A Persistent‑Memory Parameter Server for Large‑Scale Embedding Models

PetPS introduces a persistent‑memory‑based parameter server that redesigns indexing with the PetHash hash table and offloads parameter aggregation to NIC Gathering, achieving up to 1.7× higher throughput and significantly lower latency for industrial‑scale embedding models in recommendation, search, and advertising workloads.

Parameter Server · Performance Optimization · System Design
0 likes · 14 min read
AntTech
Jul 12, 2023 · Artificial Intelligence

Hybrid Embedding Architecture for Large‑Scale Sparse CTR Models

This article describes the Hybrid Embedding solution proposed by Ant AI Infra to address storage, resource, and feature‑governance challenges of massive sparse CTR models, detailing its multi‑layer storage design, KV‑based parameter server, and performance gains in large‑scale recommendation systems.

AI Infra · CTR · Hybrid Embedding
0 likes · 9 min read
Tencent Advertising Technology
Nov 17, 2022 · Artificial Intelligence

Scaling Huge Embedding Model Training with Cache-Enabled Distributed Framework (HET): VLDB 2022 Best Paper and Its Industrial Deployment

The award‑winning VLDB 2022 paper introduces HET, a cache‑enabled distributed framework that dramatically reduces communication overhead for sparse trillion‑parameter embedding models, and Tencent Ads has industrialized this technology to train 10 TB‑scale models with continuous 7×24‑hour online deep learning.

Cache · Deep Learning · Embedding
0 likes · 9 min read
DataFunSummit
Sep 9, 2022 · Artificial Intelligence

Wuliang: Tencent's Deep Learning Framework for Real‑Time Large‑Scale Recommendation

The presentation by Tencent expert Yuan Yi details the Wuliang deep learning system for recommendation, covering its background, technical challenges such as massive data and real‑time requirements, the parameter‑server based solutions for training and inference, model compression techniques, and continuous online deployment strategies.

Deep Learning · Large-Scale Training · Parameter Server
0 likes · 14 min read
DataFunTalk
Jul 8, 2022 · Artificial Intelligence

Tencent's Wuliang Deep Learning System for Large‑Scale Recommendation: Architecture, Challenges, and Solutions

This article presents an in‑depth overview of Tencent's Wuliang deep learning platform for recommendation systems, detailing the real‑time data challenges, high‑throughput requirements, parameter‑server architecture, model compression techniques, multi‑level caching, and answers to common technical questions.

Distributed Training · Inference Service · Parameter Server
0 likes · 14 min read
DataFunSummit
Apr 16, 2022 · Big Data

Angel Graph: A Scalable Graph Computing Platform – Architecture, Optimizations, and Applications

The article introduces Angel Graph, a large‑scale graph computing platform built on Angel's parameter‑server architecture and Spark, detailing its evolution, framework components (including Spark‑on‑Angel and PyTorch‑on‑Angel), data and model partitioning strategies, communication and computation optimizations, stability mechanisms, usability features, and real‑world applications across recommendation, risk control, social and gaming domains.

Parameter Server · PyTorch · Spark
0 likes · 15 min read
Tencent Cloud Developer
Sep 1, 2021 · Artificial Intelligence

Why Distributed Machine Learning Accelerates AI Training at Scale

This article reviews how distributed machine learning tackles massive data and compute challenges by partitioning models and data across workers, optimizing communication with communication primitives, parameter servers, and Ring AllReduce, reducing IO overhead, and applying advanced optimizers such as LARS and LAMB to achieve faster, scalable training.

LAMB optimizer · LARS optimizer · Parameter Server
0 likes · 31 min read
DataFunTalk
Oct 14, 2020 · Artificial Intelligence

Angel Machine Learning Platform: Architecture, Deep Learning Extensions, and Applications in Tencent Advertising Recommendation System

This article introduces Tencent's self‑built Angel distributed machine‑learning platform, describes its architecture and deep‑learning extensions (Parameter Server and AllReduce), explains how it powers the advertising recommendation pipeline with models such as DSSM, VLAD and YOLO, and presents extensive training‑level optimizations that yield multi‑fold performance improvements.

Angel · Parameter Server · Performance Optimization
0 likes · 15 min read
iQIYI Technical Product Team
Jun 12, 2020 · Artificial Intelligence

Deepthought: An End‑to‑End Machine Learning Platform at iQIYI

Deepthought is iQIYI’s end‑to‑end machine‑learning platform that unifies distributed frameworks, decouples pipeline stages, integrates with Tongtian Tower, and offers visual drag‑and‑drop configuration, evolving from a fraud‑detection prototype to a generic system with real‑time inference, automated hyper‑parameter optimization, and support for large‑scale data across anti‑fraud, recommendation, and analytics workloads.

AI Platform · AutoML · Parameter Server
0 likes · 13 min read
DataFunTalk
May 8, 2020 · Artificial Intelligence

Distributed Machine Learning Framework GDBT for High‑Dimensional Real‑Time Recommendation Systems

The article explains how 4Paradigm's distributed machine learning framework GDBT tackles the massive data, high‑dimensional features, and real‑time requirements of modern recommendation systems by leveraging heterogeneous computing, parameter servers, RDMA networking, and optimized workloads.

GDBT · Parameter Server · RDMA
0 likes · 18 min read
DataFunTalk
Mar 22, 2019 · Artificial Intelligence

Understanding Alibaba’s “Image Matters” Paper: Deep Image CTR Model (DICM) and Advanced Model Server

This article interprets Alibaba’s “Image Matters” paper, explaining how the Deep Image CTR Model (DICM) introduces user‑side visual preference modeling with image embeddings, why traditional Parameter Servers struggle with large image vectors, and how the Advanced Model Server (AMS) compresses embeddings to enable efficient distributed training.

Advanced Model Server · CTR · Deep Learning
0 likes · 15 min read
Didi Tech
Jun 8, 2018 · Artificial Intelligence

DiDi PS: High-Performance RDMA-Based Parameter Server for Distributed Deep Learning

DiDi PS is a custom RDMA‑based parameter server that uses a ring topology and optimized ibverbs communication to dramatically accelerate distributed deep‑learning training, consistently outperforming OpenMPI, NCCL2, TensorFlow’s built‑in RDMA, and Horovod while providing more stable and scalable synchronization for massive data workloads.

Allreduce · Distributed Training · Parameter Server
0 likes · 10 min read
Alibaba Cloud Developer
Apr 26, 2018 · Artificial Intelligence

How TensorFlowRS Supercharges Large‑Scale Search & Recommendation with 10×‑100× Speedups

This article describes TensorFlowRS, an Alibaba‑built extension of TensorFlow that tackles the massive compute and sparse‑feature challenges of search, advertising and recommendation by redesigning the parameter server, adding fail‑over, gradient‑compensation, online‑learning support, advanced training modes and visualisation, achieving up to 100× training speedup and improved model quality.

Distributed Training · Online Learning · Parameter Server
0 likes · 16 min read
21CTO
Aug 13, 2017 · Artificial Intelligence

How Distributed Machine Learning Platforms Compare: Spark, PMLS, TensorFlow

This article surveys distributed machine‑learning platforms, classifies them into basic data‑flow, parameter‑server, and advanced data‑flow models, examines Spark, PMLS (Petuum), TensorFlow and MXNet, presents performance comparisons on EC2 instances, and discusses bottlenecks, fault tolerance, and future research directions.

Parameter Server · Performance Evaluation · Spark
0 likes · 12 min read
High Availability Architecture
Aug 2, 2017 · Artificial Intelligence

A Comparative Study of Distributed Machine Learning Platforms: Design Methods and Evaluation

This article surveys design approaches for distributed machine learning platforms, classifies them into basic dataflow, parameter‑server, and advanced dataflow models, examines examples such as Spark, PMLS, TensorFlow and MXNet, and presents performance evaluations and future research directions.

Parameter Server · Performance Evaluation · Spark
0 likes · 10 min read