Tagged articles

18 articles

Page 1 of 1

Jan 30, 2024 · Artificial Intelligence

Next-Generation Multi‑GPU Synchronous Training Architecture for Large‑Scale Sparse Recommendation Models

The article details JD Retail's evolution from TensorFlow‑based sparse training to a custom high‑performance parameter server and a fully GPU‑accelerated, multi‑node, multi‑card synchronous training framework that leverages GPU‑RDMA, two‑level CPU‑DRAM/GPU‑HBM caching, and pipeline parallelism to overcome storage, I/O, and compute challenges of trillion‑parameter recommendation systems.

AI InfrastructureGPU AccelerationParameter Server

0 likes · 12 min read

Next-Generation Multi‑GPU Synchronous Training Architecture for Large‑Scale Sparse Recommendation Models

DataFunSummit

Nov 19, 2023 · Artificial Intelligence

Overview of NVIDIA Merlin for Recommendation Systems

This article introduces NVIDIA's Merlin suite, covering product overview, Merlin Models & Systems, the TensorFlow Distributed Embedding (TFDE) plugin, the Hierarchical‑KV library, and the Hierarchical Parameter Server (HPS), while highlighting their architecture, performance benefits, and ease of integration for large‑scale recommendation workloads.

Distributed EmbeddingGPU AccelerationHierarchical KV

0 likes · 13 min read

Overview of NVIDIA Merlin for Recommendation Systems

Kuaishou Tech

Aug 26, 2023 · Artificial Intelligence

PetPS: A Persistent‑Memory Parameter Server for Large‑Scale Embedding Models

PetPS introduces a persistent‑memory‑based parameter server that redesigns indexing with the PetHash hash table and offloads parameter aggregation to NIC Gathering, achieving up to 1.7× higher throughput and significantly lower latency for industrial‑scale embedding models in recommendation, search, and advertising workloads.

Parameter ServerPerformance OptimizationSystem Design

0 likes · 14 min read

PetPS: A Persistent‑Memory Parameter Server for Large‑Scale Embedding Models

AntTech

Jul 12, 2023 · Artificial Intelligence

Hybrid Embedding Architecture for Large‑Scale Sparse CTR Models

This article describes the Hybrid Embedding solution proposed by Ant AI Infra to address storage, resource, and feature‑governance challenges of massive sparse CTR models, detailing its multi‑layer storage design, KV‑based parameter server, and performance gains in large‑scale recommendation systems.

AI InfraCTRHybrid Embedding

0 likes · 9 min read

Hybrid Embedding Architecture for Large‑Scale Sparse CTR Models

Tencent Advertising Technology

Nov 17, 2022 · Artificial Intelligence

Scaling Huge Embedding Model Training with Cache-Enabled Distributed Framework (HET): VLDB 2022 Best Paper and Its Industrial Deployment

The award‑winning VLDB 2022 paper introduces HET, a cache‑enabled distributed framework that dramatically reduces communication overhead for sparse trillion‑parameter embedding models, and Tencent Ads has industrialized this technology to train 10 TB‑scale models with up to 7×24‑hour online deep learning.

CacheDeep LearningEmbedding

0 likes · 9 min read

Scaling Huge Embedding Model Training with Cache-Enabled Distributed Framework (HET): VLDB 2022 Best Paper and Its Industrial Deployment

DataFunSummit

Sep 9, 2022 · Artificial Intelligence

Wuliang: Tencent's Deep Learning Framework for Real‑Time Large‑Scale Recommendation

The presentation by Tencent expert Yuan Yi details the Wuliang deep learning system for recommendation, covering its background, technical challenges such as massive data and real‑time requirements, the parameter‑server based solutions for training and inference, model compression techniques, and continuous online deployment strategies.

Deep LearningLarge-Scale TrainingParameter Server

0 likes · 14 min read

Wuliang: Tencent's Deep Learning Framework for Real‑Time Large‑Scale Recommendation

DataFunTalk

Jul 8, 2022 · Artificial Intelligence

Tencent's Wuliang Deep Learning System for Large‑Scale Recommendation: Architecture, Challenges, and Solutions

This article presents an in‑depth overview of Tencent's Wuliang deep learning platform for recommendation systems, detailing the real‑time data challenges, high‑throughput requirements, parameter‑server architecture, model compression techniques, multi‑level caching, and answers to common technical questions.

Distributed TrainingInference ServiceParameter Server

0 likes · 14 min read

Tencent's Wuliang Deep Learning System for Large‑Scale Recommendation: Architecture, Challenges, and Solutions

DataFunSummit

Apr 16, 2022 · Big Data

Angel Graph: A Scalable Graph Computing Platform – Architecture, Optimizations, and Applications

The article introduces Angel Graph, a large‑scale graph computing platform built on Angel's parameter‑server architecture and Spark, detailing its evolution, framework components (including Spark‑on‑Angel and PyTorch‑on‑Angel), data and model partitioning strategies, communication and computation optimizations, stability mechanisms, usability features, and real‑world applications across recommendation, risk control, social and gaming domains.

Parameter ServerPyTorchSpark

0 likes · 15 min read

Angel Graph: A Scalable Graph Computing Platform – Architecture, Optimizations, and Applications

Tencent Cloud Developer

Sep 1, 2021 · Artificial Intelligence

Why Distributed Machine Learning Accelerates AI Training at Scale

This article reviews how distributed machine learning tackles massive data and compute challenges by partitioning models and data across workers, optimizing communication with primitives, parameter servers, and Ring AllReduce, reducing IO overhead, and applying advanced optimizers such as LARS and LAMB to achieve faster, scalable training.

LAMB optimizerLARS optimizerParameter Server

0 likes · 31 min read

Why Distributed Machine Learning Accelerates AI Training at Scale

DataFunTalk

Oct 14, 2020 · Artificial Intelligence

Angel Machine Learning Platform: Architecture, Deep Learning Extensions, and Applications in Tencent Advertising Recommendation System

This article introduces Tencent's self‑built Angel distributed machine‑learning platform, describes its architecture and deep‑learning extensions (Parameter Server and AllReduce), explains how it powers the advertising recommendation pipeline with models such as DSSM, VLAD and YOLO, and presents extensive training‑level optimizations that yield multi‑fold performance improvements.

AngelParameter ServerPerformance Optimization

0 likes · 15 min read

Angel Machine Learning Platform: Architecture, Deep Learning Extensions, and Applications in Tencent Advertising Recommendation System

iQIYI Technical Product Team

Jun 12, 2020 · Artificial Intelligence

Deepthought: An End‑to‑End Machine Learning Platform at iQIYI

Deepthought is iQIYI’s end‑to‑end machine‑learning platform that unifies distributed frameworks, decouples pipeline stages, integrates with Tongtian Tower, and offers visual drag‑and‑drop configuration, evolving from a fraud‑detection prototype to a generic system with real‑time inference, automated hyper‑parameter optimization, and support for large‑scale data across anti‑fraud, recommendation, and analytics workloads.

AI PlatformAutoMLParameter Server

0 likes · 13 min read

Deepthought: An End‑to‑End Machine Learning Platform at iQIYI

DataFunTalk

May 8, 2020 · Artificial Intelligence

Distributed Machine Learning Framework GDBT for High‑Dimensional Real‑Time Recommendation Systems

The article explains how the fourth paradigm's distributed machine learning framework GDBT tackles the massive data, high‑dimensional features, and real‑time requirements of modern recommendation systems by leveraging heterogeneous computing, parameter servers, RDMA networking, and optimized workloads.

GDBTParameter ServerRDMA

0 likes · 18 min read

Distributed Machine Learning Framework GDBT for High‑Dimensional Real‑Time Recommendation Systems

Architecture Digest

Apr 5, 2019 · Fundamentals

An Overview of Recent Developments and Practical Topics in Distributed Systems

This article provides a comprehensive introduction to modern distributed systems, covering recent research trends, practical technologies such as Paxos, Consistent Hashing, MapReduce, Spark, various storage and computing paradigms, and offers guidance for beginners on how to navigate the field.

MapReduceParameter ServerPaxos

0 likes · 18 min read

An Overview of Recent Developments and Practical Topics in Distributed Systems

DataFunTalk

Mar 22, 2019 · Artificial Intelligence

Understanding Alibaba’s “Image Matters” Paper: Deep Image CTR Model (DICM) and Advanced Model Server

This article interprets Alibaba’s “Image Matters” paper, explaining how the Deep Image CTR Model (DICM) introduces user‑side visual preference modeling with image embeddings, why traditional Parameter Servers struggle with large image vectors, and how the Advanced Model Server (AMS) compresses embeddings to enable efficient distributed training.

Advanced Model ServerCTRDeep Learning

0 likes · 15 min read

Understanding Alibaba’s “Image Matters” Paper: Deep Image CTR Model (DICM) and Advanced Model Server

Didi Tech

Jun 8, 2018 · Artificial Intelligence

DiDi PS: High-Performance RDMA-Based Parameter Server for Distributed Deep Learning

DiDi PS is a custom RDMA‑based parameter server that uses a ring topology and optimized ibverbs communication to dramatically accelerate distributed deep‑learning training, consistently outperforming OpenMPI, NCCL2, TensorFlow’s built‑in RDMA, and Horovod while providing more stable and scalable synchronization for massive data workloads.

AllreduceDistributed TrainingParameter Server

0 likes · 10 min read

DiDi PS: High-Performance RDMA-Based Parameter Server for Distributed Deep Learning

Alibaba Cloud Developer

Apr 26, 2018 · Artificial Intelligence

How TensorFlowRS Supercharges Large‑Scale Search & Recommendation with 10×‑100× Speedups

This article describes TensorFlowRS, an Alibaba‑built extension of TensorFlow that tackles the massive compute and sparse‑feature challenges of search, advertising and recommendation by redesigning the parameter server, adding fail‑over, gradient‑compensation, online‑learning support, advanced training modes and visualisation, achieving up to 100× training speedup and improved model quality.

Distributed TrainingOnline LearningParameter Server

0 likes · 16 min read

How TensorFlowRS Supercharges Large‑Scale Search & Recommendation with 10×‑100× Speedups

21CTO

Aug 13, 2017 · Artificial Intelligence

How Distributed Machine Learning Platforms Compare: Spark, PMLS, TensorFlow

This article surveys distributed machine‑learning platforms, classifies them into basic data‑flow, parameter‑server, and advanced data‑flow models, examines Spark, PMLS (Petuum), TensorFlow and MXNet, presents performance comparisons on EC2 instances, and discusses bottlenecks, fault tolerance, and future research directions.

Parameter ServerPerformance EvaluationSpark

0 likes · 12 min read

How Distributed Machine Learning Platforms Compare: Spark, PMLS, TensorFlow

High Availability Architecture

Aug 2, 2017 · Artificial Intelligence

A Comparative Study of Distributed Machine Learning Platforms: Design Methods and Evaluation

This article surveys design approaches for distributed machine learning platforms, classifies them into basic dataflow, parameter‑server, and advanced dataflow models, examines examples such as Spark, PMLS, TensorFlow and MXNet, and presents performance evaluations and future research directions.

Parameter ServerPerformance EvaluationSpark

0 likes · 10 min read

A Comparative Study of Distributed Machine Learning Platforms: Design Methods and Evaluation