Tag

parameter server


JD Retail Technology
Jan 30, 2024 · Artificial Intelligence

Next-Generation Multi‑GPU Synchronous Training Architecture for Large‑Scale Sparse Recommendation Models

The article details JD Retail's evolution from TensorFlow‑based sparse training to a custom high‑performance parameter server and a fully GPU‑accelerated, multi‑node, multi‑card synchronous training framework that leverages GPU‑RDMA, two‑level CPU‑DRAM/GPU‑HBM caching, and pipeline parallelism to overcome storage, I/O, and compute challenges of trillion‑parameter recommendation systems.

AI infrastructure · GPU Acceleration · Recommendation systems
12 min read

DataFunSummit
Nov 19, 2023 · Artificial Intelligence

Overview of NVIDIA Merlin for Recommendation Systems

This article introduces NVIDIA's Merlin suite, covering product overview, Merlin Models & Systems, the TensorFlow Distributed Embedding (TFDE) plugin, the Hierarchical‑KV library, and the Hierarchical Parameter Server (HPS), while highlighting their architecture, performance benefits, and ease of integration for large‑scale recommendation workloads.

Distributed Embedding · GPU Acceleration · Hierarchical KV
13 min read

Kuaishou Tech
Aug 26, 2023 · Artificial Intelligence

PetPS: A Persistent‑Memory Parameter Server for Large‑Scale Embedding Models

PetPS introduces a persistent‑memory‑based parameter server that redesigns indexing with the PetHash hash table and offloads parameter gathering to the network card through NIC Gathering, achieving up to 1.7× higher throughput and significantly lower latency for industrial‑scale embedding models in recommendation, search, and advertising workloads.

Embedding Models · Performance Optimization · Persistent Memory
14 min read

AntTech
Jul 12, 2023 · Artificial Intelligence

Hybrid Embedding Architecture for Large‑Scale Sparse CTR Models

This article describes the Hybrid Embedding solution proposed by Ant AI Infra to address storage, resource, and feature‑governance challenges of massive sparse CTR models, detailing its multi‑layer storage design, KV‑based parameter server, and performance gains in large‑scale recommendation systems.

AI Infra · Hybrid Embedding · ctr
9 min read

Tencent Advertising Technology
Nov 17, 2022 · Artificial Intelligence

Scaling Huge Embedding Model Training with Cache-Enabled Distributed Framework (HET): VLDB 2022 Best Paper and Its Industrial Deployment

The award‑winning VLDB 2022 paper introduces HET, a cache‑enabled distributed framework that dramatically reduces communication overhead for sparse trillion‑parameter embedding models. Tencent Ads has industrialized this technology to train 10 TB‑scale models with round‑the‑clock (7×24) online deep learning.

Large-Scale Models · cache · deep learning
9 min read

DataFunSummit
Sep 9, 2022 · Artificial Intelligence

Wuliang: Tencent's Deep Learning Framework for Real‑Time Large‑Scale Recommendation

The presentation by Tencent expert Yuan Yi details the Wuliang deep learning system for recommendation, covering its background, technical challenges such as massive data and real‑time requirements, the parameter‑server based solutions for training and inference, model compression techniques, and continuous online deployment strategies.

Large-Scale Training · Recommendation systems · deep learning
14 min read

DataFunTalk
Jul 8, 2022 · Artificial Intelligence

Tencent's Wuliang Deep Learning System for Large‑Scale Recommendation: Architecture, Challenges, and Solutions

This article presents an in‑depth overview of Tencent's Wuliang deep learning platform for recommendation systems, detailing the real‑time data challenges, high‑throughput requirements, parameter‑server architecture, model compression techniques, multi‑level caching, and answers to common technical questions.

Inference Service · Recommendation systems · deep learning
14 min read

DataFunSummit
Apr 16, 2022 · Big Data

Angel Graph: A Scalable Graph Computing Platform – Architecture, Optimizations, and Applications

The article introduces Angel Graph, a large‑scale graph computing platform built on Angel's parameter‑server architecture and Spark, detailing its evolution, framework components (including Spark‑on‑Angel and PyTorch‑on‑Angel), data and model partitioning strategies, communication and computation optimizations, stability mechanisms, usability features, and real‑world applications across recommendation, risk control, social and gaming domains.

Big Data · Distributed Systems · Optimization
15 min read

DataFunTalk
Apr 10, 2022 · Big Data

Angel Graph: A Large-Scale Graph Computing Platform by Tencent

This article introduces Tencent's Angel Graph platform, detailing its evolution from early versions to a mature large‑scale graph computing system, its architecture combining Angel PS with Spark and PyTorch, data and model partitioning strategies, communication and computation optimizations, stability features, usability, and real‑world applications.

Angel Graph · Spark · graph computing
15 min read

DataFunTalk
Oct 14, 2020 · Artificial Intelligence

Angel Machine Learning Platform: Architecture, Deep Learning Extensions, and Applications in Tencent Advertising Recommendation System

This article introduces Tencent's self‑built Angel distributed machine‑learning platform, describes its architecture and deep‑learning extensions (Parameter Server and AllReduce), explains how it powers the advertising recommendation pipeline with models such as DSSM, VLAD and YOLO, and presents extensive training‑level optimizations that yield multi‑fold performance improvements.

Angel · Performance Optimization · Tencent
15 min read

iQIYI Technical Product Team
Jun 12, 2020 · Artificial Intelligence

Deepthought: An End‑to‑End Machine Learning Platform at iQIYI

Deepthought is iQIYI’s end‑to‑end machine‑learning platform that unifies distributed frameworks, decouples pipeline stages, integrates with Tongtian Tower, and offers visual drag‑and‑drop configuration, evolving from a fraud‑detection prototype to a generic system with real‑time inference, automated hyper‑parameter optimization, and support for large‑scale data across anti‑fraud, recommendation, and analytics workloads.

AI Platform · AutoML · Data Engineering
13 min read

DataFunTalk
May 8, 2020 · Artificial Intelligence

Distributed Machine Learning Framework GDBT for High‑Dimensional Real‑Time Recommendation Systems

The article explains how 4Paradigm's distributed machine learning framework GDBT tackles the massive data, high‑dimensional features, and real‑time requirements of modern recommendation systems by leveraging heterogeneous computing, parameter servers, RDMA networking, and optimized workloads.

GDBT · RDMA · Recommendation systems
18 min read

Architecture Digest
Apr 5, 2019 · Fundamentals

An Overview of Recent Developments and Practical Topics in Distributed Systems

This article provides a comprehensive introduction to modern distributed systems, covering recent research trends, practical technologies such as Paxos, Consistent Hashing, MapReduce, Spark, various storage and computing paradigms, and offers guidance for beginners on how to navigate the field.

Big Data · Distributed Systems · MapReduce
18 min read

DataFunTalk
Mar 22, 2019 · Artificial Intelligence

Understanding Alibaba’s “Image Matters” Paper: Deep Image CTR Model (DICM) and Advanced Model Server

This article interprets Alibaba’s “Image Matters” paper, explaining how the Deep Image CTR Model (DICM) introduces user‑side visual preference modeling with image embeddings, why traditional Parameter Servers struggle with large image vectors, and how the Advanced Model Server (AMS) compresses embeddings to enable efficient distributed training.

Advanced Model Server · Image Embedding · Recommendation systems
15 min read

Didi Tech
Jun 8, 2018 · Artificial Intelligence

DiDi PS: High-Performance RDMA-Based Parameter Server for Distributed Deep Learning

DiDi PS is a custom RDMA‑based parameter server that uses a ring topology and optimized ibverbs communication to dramatically accelerate distributed deep‑learning training, consistently outperforming OpenMPI, NCCL2, TensorFlow’s built‑in RDMA, and Horovod while providing more stable and scalable synchronization for massive data workloads.

Allreduce · Performance · RDMA
10 min read

High Availability Architecture
Aug 2, 2017 · Artificial Intelligence

A Comparative Study of Distributed Machine Learning Platforms: Design Methods and Evaluation

This article surveys design approaches for distributed machine learning platforms, classifies them into basic dataflow, parameter‑server, and advanced dataflow models, examines examples such as Spark, PMLS, TensorFlow and MXNet, and presents performance evaluations and future research directions.

Spark · TensorFlow · distributed machine learning
10 min read