Tagged articles
17 articles
iQIYI Technical Product Team
May 31, 2024 · Artificial Intelligence

How Opal Turns iQIYI’s ML Workflow into a Unified AI Platform

Opal is iQIYI's end‑to‑end machine‑learning platform that integrates feature production, sample construction, model training, and deployment with big‑data services, addressing duplicated effort, weak data processing, and fragmented pipelines to boost efficiency across recommendation, advertising, and risk‑control scenarios.

AI Operations · Big Data Integration · Distributed Training
19 min read
DataFunTalk
Oct 20, 2023 · Artificial Intelligence

Building the ATLAS Automated Machine Learning Platform at Du Xiaoman: Architecture, Practices, and Optimizations

This article describes how Du Xiaoman tackled the high cost, instability, and long cycles of AI algorithm deployment by building the ATLAS automated machine learning platform, detailing its four‑stage workflow, component platforms, scaling and efficiency techniques, and practical Q&A for practitioners.

AI deployment · AutoML · Data Parallelism
22 min read
Tencent Advertising Technology
Mar 30, 2023 · Artificial Intelligence

Tencent's Taiji Machine Learning Platform: End-to-End MLOps for Advertising

Tencent’s Taiji machine learning platform, a cloud‑native, distributed parameter‑server system, provides end‑to‑end MLOps for advertising by integrating data ingestion, feature engineering, model training, evaluation, deployment, and monitoring, supporting massive models up to billions of parameters while improving efficiency, scalability, and resource management.

Distributed Training · MLOps · Machine Learning Platform
18 min read
Hulu Beijing
Mar 16, 2023 · Artificial Intelligence

Inside Hulu’s Distributed Training Platform: Architecture, Challenges, and Solutions

This article explores Hulu’s five‑year‑old machine‑learning training platform, detailing its three‑layer architecture, the shift from single‑node to distributed training, and the technical solutions—including parameter servers, Ring AllReduce, Kubernetes, Volcano, and Horovod—that enable scalable AI workloads across GPU, CPU, and storage resources.

AI Infrastructure · Distributed Training · Hulu
13 min read
DataFunTalk
Feb 18, 2023 · Artificial Intelligence

Building the ATLAS Automated Machine Learning Platform at Du Xiaoman: Architecture, Optimization, and Practical Insights

This article details Du Xiaoman's development of the ATLAS automated machine learning platform, covering business scenarios, AI algorithm deployment challenges, the end‑to‑end production workflow, platform components such as annotation, data, training and deployment, as well as optimization techniques like AutoML, meta‑learning, NAS, and large‑scale parallelism, concluding with lessons learned and future directions.

AI deployment · AutoML · Machine Learning Platform
20 min read
Tencent Advertising Technology
Feb 17, 2023 · Big Data

Cost Optimization and Mixed‑Resource Deployment in Tencent's Taiji Machine Learning Platform

The article details how Tencent's Taiji machine‑learning platform reduces training costs and improves efficiency for large‑scale advertising models by leveraging cloud‑native mixed‑resource strategies—including online idle, offline elastic, and compute‑resource sharing—while maintaining high service stability through advanced scheduling, fault‑tolerance, and resource‑prediction techniques.

Big Data · Cloud Native · Machine Learning Platform
16 min read
DataFunTalk
Feb 14, 2023 · Artificial Intelligence

Cost Optimization and Mixed‑Resource Deployment in Tencent Taiji Machine Learning Platform for Large‑Scale AI Models

The article describes how Tencent's Taiji machine learning platform leverages cloud‑native mixed‑resource strategies—including online idle, tidal, and compute resources—to reduce training costs, improve stability, and support large‑scale AI model training for advertising and other services.

AI · Cloud Native · Machine Learning Platform
17 min read
Xiaohongshu Tech REDtech
Nov 11, 2022 · Artificial Intelligence

Large-Scale Deep Learning Systems and Their Application at Xiaohongshu (RED)

Xiaohongshu’s in‑house LarC platform powers real‑time, multimodal recommendation, life‑search, and generative‑AI commercial content for its 200 million‑user community by processing billions of daily feedback samples, employing conflict‑free parameter servers, diversified sequence modeling, and large‑scale representation learning to deliver personalized, fresh, and diverse user experiences.

AI Infrastructure · Machine Learning Platform · Multimodal AI
13 min read
vivo Internet Technology
Oct 9, 2022 · Artificial Intelligence

vivo Machine Learning Platform: Architecture Design and Practice

vivo’s machine‑learning platform, built for its massive app‑store and e‑commerce ecosystem, streamlines data processing, model training, and deployment through quota‑based resource management, a custom ultra‑large‑scale TensorFlow‑vlps framework, OpenAPI‑driven training, and Jupyter‑integrated interactive development, boosting efficiency for billions of samples and features.

Distributed Training · MLOps · Machine Learning Platform
12 min read
NetEase Yanxuan Technology Product Team
Aug 29, 2022 · Artificial Intelligence

Building Yanxuan Machine Learning Platform: Architecture and Implementation

Yanxuan built a Kubeflow‑based machine‑learning platform that unifies data preprocessing, feature engineering, model training, validation, and deployment, using Smart‑jobs, Smart‑Infer, Smart‑backend, Airflow pipelines, Jupyter notebooks, and Istio‑enhanced inference services to boost algorithm engineers’ efficiency and integrate with Kubernetes, HDFS, and Hive.

Airflow orchestration · Algorithm Development · Inference Service
14 min read
Meituan Technology Team
Mar 4, 2021 · Artificial Intelligence

How Meituan Waimai Scaled Feature Engineering for Billions of Requests

This article details Meituan Waimai's evolution from a simple feature framework to a sophisticated, configurable platform that handles massive feature production, multi‑task scheduling, dynamic protobuf storage, and a model‑feature description language (MFDL) to enable efficient online retrieval, high‑performance computation, and consistent training‑sample generation for its recommendation, advertising, and search services.

MFDL · Machine Learning Platform · Meituan
31 min read
DataFunSummit
Feb 4, 2021 · Artificial Intelligence

Full-Stack Machine Learning Platform: Architecture, Key Factors, and Implementation Details

This article examines the evolution of user data, computing power, and models, and presents the design principles, key architectural factors, and practical implementation techniques for building a full‑stack machine learning platform that supports large‑scale data processing, distributed training, and low‑latency online serving.

Big Data Integration · Machine Learning Platform · data pipelines
15 min read
DataFunTalk
Jul 1, 2020 · Artificial Intelligence

Architecture and Implementation of Autohome's Machine Learning Platform

The article presents a comprehensive overview of Autohome's one‑stop machine learning platform, detailing its background, architecture, resource scheduling, data processing, model training (including distributed deep learning), deployment, real‑world applications such as purchase‑intent and recommendation models, and future development directions.

AutoML · Deep Learning · Distributed Training
19 min read
Meituan Technology Team
Feb 6, 2020 · Artificial Intelligence

Building a One-Stop Machine Learning Platform: Meituan's Turing Platform

Meituan’s Turing platform consolidates the entire delivery‑order workflow, from massive data ingestion and feature generation to model training, evaluation, deployment, real‑time prediction, and AB testing, into a single end‑to‑end system. It evolved from an MVP into a fully platformized solution that addresses speed, accuracy, and engineering‑algorithm decoupling, with deeper deep‑learning integration planned.

AB testing · Deep Learning · Machine Learning Platform
16 min read
Tencent Cloud Developer
May 29, 2018 · Artificial Intelligence

Intelligent Titanium TI-ONE: Tencent Cloud's One-Stop Machine Learning IDE

Intelligent Titanium TI-ONE is a one‑stop ML IDE on Tencent Cloud offering integrated data preparation, drag‑and‑drop algorithm development, automatic hyperparameter tuning, multi‑level collaboration, one‑click model deployment, and support for major frameworks such as TensorFlow, PyTorch, Angel and XGBoost, plus commercial features via GaiaStack.

AI Platform · Machine Learning Platform · Model Deployment
10 min read
ITFLY8 Architecture Home
Dec 29, 2017 · Cloud Native

Inside JD’s ‘Moon Landing’ ML Platform: Cloud‑Native Architecture Secrets

JD’s Moon Landing Machine Learning Platform, built on Docker and Kubernetes, showcases a cloud‑native architecture that integrates AI services, multi‑tenant security, GPU management, big‑data scheduling, and advanced networking and storage solutions for high‑performance inference and training workloads.

Cloud Native · GPU Management · Kubernetes
15 min read