
iQIYI Machine Learning Platform: Development History, Features, and Practical Experience

This article details the evolution of iQIYI's machine learning platform, from its early Javis‑based deep‑learning system through three major versions that introduced visual workflows, distributed scheduling, automatic hyper‑parameter tuning, large‑scale training support, model management, and online prediction. It also shares practical lessons and a real anti‑cheat use case.

DataFunTalk

Before building a dedicated machine learning platform, iQIYI already operated Javis, an advanced deep‑learning platform that required expert algorithm engineers to write code and submit it to specialized clusters, a high entry barrier for many business teams.

The first version (1.0) of the new platform focused on business‑specific pipelines. Built on Spark ML with asynchronous distributed scheduling, it noticeably improved algorithm‑integration efficiency even though its architecture contained only a minimal set of algorithm components.

Version 2.0 added a visual front‑end that allowed users to drag‑and‑drop components to build workflows, introduced an independent experiment scheduling service with real‑time task monitoring, and decoupled the task execution engine from any specific algorithm framework, enabling cross‑framework execution. It also integrated message‑log monitoring, a model pool for offline/real‑time predictions, and migrated scheduling to the Babel big‑data platform and Gear task scheduler, while expanding algorithm support to include XGBoost and graph algorithms.

Version 3.0 completed the platform’s functionality by providing online prediction services, automatic hyper‑parameter tuning, a parameter‑server‑based extension for larger model data, and API services to allow external platform integration, thereby covering the entire machine‑learning lifecycle from feature engineering to model training and both offline and online inference.

System experience highlights include an automatic hyper‑parameter tuning framework that operates in multiple iterative rounds, supports Spark, Python, and custom algorithms, and incorporates random, grid, Bayesian, and self‑developed genetic algorithms to achieve higher tuning efficiency.
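As a rough illustration of the simplest of these strategies, here is a minimal random‑search round in Python. The `toy_eval` objective and the search space are invented stand‑ins for the platform's real train‑and‑evaluate step, not its actual API:

```python
import random

def random_search(train_eval, space, n_trials=20, seed=0):
    """Sample hyper-parameter candidates uniformly from `space` and
    keep the configuration with the best (highest) score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for a real train/eval run; it peaks
# at lr=0.1, depth=6.
def toy_eval(p):
    return -abs(p["lr"] - 0.1) - 0.01 * abs(p["depth"] - 6)

space = {"lr": [0.01, 0.05, 0.1, 0.3], "depth": [3, 6, 9]}
best, score = random_search(toy_eval, space, n_trials=50)
```

Grid search enumerates `space` exhaustively instead of sampling, while Bayesian and genetic strategies use earlier rounds' scores to propose the next candidates, which is what makes the multi‑round iterative design pay off.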

To address data‑scale challenges, the platform introduced a single‑node Python engine for small datasets and integrated Tencent's Angel parameter server for training models with billions of records, achieving more than a 50% speed improvement over pure Spark ML for large‑scale workloads.
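The parameter‑server pattern behind Angel can be sketched in a few lines: workers pull the current weights, compute gradients on their own data shard, and push updates back. This is a toy in‑process version for illustration only, not Angel's actual interface:

```python
class ParameterServer:
    """Toy in-process parameter server holding the shared weights."""
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Workers fetch a copy of the current weights.
        return list(self.w)

    def push(self, grad):
        # Workers send gradients; the server applies the update.
        for i, g in enumerate(grad):
            self.w[i] -= self.lr * g

def worker_step(ps, shard):
    """One worker iteration: linear-regression gradient on one shard."""
    w = ps.pull()
    n = len(shard)
    grad = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / n
    ps.push(grad)

ps = ParameterServer(dim=1, lr=0.1)
shard = [((1.0,), 2.0), ((2.0,), 4.0)]  # samples of y = 2x
for _ in range(10):
    worker_step(ps, shard)
```

In a real deployment many workers run `worker_step` concurrently against remote servers, which is what lets model size and throughput scale past what a pure Spark ML job can handle.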

For model management and scheduling, the platform uses custom model files and PMML to enable a unified prediction component that can load models from various frameworks, carry model context for online inference, and simplify deployment across different environments.
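One way such a unified prediction component can work is to have each model file carry a small context header naming its source framework, and dispatch to a loader registered for that name. The sketch below is an assumed design with hypothetical names, not iQIYI's actual code:

```python
class UnifiedPredictor:
    """Sketch of a unified prediction component: a model's context header
    names its source framework, and loading dispatches to the
    framework-specific loader registered under that name."""
    loaders = {}

    @classmethod
    def register(cls, framework):
        def deco(fn):
            cls.loaders[framework] = fn
            return fn
        return deco

    @classmethod
    def load(cls, header):
        framework = header["framework"]
        if framework not in cls.loaders:
            raise ValueError(f"no loader registered for {framework!r}")
        return cls.loaders[framework](header)

@UnifiedPredictor.register("linear")
def load_linear(header):
    # Rebuild a linear model from weights stored in the model file.
    w, b = header["weights"], header["bias"]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

model = UnifiedPredictor.load(
    {"framework": "linear", "weights": [2.0], "bias": 1.0})
```

A PMML loader would slot in the same way, which is why a single prediction component can serve models exported from several training frameworks.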

The online prediction system offers a local mode (packaged JAR) and a cloud mode (Docker‑based services) with HTTP access via Consul and RPC access via Dubbo, supporting both push and pull deployment strategies to handle model updates.
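The pull strategy can be sketched as a serving process that polls a model registry and hot‑swaps the in‑memory model when the version changes. `fetch_version` and `fetch_model` below are hypothetical stand‑ins for real registry calls (e.g. HTTP lookups), not the platform's API:

```python
class PullUpdater:
    """Pull-mode deployment sketch: poll a registry and reload the
    model only when its version changes."""
    def __init__(self, fetch_version, fetch_model):
        self.fetch_version = fetch_version
        self.fetch_model = fetch_model
        self.version = None
        self.model = None

    def poll(self):
        """One polling tick: returns True if a new model was loaded."""
        latest = self.fetch_version()
        if latest != self.version:
            self.model = self.fetch_model(latest)
            self.version = latest
            return True
        return False

# Fake in-memory registry standing in for the remote model store.
registry = {"version": "v1", "models": {"v1": "model-v1", "v2": "model-v2"}}
updater = PullUpdater(lambda: registry["version"],
                      lambda v: registry["models"][v])
```

Push mode inverts the flow: the registry notifies serving instances of a new version instead of waiting for their next poll, trading polling latency for a callback channel.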

A practical case is the anti‑cheat business, which leverages the platform to filter tens of millions of logs daily, achieving over 80% efficiency improvement and handling peak online prediction loads of tens of thousands of queries per second.

Tags: Big Data, Machine Learning, Platform, Online Prediction, Hyperparameter Tuning, Model Management
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
