
iQIYI Machine Learning Platform: Development History, Features, and Practical Experience

This article details the evolution of iQIYI's machine learning platform, from its early Javis‑based deep‑learning system through three major versions that introduced visual workflows, distributed scheduling, automatic hyper‑parameter tuning, large‑scale training support, model management, and online prediction. It also shares practical lessons and a real anti‑cheat use case.

DataFunTalk

Before building a dedicated machine learning platform, iQIYI already operated Javis, an advanced deep‑learning platform that required expert algorithm engineers to write code and submit it to specialized clusters, a high entry barrier for many business teams.

The first version (1.0) of the new platform focused on business‑specific pipelines. Built on Spark ML with asynchronous distributed scheduling, it noticeably improved algorithm‑integration efficiency even though its architecture contained only a minimal set of algorithm components.

Version 2.0 added a visual front‑end that allowed users to drag‑and‑drop components to build workflows, introduced an independent experiment scheduling service with real‑time task monitoring, and decoupled the task execution engine from any specific algorithm framework, enabling cross‑framework execution. It also integrated message‑log monitoring, a model pool for offline/real‑time predictions, and migrated scheduling to the Babel big‑data platform and Gear task scheduler, while expanding algorithm support to include XGBoost and graph algorithms.

Version 3.0 completed the platform’s functionality by providing online prediction services, automatic hyper‑parameter tuning, a parameter‑server‑based extension for larger model data, and API services to allow external platform integration, thereby covering the entire machine‑learning lifecycle from feature engineering to model training and both offline and online inference.

System experience highlights include an automatic hyper‑parameter tuning framework that operates in multiple iterative rounds, supports Spark, Python, and custom algorithms, and incorporates random, grid, Bayesian, and self‑developed genetic algorithms to achieve higher tuning efficiency.
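As a rough illustration of the simplest of these strategies, here is a minimal random‑search round in Python. The `toy_eval` objective and the search space are invented stand‑ins for the platform's real train‑and‑evaluate step, not its actual API:

```python
import random

def random_search(train_eval, space, n_trials=20, seed=0):
    """Sample hyper-parameter candidates uniformly from `space` and
    keep the configuration with the best (highest) score."""
    rng = random.Random(seed)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.choice(values) for name, values in space.items()}
        score = train_eval(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective standing in for a real train/eval run; it peaks
# at lr=0.1, depth=6.
def toy_eval(p):
    return -abs(p["lr"] - 0.1) - 0.01 * abs(p["depth"] - 6)

space = {"lr": [0.01, 0.05, 0.1, 0.3], "depth": [3, 6, 9]}
best, score = random_search(toy_eval, space, n_trials=50)
```

Grid search enumerates `space` exhaustively instead of sampling, while Bayesian and genetic strategies use earlier rounds' scores to propose the next candidates, which is what makes the multi‑round iterative design pay off.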

To address data‑scale challenges, the platform introduced a single‑node Python engine for small datasets and integrated Tencent's Angel parameter server for training models with billions of records, achieving more than a 50% speed improvement over pure Spark ML for large‑scale workloads.
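The parameter‑server pattern behind Angel can be sketched in a few lines: workers pull the current weights, compute gradients on their own data shard, and push updates back. This is a toy in‑process version for illustration only, not Angel's actual interface:

```python
class ParameterServer:
    """Toy in-process parameter server holding the shared weights."""
    def __init__(self, dim, lr=0.1):
        self.w = [0.0] * dim
        self.lr = lr

    def pull(self):
        # Workers fetch a copy of the current weights.
        return list(self.w)

    def push(self, grad):
        # Workers send gradients; the server applies the update.
        for i, g in enumerate(grad):
            self.w[i] -= self.lr * g

def worker_step(ps, shard):
    """One worker iteration: linear-regression gradient on one shard."""
    w = ps.pull()
    n = len(shard)
    grad = [0.0] * len(w)
    for x, y in shard:
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        for i, xi in enumerate(x):
            grad[i] += 2 * err * xi / n
    ps.push(grad)

ps = ParameterServer(dim=1, lr=0.1)
shard = [((1.0,), 2.0), ((2.0,), 4.0)]  # samples of y = 2x
for _ in range(10):
    worker_step(ps, shard)
```

In a real deployment many workers run `worker_step` concurrently against remote servers, which is what lets model size and throughput scale past what a pure Spark ML job can handle.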

For model management and scheduling, the platform uses custom model files and PMML to enable a unified prediction component that can load models from various frameworks, carry model context for online inference, and simplify deployment across different environments.
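One way such a unified prediction component can work is to have each model file carry a small context header naming its source framework, and dispatch to a loader registered for that name. The sketch below is an assumed design with hypothetical names, not iQIYI's actual code:

```python
class UnifiedPredictor:
    """Sketch of a unified prediction component: a model's context header
    names its source framework, and loading dispatches to the
    framework-specific loader registered under that name."""
    loaders = {}

    @classmethod
    def register(cls, framework):
        def deco(fn):
            cls.loaders[framework] = fn
            return fn
        return deco

    @classmethod
    def load(cls, header):
        framework = header["framework"]
        if framework not in cls.loaders:
            raise ValueError(f"no loader registered for {framework!r}")
        return cls.loaders[framework](header)

@UnifiedPredictor.register("linear")
def load_linear(header):
    # Rebuild a linear model from weights stored in the model file.
    w, b = header["weights"], header["bias"]
    return lambda x: sum(wi * xi for wi, xi in zip(w, x)) + b

model = UnifiedPredictor.load(
    {"framework": "linear", "weights": [2.0], "bias": 1.0})
```

A PMML loader would slot in the same way, which is why a single prediction component can serve models exported from several training frameworks.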

The online prediction system offers a local mode (packaged JAR) and a cloud mode (Docker‑based services) with HTTP access via Consul and RPC access via Dubbo, supporting both push and pull deployment strategies to handle model updates.
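The pull strategy can be sketched as a serving process that polls a model registry and hot‑swaps the in‑memory model when the version changes. `fetch_version` and `fetch_model` below are hypothetical stand‑ins for real registry calls (e.g. HTTP lookups), not the platform's API:

```python
class PullUpdater:
    """Pull-mode deployment sketch: poll a registry and reload the
    model only when its version changes."""
    def __init__(self, fetch_version, fetch_model):
        self.fetch_version = fetch_version
        self.fetch_model = fetch_model
        self.version = None
        self.model = None

    def poll(self):
        """One polling tick: returns True if a new model was loaded."""
        latest = self.fetch_version()
        if latest != self.version:
            self.model = self.fetch_model(latest)
            self.version = latest
            return True
        return False

# Fake in-memory registry standing in for the remote model store.
registry = {"version": "v1", "models": {"v1": "model-v1", "v2": "model-v2"}}
updater = PullUpdater(lambda: registry["version"],
                      lambda v: registry["models"][v])
```

Push mode inverts the flow: the registry notifies serving instances of a new version instead of waiting for their next poll, trading polling latency for a callback channel.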

A practical case is the anti‑cheat business, which leverages the platform to filter tens of millions of logs daily, achieving over 80% efficiency improvement and handling peak online prediction loads of tens of thousands of queries per second.

Tags: Big Data, Machine Learning, Platform, Online Prediction, Hyperparameter Tuning, Model Management
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
