How iQIYI Built a Cloud‑Native Risk Control Platform to Stop Credential Stuffing

iQIYI’s security cloud team designed a data‑driven, cloud‑native risk control platform that unifies threat detection, rule management, and security knowledge across membership, video, e‑commerce, and payment services. The platform handles more than 24 billion requests a day at an average latency under 5 ms and has almost entirely eliminated machine‑driven credential‑stuffing attacks.

iQIYI Technical Product Team

Oct 13, 2017

Background and Business Risks

iQIYI’s rapid growth has increased business complexity, user data value, and security challenges. Traditional network attacks are now joined by new risks such as credential stuffing, piracy, fraudulent marketing, social spam, and payment fraud.

Typical Risk Scenarios

Membership: credential stuffing, account sharing, bulk registration.

Video: piracy, ad blocking, fake view counts.

Promotional activities: coupon abuse.

Live streaming: fake popularity, malicious content.

E‑commerce: malicious orders, fraud.

Payment: account theft, money laundering, malicious withdrawals.

Other: phishing, brute‑force attacks, SMS bombing.

Problems with Existing Operations

Isolated efforts: each business unit builds its own point solutions and shares no data, so single‑point defenses are easily bypassed.

Ad‑hoc rules: experts set thresholds without supporting data, causing false positives and a poor user experience.

Slow reaction: defenses cannot adapt quickly to new attack patterns because rule changes are tightly coupled to business releases.

Single‑dimensional controls: over‑reliance on IP blacklists, static limits, and hard‑to‑maintain lists.

Design Goals for a Cloud‑Native Risk Control System

Joint defense across business lines.

Data‑driven, intelligent threat detection.

Flexible policies for rapid response.

Multi‑dimensional detection and mitigation.

Controlled latency and low coupling for graceful degradation.

Fast implementation and efficient deployment on private‑cloud topologies.
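
The "controlled latency and graceful degradation" goal can be sketched as a time‑budgeted call. This is a minimal illustration, not iQIYI's implementation: the function names, the 50 ms default budget, and the fail‑open policy (allow the request when the risk check is slow or unavailable) are all assumptions made for the example.

```python
import concurrent.futures
import time

# Assumption for this sketch: when the risk check cannot answer in time,
# the business request is allowed (fail-open) rather than blocked.
FAIL_OPEN_DECISION = "allow"

_pool = concurrent.futures.ThreadPoolExecutor(max_workers=4)

def decide_with_budget(check_risk, event, budget_seconds=0.05):
    """Run check_risk(event) within a time budget; degrade on timeout or error."""
    future = _pool.submit(check_risk, event)
    try:
        return future.result(timeout=budget_seconds)
    except concurrent.futures.TimeoutError:
        # The risk check exceeded its budget: do not hold up the business call.
        return FAIL_OPEN_DECISION
    except Exception:
        # The risk check itself failed: the business service keeps working.
        return FAIL_OPEN_DECISION

# Hypothetical risk checks used only to exercise the wrapper.
def fast_check(event):
    return "deny" if event.get("failed_logins", 0) > 10 else "allow"

def slow_check(event):
    time.sleep(0.5)
    return "deny"

def broken_check(event):
    raise RuntimeError("risk service unreachable")
```

Whether to fail open or fail closed is a per‑scenario decision; a payment flow might reasonably choose the opposite default from a video playback flow.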

Architecture Overview

The platform consists of three core services:

Magellan: the entry layer, comprising the data query engine, rule engine, and model engine, plus a management console for risk events, simulation, monitoring, and dashboards.

Columbus: feature engineering, large‑scale anomaly detection, deep learning, knowledge graph, real‑time and offline features, security portrait generation.

Zhenghe: security knowledge base storing threat intelligence and foundational security data.

Magellan Sub‑Engines

Query Engine: real‑time and batch data retrieval and aggregation.

Rule Engine: matches rules, supports custom execution strategies (exit on hit, full execution, conditional exit) and multiple rule types (scorecards, decision trees, decision tables, simple rules).

Model Engine: processes features from the query engine, runs algorithms, and serves models to the rule engine.
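
The three execution strategies named for the rule engine can be illustrated with a short sketch. The `Rule` class, the rule names, the scores, and the interpretation of "conditional exit" as "stop once an accumulated score reaches a threshold" are assumptions for this example; the source does not describe Magellan's actual rule format.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Rule:
    name: str
    predicate: Callable[[Dict], bool]  # returns True when the rule hits
    score: int = 0                     # contribution to a scorecard-style total

def run_rules(rules: List[Rule], event: Dict, strategy: str = "full",
              exit_score: int = 100) -> Dict:
    """Evaluate rules in order under one of three execution strategies.

    "exit_on_hit": stop at the first rule that hits.
    "full":        evaluate every rule.
    "conditional": stop once the accumulated score reaches exit_score.
    """
    if strategy not in ("exit_on_hit", "full", "conditional"):
        raise ValueError(f"unknown strategy: {strategy}")
    hits, total = [], 0
    for rule in rules:
        if not rule.predicate(event):
            continue
        hits.append(rule.name)
        total += rule.score
        if strategy == "exit_on_hit":
            break
        if strategy == "conditional" and total >= exit_score:
            break
    return {"hits": hits, "score": total}

# Hypothetical rules for a login event.
RULES = [
    Rule("ip_on_blacklist", lambda e: e.get("ip_blacklisted", False), 100),
    Rule("many_failed_logins", lambda e: e.get("failed_logins", 0) >= 5, 60),
    Rule("new_device", lambda e: not e.get("trusted_device", True), 40),
]

EVENT = {"ip_blacklisted": False, "failed_logins": 8, "trusted_device": False}
```

With these inputs, `run_rules(RULES, EVENT, "full")` records both the failed‑login and new‑device hits, while `"exit_on_hit"` stops after the first hit. Exiting early saves latency; full execution gives analysts a complete picture of which rules fired.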

Deployment Model

Services are deployed in multiple IDC sites. The management console runs in a primary IDC and can fail‑over to a secondary IDC. Service engines are co‑located with protected business services to minimize latency.

Data Pipeline

Columbus ingests massive real‑time streams via Apache Flink (millisecond latency) and near‑real‑time / offline data via Apache Spark, Impala/Hive (seconds to hours). It builds security portraits with over 600 tags and 1.9 billion records, covering IP reputation, device fingerprints, phone‑number scores, and more.
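
As an illustration of what a real‑time feature might look like, the sketch below keeps a per‑IP sliding window of login attempts and reports how many attempts and how many distinct accounts were seen. It is plain Python rather than a Flink job, and the class name, feature names, and window length are assumptions; it also assumes events for a given key arrive in timestamp order.

```python
from collections import defaultdict, deque

class SlidingWindowFeature:
    """Per-key attempt count and distinct-account count over a time window."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events = defaultdict(deque)  # key -> deque of (timestamp, account)

    def add(self, key, timestamp, account):
        # Assumes timestamps are non-decreasing for each key.
        self.events[key].append((timestamp, account))

    def features(self, key, now):
        queue = self.events[key]
        # Evict events that have fallen out of the window ending at `now`.
        while queue and queue[0][0] <= now - self.window:
            queue.popleft()
        return {
            "attempts": len(queue),
            "distinct_accounts": len({account for _, account in queue}),
        }

# Usage: one IP trying several accounts within a minute.
window = SlidingWindowFeature(window_seconds=60)
window.add("203.0.113.7", 0, "alice")
window.add("203.0.113.7", 10, "bob")
window.add("203.0.113.7", 50, "carol")
```

A high ratio of distinct accounts to attempts from one IP or device is a common credential‑stuffing signal, which is why the sketch tracks both values rather than a single counter.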

Verification Methods

Graphic CAPTCHA.

Slide CAPTCHA based on human motion.

SMS verification (uplink, where the user sends a message, or downlink, where the user receives a code).

Trusted‑device verification.

Security‑Shield app providing OTP, push confirmation, QR‑code scan.
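
One way such methods could be combined is to step up the challenge as the risk score rises, so that low‑risk users see little or no friction. The source does not describe how iQIYI chooses between methods; the score tiers, the challenge identifiers, and the trusted‑device shortcut below are purely illustrative assumptions.

```python
# Hypothetical tiers: (minimum risk score, challenge), checked highest first.
CHALLENGE_TIERS = [
    (80, "sms_verification"),
    (50, "slide_captcha"),
    (20, "graphic_captcha"),
]

def choose_challenge(risk_score: int, trusted_device: bool = False) -> str:
    """Pick a verification step for a request based on its risk score."""
    # Assumption: a trusted device skips challenges unless the risk is very high.
    if trusted_device and risk_score < 80:
        return "none"
    for minimum, challenge in CHALLENGE_TIERS:
        if risk_score >= minimum:
            return challenge
    return "none"
```

For example, `choose_challenge(60)` selects the slide CAPTCHA, while the same score on a trusted device passes without a challenge.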

Operational Outcomes

Daily request volume > 24 billion, average latency < 5 ms, zero incidents.

Real‑time blocking of credential‑stuffing attempts exceeded 200 million; daily successful attacks reduced to single‑digit counts.

Comprehensive coverage of membership, video, live, e‑commerce, payment, social, and IT services.

Flexible risk control: monitoring‑focused operation in normal periods and aggressive counter‑measures during attacks, while preserving user experience.

Key achievements: defense in depth before and after attacks, cross‑business joint control, real‑time anomaly detection, and near‑100 % suppression of machine credential stuffing.

Key Takeaways

Security must be closely aligned with business needs.

Cloud‑native services enable rapid iteration and scaling.

Continuous data‑driven operation is essential for effective risk mitigation.

Cross‑team collaboration and shared data dramatically improve detection accuracy.

Prioritize the 20 % of risks that cause 80 % of impact.
