How iQIYI Built a Cloud‑Native Risk Control Platform to Stop Credential Stuffing
iQIYI’s security cloud team designed a data‑driven, cloud‑native risk control platform that unifies threat detection, rule management, and security knowledge across membership, video, e‑commerce, and payment services. It handles more than 24 billion requests per day with average latency under 5 ms and has nearly eliminated machine‑driven credential‑stuffing attacks.
Background and Business Risks
iQIYI’s rapid growth has increased business complexity, user data value, and security challenges. Traditional network attacks are now joined by new risks such as credential stuffing, piracy, fraudulent marketing, social spam, and payment fraud.
Typical Risk Scenarios
Membership: credential stuffing, account sharing, bulk registration.
Video: piracy, ad blocking, fake view counts.
Activities: coupon abuse.
Live streaming: fake popularity, malicious content.
E‑commerce: malicious orders, fraud.
Payment: account theft, money laundering, malicious withdrawals.
Other: phishing, brute‑force attacks, SMS bombing.
Problems with Existing Operations
Isolated efforts: each business unit builds its own point solutions without sharing data, so single‑point defenses are easily bypassed.
Ad‑hoc rules: thresholds are set by expert judgment rather than data, causing false positives and a poor user experience.
Slow reaction: teams cannot adapt quickly to new attack patterns because rule changes are tightly coupled to business releases.
Single‑dimensional controls: over‑reliance on IP blacklists, static limits, and hard‑to‑maintain lists.
Design Goals for a Cloud‑Native Risk Control System
Joint defense across business lines.
Data‑driven, intelligent threat detection.
Flexible policies for rapid response.
Multi‑dimensional detection and mitigation.
Controlled latency and low coupling for graceful degradation.
Fast implementation and efficient deployment on private‑cloud topologies.
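The graceful‑degradation goal can be illustrated with a latency budget around each risk check. This is a minimal sketch, assuming a fail‑open policy (the request proceeds when the check is too slow or errors out); `check_with_budget`, the 5 ms default, and the `"pass"` fallback are hypothetical and not taken from iQIYI's implementation.

```python
import concurrent.futures as cf
import time
from typing import Callable

# Shared worker pool for risk checks (size chosen arbitrarily for this sketch).
_POOL = cf.ThreadPoolExecutor(max_workers=8)


def check_with_budget(check: Callable[[dict], str], event: dict,
                      budget_s: float = 0.005, fallback: str = "pass") -> str:
    """Run `check(event)` but never block the caller longer than `budget_s`.

    On timeout or on any exception raised by the check, return `fallback`
    (fail-open by default) so the business flow degrades instead of failing.
    """
    future = _POOL.submit(check, event)
    try:
        return future.result(timeout=budget_s)
    except cf.TimeoutError:
        # Best effort only: a check that is already running keeps running
        # in its worker thread; we simply stop waiting for it.
        future.cancel()
        return fallback
    except Exception:
        # The check itself failed; degrade rather than propagate the error.
        return fallback


if __name__ == "__main__":
    def fast_check(event: dict) -> str:
        return "block" if event.get("failed_logins", 0) > 20 else "pass"

    def slow_check(event: dict) -> str:
        time.sleep(0.2)  # simulates an overloaded dependency
        return "block"

    print(check_with_budget(fast_check, {"failed_logins": 35}, budget_s=0.5))  # block
    print(check_with_budget(slow_check, {"failed_logins": 35}))                # pass
```

Fail‑open keeps logins working if the risk service itself degrades; a flow such as payment might instead pass a stricter fallback (for example a hypothetical `"review"` action).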
Architecture Overview
The platform consists of three core services:
Magellan: the entry layer; the query, rule, and model engines; and a management console for risk events, simulation, monitoring, and dashboards.
Columbus: feature engineering, large‑scale anomaly detection, deep learning, knowledge graphs, real‑time and offline features, and security‑portrait generation.
Zhenghe: security knowledge base storing threat intelligence and foundational security data.
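The source does not detail Columbus's anomaly detectors. As a much simpler stand‑in, the modified z‑score below (median and median absolute deviation, after Iglewicz and Hoaglin) flags outlier minutes in a series of failed‑login counts. The function names, the 3.5 threshold, and the sample data are illustrative assumptions, not the platform's method.

```python
import math
from statistics import median


def robust_zscores(values: list[float]) -> list[float]:
    """Modified z-scores: 0.6745 * (x - median) / MAD.

    Median and MAD are insensitive to the very spikes we want to detect,
    unlike the mean and standard deviation.
    """
    med = median(values)
    mad = median(abs(v - med) for v in values)
    if mad == 0:
        # Degenerate case (most values identical): treat any deviation
        # from the median as infinitely anomalous.
        return [0.0 if v == med else math.copysign(math.inf, v - med)
                for v in values]
    return [0.6745 * (v - med) / mad for v in values]


def flag_anomalies(values: list[float], threshold: float = 3.5) -> list[int]:
    """Return the indices whose |modified z-score| exceeds `threshold`."""
    return [i for i, z in enumerate(robust_zscores(values)) if abs(z) > threshold]


# Hypothetical per-minute failed-login counts; minute 6 is a burst.
counts = [12, 15, 11, 14, 13, 12, 240, 14]
print(flag_anomalies(counts))  # [6]
```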
Magellan Sub‑Engines
Query Engine: real‑time and batch data retrieval and aggregation.
Rule Engine: matches rules, supports custom execution strategies (exit on hit, full execution, conditional exit) and multiple rule types (scorecards, decision trees, decision tables, simple rules).
Model Engine: processes features from the query engine, runs algorithms, and serves models to the rule engine.
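The rule engine's three execution strategies can be sketched as a loop over ordered rules. This is a minimal illustration, assuming each rule is a predicate plus a risk score and that "conditional exit" means stopping once the accumulated score reaches a threshold (the source does not define the condition); `Rule`, `run_rules`, and the sample rules are hypothetical. Scorecards, decision trees, and decision tables would plug in as richer `condition` and `score` logic.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    name: str
    condition: Callable[[dict], bool]  # does the rule match this event?
    score: int                         # risk points added on a hit


STRATEGIES = {"exit_on_hit", "full", "conditional_exit"}


def run_rules(rules: list[Rule], event: dict, strategy: str = "full",
              exit_score: int = 100) -> tuple[list[str], int]:
    """Evaluate rules in order; return (names of hit rules, total score)."""
    if strategy not in STRATEGIES:
        raise ValueError(f"unknown strategy: {strategy}")
    hits: list[str] = []
    total = 0
    for rule in rules:
        if not rule.condition(event):
            continue
        hits.append(rule.name)
        total += rule.score
        if strategy == "exit_on_hit":
            break  # the first match decides
        if strategy == "conditional_exit" and total >= exit_score:
            break  # enough evidence accumulated; skip the remaining rules
    return hits, total


rules = [
    Rule("ip_blacklisted", lambda e: e.get("ip_blacklisted", False), 80),
    Rule("burst_failures", lambda e: e.get("failed_logins_5m", 0) > 20, 50),
    Rule("new_device", lambda e: e.get("new_device", False), 20),
]
event = {"ip_blacklisted": True, "failed_logins_5m": 35, "new_device": True}
print(run_rules(rules, event, "full"))              # (['ip_blacklisted', 'burst_failures', 'new_device'], 150)
print(run_rules(rules, event, "exit_on_hit"))       # (['ip_blacklisted'], 80)
print(run_rules(rules, event, "conditional_exit"))  # (['ip_blacklisted', 'burst_failures'], 130)
```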
Deployment Model
Services are deployed across multiple IDC sites. The management console runs in a primary IDC and can fail over to a secondary IDC. Service engines are co‑located with the business services they protect to minimize latency.
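Primary/secondary failover for the console can be sketched with health‑check hysteresis, so that a single missed probe does not trigger a switch and a briefly recovered primary does not cause flapping. The `FailoverSelector` class, the IDC names, and the `fail_after`/`recover_after` thresholds are hypothetical; the source does not describe how failure is detected or when traffic returns to the primary.

```python
class FailoverSelector:
    """Choose between a primary and a secondary IDC from health-check results.

    Switch to the secondary after `fail_after` consecutive failed checks of
    the primary; switch back only after `recover_after` consecutive passes.
    """

    def __init__(self, primary: str, secondary: str,
                 fail_after: int = 3, recover_after: int = 5):
        self.primary = primary
        self.secondary = secondary
        self.fail_after = fail_after
        self.recover_after = recover_after
        self.active = primary
        self._fails = 0  # consecutive failed checks of the primary
        self._oks = 0    # consecutive passed checks of the primary

    def report_primary_health(self, healthy: bool) -> str:
        """Record one health-check result for the primary; return the active IDC."""
        if healthy:
            self._fails = 0
            self._oks += 1
            if self.active == self.secondary and self._oks >= self.recover_after:
                self.active = self.primary
        else:
            self._oks = 0
            self._fails += 1
            if self.active == self.primary and self._fails >= self.fail_after:
                self.active = self.secondary
        return self.active


sel = FailoverSelector("idc-primary", "idc-secondary", fail_after=3, recover_after=2)
for result in [False, False, False, True, True]:
    print(sel.report_primary_health(result))
# One per line: idc-primary, idc-primary, idc-secondary, idc-secondary, idc-primary
```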
Data Pipeline
Columbus ingests massive real‑time streams with Apache Flink (millisecond latency) and processes near‑real‑time and offline data with Apache Spark, Impala, and Hive (seconds to hours). It builds security portraits with more than 600 tags and 1.9 billion records, covering IP reputation, device fingerprints, phone‑number scores, and more.
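Real‑time features such as "failed logins from this IP in the last minute" are typically sliding‑window counts. In the platform this runs inside Flink; the plain‑Python sketch below only shows the windowing logic. It assumes timestamps arrive in non‑decreasing order per key (it does not handle late or out‑of‑order events the way Flink watermarks do), and `SlidingWindowCounter` is a hypothetical name.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Count events per key over the trailing window (now - window_s, now]."""

    def __init__(self, window_s: float):
        self.window_s = window_s
        self._events: dict[str, deque] = defaultdict(deque)

    def add(self, key: str, ts: float) -> int:
        """Record an event at time `ts` and return the key's current count."""
        q = self._events[key]
        q.append(ts)
        self._evict(q, ts)
        return len(q)

    def count(self, key: str, now: float) -> int:
        """Return the key's count at time `now` without recording an event."""
        q = self._events.get(key)
        if not q:
            return 0
        self._evict(q, now)
        return len(q)

    def _evict(self, q: deque, now: float) -> None:
        # Drop timestamps that have fallen out of the window.
        while q and q[0] <= now - self.window_s:
            q.popleft()


failed = SlidingWindowCounter(window_s=60)
for ts in [0, 10, 59, 61]:
    n = failed.add("198.51.100.23", ts)
print(n)                                  # 3 (events at 10, 59, 61)
print(failed.count("198.51.100.23", 70))  # 2 (events at 59, 61)
```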
Verification Methods
Graphic CAPTCHA.
Slide CAPTCHA based on human motion.
SMS verification, both downlink (a code sent to the user) and uplink (the user sends an SMS).
Trusted‑device verification.
Security‑Shield app providing OTP, push confirmation, and QR‑code scanning.
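The source does not say which OTP algorithm the Security‑Shield app uses. Assuming a standard time‑based OTP (TOTP, RFC 6238, built on HOTP from RFC 4226), server‑side generation and verification look roughly like this; the function names, the 30‑second step, and the one‑step tolerance window are conventional defaults chosen for the sketch.

```python
import hashlib
import hmac
import struct
import time
from typing import Optional


def hotp(secret: bytes, counter: int, digits: int = 6) -> str:
    """HOTP (RFC 4226): HMAC-SHA1 over the 8-byte counter, dynamically truncated."""
    digest = hmac.new(secret, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F
    code = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(code % 10 ** digits).zfill(digits)


def totp(secret: bytes, for_time: Optional[float] = None,
         step: int = 30, digits: int = 6) -> str:
    """TOTP (RFC 6238): HOTP with counter = floor(unix_time / step)."""
    t = time.time() if for_time is None else for_time
    return hotp(secret, int(t // step), digits)


def verify_totp(secret: bytes, code: str, for_time: Optional[float] = None,
                step: int = 30, digits: int = 6, window: int = 1) -> bool:
    """Accept codes up to `window` steps before or after now (clock skew)."""
    t = time.time() if for_time is None else for_time
    counter = int(t // step)
    return any(
        hmac.compare_digest(hotp(secret, c, digits), code)  # constant-time compare
        for c in range(counter - window, counter + window + 1)
        if c >= 0
    )


# RFC 6238 SHA-1 test secret; T = 59 s yields the 8-digit code 94287082.
secret = b"12345678901234567890"
print(totp(secret, for_time=59, digits=8))         # 94287082
print(verify_totp(secret, "287082", for_time=59))  # True
```

A production verifier would also record the last accepted counter per user so that a code cannot be replayed within its validity window.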
Operational Outcomes
Daily request volume > 24 billion, average latency < 5 ms, zero incidents.
More than 200 million credential‑stuffing attempts blocked in real time; successful attacks reduced to single digits per day.
Comprehensive coverage of membership, video, live, e‑commerce, payment, social, and IT services.
Flexible risk control: heavy monitoring in normal periods, aggressive counter‑measures during attacks while preserving user experience.
Key achievements: in‑depth defense before and after attacks, cross‑business joint control, real‑time anomaly detection, and near‑100% suppression of machine credential stuffing.
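The "heavy monitoring normally, aggressive countermeasures during attacks" behavior can be modeled as two threshold tables over the same risk score, with the verification methods listed earlier serving as intermediate actions. Everything here is an illustrative assumption: the `Mode` enum, the 0 to 100 score scale, the thresholds, and the action names are hypothetical, and the source does not describe how the platform decides it is under attack.

```python
from enum import Enum


class Mode(Enum):
    NORMAL = "normal"
    UNDER_ATTACK = "under_attack"


# (minimum score, action), checked from the highest threshold down.
# Thresholds are invented for illustration only.
POLICIES = {
    Mode.NORMAL: [(90, "block"), (70, "slide_captcha"), (40, "monitor")],
    Mode.UNDER_ATTACK: [(70, "block"), (40, "sms_verification"), (20, "slide_captcha")],
}


def decide(score: int, mode: Mode) -> str:
    """Map a risk score to an action under the current operating mode."""
    for min_score, action in POLICIES[mode]:
        if score >= min_score:
            return action
    return "pass"


print(decide(50, Mode.NORMAL))        # monitor
print(decide(50, Mode.UNDER_ATTACK))  # sms_verification
```

Switching modes changes only the lookup table, not the rules or models that produce the score, which is one way to respond quickly without a business release.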
Key Takeaways
Security must be tightly coupled with business needs.
Cloud‑native services enable rapid iteration and scaling.
Continuous data‑driven operation is essential for effective risk mitigation.
Cross‑team collaboration and shared data dramatically improve detection accuracy.
Prioritize the 20 % of risks that cause 80 % of impact.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.