
RiskFactor: An Integrated Real‑Time and Offline Feature Platform for Risk Control

RiskFactor unifies iQIYI’s legacy real‑time and offline feature platforms onto Opal’s DAG‑plus‑SQL engine, accelerating feature production fifteen‑fold, cutting latency from hours to minutes, streamlining development, lowering costs, and delivering more reliable, versioned risk‑control capabilities against sophisticated online threats.

iQIYI Technical Product Team

Internet risk control is a highly adversarial domain where malicious actors constantly try to bypass defenses. Feature data is the core of the risk‑control system, and the speed of feature production directly determines the effectiveness of the countermeasures.

Traditional risk‑control features are either list‑based (e.g., black‑listed virtual phone numbers) or velocity‑based (e.g., login count per device per hour). These features are consumed by rules and models to decide whether a request should be blocked or allowed.
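To make the two feature types concrete, here is a minimal sketch in Python. It is not taken from the RiskFactor platform: the blacklist contents, the `VelocityCounter` class, the `decide` rule, and the threshold of 20 logins per hour are all hypothetical, chosen only to illustrate a list lookup and a sliding-window count.

```python
from collections import defaultdict, deque

# Hypothetical blacklist; a real list would be served by a feature store.
BLACKLISTED_PHONES = {"+10000000001", "+10000000002"}


def is_blacklisted(phone: str) -> bool:
    """List-based feature: membership test against a known-bad list."""
    return phone in BLACKLISTED_PHONES


class VelocityCounter:
    """Velocity-based feature: events per key within a sliding time window."""

    def __init__(self, window_seconds: int = 3600):
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, ascending

    def record(self, key: str, ts: float) -> int:
        """Record an event at time ts and return the count inside the window."""
        q = self._events[key]
        q.append(ts)
        # Evict timestamps outside the half-open window (ts - window, ts].
        while q and q[0] <= ts - self.window_seconds:
            q.popleft()
        return len(q)


def decide(phone: str, login_count: int, max_logins_per_hour: int = 20) -> str:
    """Toy rule combining both feature types; the threshold is illustrative."""
    if is_blacklisted(phone) or login_count > max_logins_per_hour:
        return "block"
    return "allow"
```

A caller would record each login with `VelocityCounter.record(device_id, timestamp)` and pass the returned count, together with the phone number, to `decide`.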

Since 2018, iQIYI’s risk‑control team has built two separate platforms on Spark and Flink: an offline feature platform (Qishu) and a real‑time feature platform (Qiliuhai). Over time, operational bottlenecks and frequent latency problems led to several incidents in which black‑market traffic slipped past the defenses.

In Q2 2023 the team launched a new‑generation platform, RiskFactor, which unifies the two legacy systems on top of the Opal ML platform’s feature center. The new platform accelerates feature production by 15×, giving the team a decisive advantage against attackers and improving both loss prevention and revenue growth.

Pain points of the old platforms

High latency of real‑time feature calculation (up to 6 hours) caused by manual resource allocation and IO blocking.

High failure risk of offline feature jobs due to cross‑platform SDK integration and lack of real‑time task status awareness.

Redundant feature configuration: complex features required separate real‑time and offline setups, leading to low operational efficiency and duplicated storage/computation.

New platform design

Simplified feature configuration using a DAG + SQL model. DAG provides flexible pipeline orchestration, while SQL offers a unified development language for both streaming and batch.

Unified feature production and management by integrating with Opal’s feature center, which supplies batch‑stream scheduling and a query SDK.

Feature production speedup through asynchronous link splitting, runtime task merging, and long‑window topology optimization.

High availability via versioned feature releases and centralized deployment control.
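The DAG + SQL model from the first design point can be sketched as follows. This is an assumption-laden illustration, not the Opal or RiskFactor API: the `Node` dataclass, the `run_pipeline` function, and the node names are hypothetical, and an in-memory SQLite database stands in for the real batch and streaming engines. Nodes are executed in topological order, and each SQL node reads the tables produced by its upstream nodes.

```python
import sqlite3
from dataclasses import dataclass, field
from graphlib import TopologicalSorter  # Python 3.9+


@dataclass
class Node:
    name: str
    kind: str                 # "source", "sql", or "sink"
    sql: str = ""             # query text, used by "sql" nodes only
    upstream: list = field(default_factory=list)


def run_pipeline(nodes, source_rows):
    """Run nodes in dependency order; each node's output is a table named after it."""
    conn = sqlite3.connect(":memory:")
    by_name = {n.name: n for n in nodes}
    # TopologicalSorter expects a mapping of node -> set of predecessors.
    order = TopologicalSorter({n.name: set(n.upstream) for n in nodes}).static_order()
    results = {}
    for name in order:
        node = by_name[name]
        if node.kind == "source":
            conn.execute(f"CREATE TABLE {name} (device_id TEXT, event TEXT)")
            conn.executemany(f"INSERT INTO {name} VALUES (?, ?)", source_rows[name])
        elif node.kind == "sql":
            conn.execute(f"CREATE TABLE {name} AS {node.sql}")
        elif node.kind == "sink":
            (parent,) = node.upstream  # this sketch allows one input per sink
            results[name] = conn.execute(f"SELECT * FROM {parent}").fetchall()
    conn.close()
    return results


# Hypothetical three-node pipeline: source -> SQL computation -> sink.
PIPELINE = [
    Node("login_events", "source"),
    Node(
        "login_count",
        "sql",
        sql="SELECT device_id, COUNT(*) AS cnt FROM login_events "
            "WHERE event = 'login' GROUP BY device_id",
        upstream=["login_events"],
    ),
    Node("feature_store", "sink", upstream=["login_count"]),
]
```

Because the computation lives entirely in the SQL text, the same node definition could in principle be handed to a batch or a streaming backend, which is the appeal of a unified development language.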

The platform’s architecture (illustrated in the original diagram) consolidates real‑time and offline feature expression, supports versioned deployment, and offers a one‑stop operation interface.

One‑stop development and operation

The DAG + SQL approach abstracts data source, computation, and sink nodes. Offline tasks connect to Hive/Iceberg, while real‑time tasks connect to Kafka. Both use SQL nodes for computation; real‑time adds a specialized Cumulate node for windowed aggregation.
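The article does not describe the Cumulate node's semantics. Assuming it behaves like a cumulating window (windows that share a start time and grow by a fixed step until they reach a maximum size), the aggregation can be sketched as below. The `cumulate_counts` function and its parameters are hypothetical, timestamps are integers, and `max_size` is assumed to be a multiple of `step`.

```python
from collections import Counter


def cumulate_counts(timestamps, step, max_size):
    """Return {(window_start, window_end): count} for cumulating windows.

    Windows share a start aligned to max_size and grow by `step` until they
    span max_size. Each event is counted in every window whose end lies
    after it. Windows that contain no events are omitted from the result.
    """
    counts = Counter()
    for ts in timestamps:
        start = ts - ts % max_size
        # Smallest window end that is strictly greater than ts.
        first_end = start + ((ts - start) // step + 1) * step
        for end in range(first_end, start + max_size + 1, step):
            counts[(start, end)] += 1
    return dict(counts)
```

With `step=60` and `max_size=180`, an event at second 70 contributes to the windows ending at 120 and 180 but not to the one ending at 60, so a long-window count is refreshed every step instead of only once per full window.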

Deployment is unified: feature versions are isolated in storage, and a version‑aware SDK handles overwriting and cache cleaning. Gray‑release mechanisms allow multiple versions to coexist during trial, with dynamic traffic‑shaping and fast rollback.
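One way to picture version isolation, gray release, and rollback is the sketch below. It is a hypothetical design, not the platform's SDK: the `GrayRelease` class, the hash-bucket routing, and the `storage_key` layout are assumptions made for illustration.

```python
import hashlib


class GrayRelease:
    """Route each key to a feature version by stable hashing (hypothetical design)."""

    def __init__(self, stable_version: str):
        self.stable = stable_version
        self.candidate = None
        self.candidate_percent = 0  # share of traffic (0-100) sent to the candidate

    def start_trial(self, version: str, percent: int) -> None:
        """Start or adjust a trial; calling again reshapes traffic dynamically."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be in [0, 100]")
        self.candidate, self.candidate_percent = version, percent

    def rollback(self) -> None:
        """Send all traffic back to the stable version."""
        self.candidate, self.candidate_percent = None, 0

    def promote(self) -> None:
        """Make the candidate the new stable version and end the trial."""
        if self.candidate is not None:
            self.stable = self.candidate
        self.rollback()

    def version_for(self, key: str) -> str:
        # A stable hash keeps the same key in the same bucket across calls.
        bucket = int(hashlib.sha256(key.encode()).hexdigest(), 16) % 100
        if self.candidate is not None and bucket < self.candidate_percent:
            return self.candidate
        return self.stable


def storage_key(feature: str, version: str, entity: str) -> str:
    """Version-isolated storage key, so versions never overwrite each other."""
    return f"{feature}:{version}:{entity}"
```

Because each version writes under its own key prefix, rolling back only changes the routing decision; no data written by the stable version has to be restored.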

Optimization outcomes

Development efficiency: the number of feature platforms was reduced from four to one, and the average feature‑launch time dropped from 5 hours to at most 1 hour.

Cost reduction: average development time fell from 3 hours to ~30 minutes thanks to DAG abstraction and unified SQL.

Task management improvement: clear mapping between features and tasks, seamless cluster upgrades, and unified packaging.

Resource utilization: CPU usage stabilized around 60 % after task merging and dynamic scaling.

Feature latency: real‑time feature cache activation reduced from up to 6 hours to a maximum of 4 minutes.

Future plans

Graph‑based risk control: integrate graph engines to generate graph‑related features.

Composite features: support runtime scripts that depend on other features or external data sources.

The presentation concludes with acknowledgments to the Opal ML platform for providing the underlying feature production and query capabilities.
