Feature Center Overview in iQIYI's Opal Machine Learning Platform
The Feature Center in iQIYI’s Opal platform centralizes feature creation, storage, and real‑time access through a drag‑and‑drop DAG workflow and DSL‑driven transformations, handling massive QPS and low‑latency demands while enabling fast business iteration, cross‑team reuse, and monitoring for advertising, recommendation, and risk‑control applications.
Opal is iQIYI's one‑stop machine‑learning platform designed to accelerate feature iteration and model training, thereby improving business revenue. The platform covers the entire ML lifecycle, including feature production, sample construction, model exploration, training, and deployment.
The Feature Center is the core component that manages feature production, storage, and access. It enables engineers and analysts to create, share, and reuse features efficiently, addressing challenges such as massive QPS, real‑time requirements, scalability, and rapid business iteration.
Problems Addressed
Handling massive user requests with high QPS.
Meeting real‑time latency requirements for recommendation, advertising, and risk control.
Providing extensible and flexible feature types (basic, statistical series, windowed, cross features).
Supporting fast business iteration through a DSL‑driven, low‑code workflow.
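Of the feature types listed above, cross features are the least self-explanatory. A common way to build one, not necessarily Opal's, is to join the values of several categorical features and hash the result into a fixed number of buckets. The sketch below assumes string-valued inputs. The function name `cross_feature` and the `<none>` placeholder for missing values are hypothetical.

```python
import hashlib


def cross_feature(values: dict[str, str], names: list[str], num_buckets: int) -> int:
    """Map a combination of categorical feature values to one hashed bucket.

    Hypothetical helper: missing features are replaced with "<none>".
    """
    # Join the selected values in the order given by `names`, so the same
    # combination always yields the same key, e.g. "city=beijing|age_bucket=18-24".
    key = "|".join(f"{name}={values.get(name, '<none>')}" for name in names)
    # Use a stable hash (not Python's per-process salted hash()) so bucket ids
    # stay identical across processes, e.g. between training and serving.
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "little") % num_buckets


# Usage: cross "city" with "age_bucket" into one of 1000 buckets.
user = {"city": "beijing", "age_bucket": "18-24"}
bucket = cross_feature(user, ["city", "age_bucket"], num_buckets=1000)
```

Hashing into a fixed bucket count keeps the model's input dimension bounded even when the number of value combinations is large, at the cost of occasional collisions.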
Core Functions
Feature Input: Manage data sources such as text files, Parquet, Hive, and Iceberg.
Feature Computation: Build a drag‑and‑drop DAG of operators that extracts raw logs and transforms them into features.
Feature Storage: Store computed features in various back‑ends, balancing cost and access efficiency.
Feature Transformation: Convert raw features into model features via a flexible DSL.
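To make DSL-driven transformation concrete, here is a minimal evaluator sketch for expressions such as log(`click_count` + 1), where backtick-quoted names refer to raw features. Instead of implementing a full grammar, it rewrites each backtick name into a placeholder identifier and reuses Python's `ast` module with a whitelist of node types. The function table (`log`, `sqrt`, `min`, `max`, `len`) is an assumption, not Opal's actual keyword set.

```python
import ast
import math
import re

# Hypothetical function keywords; Opal's real keyword set is not documented here.
FUNCTIONS = {"log": math.log, "sqrt": math.sqrt, "min": min, "max": max, "len": len}

_BINOPS = {
    ast.Add: lambda a, b: a + b,
    ast.Sub: lambda a, b: a - b,
    ast.Mult: lambda a, b: a * b,
    ast.Div: lambda a, b: a / b,
}


def evaluate(expr: str, features: dict):
    """Evaluate a DSL-like expression against a dict of raw feature values."""
    names: dict[str, str] = {}

    def _placeholder(match: re.Match) -> str:
        # Replace `feature name` with a valid identifier such as __f0.
        ident = f"__f{len(names)}"
        names[ident] = match.group(1)
        return ident

    tree = ast.parse(re.sub(r"`([^`]+)`", _placeholder, expr), mode="eval")
    return _eval(tree.body, names, features)


def _eval(node, names, features):
    if isinstance(node, ast.Constant):  # 123, 'hello world'
        return node.value
    if isinstance(node, ast.List):  # [1, 2, 4], ['aaa', 'bbb']
        return [_eval(e, names, features) for e in node.elts]
    if isinstance(node, ast.Name) and node.id in names:  # `city`
        return features[names[node.id]]
    if isinstance(node, ast.BinOp) and type(node.op) in _BINOPS:
        left = _eval(node.left, names, features)
        right = _eval(node.right, names, features)
        return _BINOPS[type(node.op)](left, right)
    if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
        return -_eval(node.operand, names, features)
    if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
            and node.func.id in FUNCTIONS and not node.keywords):
        args = [_eval(a, names, features) for a in node.args]
        return FUNCTIONS[node.func.id](*args)
    # Anything not whitelisted above is rejected rather than executed.
    raise ValueError(f"unsupported DSL construct: {ast.dump(node)}")
```

Because only whitelisted nodes are evaluated, an expression such as `__import__('os')` raises `ValueError` instead of running. A production DSL would more likely have its own parser and type checks.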
Architecture
[Figure: Overall architecture of the Feature Center, from data ingestion through DAG‑based processing to storage and online serving.]
Feature Production
Offline Feature Groups
Offline features are pre‑computed and stored for later use. Users build DAGs that read from sources (Hive, Iceberg, TFRecord, Parquet), apply operators, and write to target storage. The platform provides schema inference, SQL parsing, quality checks, and task re‑run capabilities.
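The DAG model can be illustrated with a small in-memory runner. Each operator is a function, each edge says which operator's output feeds another, and operators run in topological order. The sketch uses the standard-library `graphlib` module (Python 3.9+). The operator names and toy log records are hypothetical, and a dict stands in for the target storage.

```python
from collections import Counter
from graphlib import TopologicalSorter
from typing import Any, Callable


def run_dag(operators: dict[str, Callable[..., Any]],
            upstream: dict[str, list[str]]) -> dict[str, Any]:
    """Run each operator after all of its upstream operators.

    operators: operator name -> function called with its upstream outputs.
    upstream:  operator name -> names of the operators it reads from.
    """
    graph = {name: upstream.get(name, []) for name in operators}
    results: dict[str, Any] = {}
    # static_order() yields every node after its predecessors and raises
    # graphlib.CycleError if the graph contains a cycle.
    for name in TopologicalSorter(graph).static_order():
        inputs = [results[dep] for dep in graph[name]]
        results[name] = operators[name](*inputs)
    return results


# Toy pipeline: read raw logs -> keep clicks -> count per user -> write to a store.
logs = [
    {"user": "u1", "action": "click"},
    {"user": "u2", "action": "view"},
    {"user": "u1", "action": "click"},
]
feature_store: dict[str, int] = {}


def write_store(counts: Counter) -> int:
    feature_store.update(counts)
    return len(counts)


operators = {
    "read_logs": lambda: logs,
    "filter_clicks": lambda rows: [r for r in rows if r["action"] == "click"],
    "count_by_user": lambda rows: Counter(r["user"] for r in rows),
    "write_store": write_store,
}
upstream = {
    "filter_clicks": ["read_logs"],
    "count_by_user": ["filter_clicks"],
    "write_store": ["count_by_user"],
}
results = run_dag(operators, upstream)
```

A real scheduler would also persist intermediate outputs so a failed task can be re-run on its own, which is what makes task re-runs practical.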
Real‑time Feature Groups
Real‑time features are computed on the fly from sources such as Kafka, Iceberg, or MySQL. They use the same DAG model, and results are written back to Kafka or other stores. Sliding‑window operators support low‑latency windowed feature computation.
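A sliding-window operator can be sketched as a per-key queue of event timestamps. Each new event is appended and timestamps older than the window are evicted from the front, so the current count is the queue length. This minimal version assumes events arrive in timestamp order per key (no late or out-of-order data) and only counts events. `SlidingWindowCounter` is a hypothetical name, not an Opal operator.

```python
from collections import defaultdict, deque


class SlidingWindowCounter:
    """Per-key event count over the last `window_seconds` of event time."""

    def __init__(self, window_seconds: float):
        self.window = window_seconds
        self.events: defaultdict[str, deque] = defaultdict(deque)  # key -> timestamps

    def add(self, key: str, ts: float) -> int:
        """Record one event and return the key's count in the window ending at ts."""
        q = self.events[key]
        q.append(ts)
        self._evict(q, ts)
        return len(q)

    def count(self, key: str, now: float) -> int:
        """Return the key's count in the window ending at `now`."""
        q = self.events.get(key)  # .get() avoids creating empty entries
        if not q:
            return 0
        self._evict(q, now)
        return len(q)

    def _evict(self, q: deque, now: float) -> None:
        # Keep only timestamps inside the window (now - window, now].
        while q and q[0] <= now - self.window:
            q.popleft()


# Usage: clicks per user over a 60-second window.
counter = SlidingWindowCounter(window_seconds=60)
for ts in (0, 30, 70):
    counter.add("u1", ts)
clicks_last_minute = counter.count("u1", now=95)
```

Eviction is amortized O(1) per event because each timestamp is appended and removed at most once.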
Feature View
Feature View unifies access to both offline and real‑time features. It materializes features into online caches (Couchbase, Redis, HBase) and provides a unified client SDK for consumption, abstracting away storage details.
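One way to picture materialization is this: for each entity, the features declared in a view are gathered from offline and real-time results and written to an online key-value store under a single key, so a reader needs only one lookup. In the sketch below, a plain dict stands in for Couchbase, Redis, or HBase. The class name, the `<view>:<entity_id>` key format, JSON encoding, and the rule that real-time values override offline ones are all assumptions.

```python
import json


class FeatureView:
    """Hypothetical feature view: a named set of features materialized per entity."""

    def __init__(self, name: str, offline_features: list[str], realtime_features: list[str]):
        self.name = name
        self.offline_features = offline_features
        self.realtime_features = realtime_features

    def cache_key(self, entity_id: str) -> str:
        return f"{self.name}:{entity_id}"

    def materialize(self, entity_id: str, offline_row: dict, realtime_row: dict,
                    cache: dict) -> None:
        # Keep only the features declared in the view. Real-time values are
        # applied last, so they win if a name appears in both (an assumption).
        row = {f: offline_row[f] for f in self.offline_features if f in offline_row}
        row.update({f: realtime_row[f] for f in self.realtime_features if f in realtime_row})
        cache[self.cache_key(entity_id)] = json.dumps(row, sort_keys=True)

    def read(self, entity_id: str, cache: dict) -> dict:
        raw = cache.get(self.cache_key(entity_id))
        return json.loads(raw) if raw is not None else {}


# Usage: combine a batch profile with a streaming click counter.
cache: dict[str, str] = {}
view = FeatureView("user_profile", ["age", "city"], ["clicks_5m"])
view.materialize("u1", {"age": 25, "city": "beijing", "unused": 1}, {"clicks_5m": 3}, cache)
```

Storing one serialized row per entity trades some write amplification for a single read per request, which suits latency-sensitive serving.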
DSL Syntax for Feature Transformation
The DSL uses predefined function keywords, feature variables (enclosed in backticks), and literals. Examples:
List literals: [1, 2, 4] and ['aaa', 'bbb']
Feature variable: `city`
Numeric literal: 123
String literal: 'hello world'
A typical transformation expression might look like:
log(`click_count` + 1)
Java SDK
Opal provides a Java client that hides underlying storage complexities, allowing developers to fetch features with simple API calls.
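The actual SDK is Java, and its API is not shown in the source. The Python sketch below only illustrates the idea of hiding storage details. Callers ask for feature names, and a hypothetical registry maps each name to a backing store and key template, so requests can be grouped into one batched call per store. `FeatureClient`, `DictStore`, `multi_get`, and the registry format are invented for illustration.

```python
from typing import Protocol


class Store(Protocol):
    def multi_get(self, keys: list[str]) -> dict[str, object]: ...


class DictStore:
    """In-memory stand-in for an online store such as Redis or Couchbase."""

    def __init__(self, data: dict[str, object]):
        self.data = data

    def multi_get(self, keys: list[str]) -> dict[str, object]:
        # Return only the keys that exist, as a batched lookup would.
        return {k: self.data[k] for k in keys if k in self.data}


class FeatureClient:
    """Hypothetical client: callers ask for feature names, not storage keys."""

    def __init__(self, registry: dict[str, tuple[str, str]], stores: dict[str, Store]):
        # registry: feature name -> (store name, key template containing {entity})
        self.registry = registry
        self.stores = stores

    def get_features(self, entity_id: str, names: list[str]) -> dict[str, object]:
        # Group requested features by backing store: one batched call per store.
        by_store: dict[str, dict[str, str]] = {}
        for name in names:
            store_name, template = self.registry[name]
            by_store.setdefault(store_name, {})[template.format(entity=entity_id)] = name
        result: dict[str, object] = {}
        for store_name, key_to_name in by_store.items():
            found = self.stores[store_name].multi_get(list(key_to_name))
            for key, value in found.items():
                result[key_to_name[key]] = value
        return result


# Usage: two features living in different back-ends, fetched with one call.
stores = {"redis": DictStore({"age:u1": 25}), "hbase": DictStore({"u1#ctr_7d": 0.12})}
registry = {"age": ("redis", "age:{entity}"), "ctr_7d": ("hbase", "{entity}#ctr_7d")}
client = FeatureClient(registry, stores)
features = client.get_features("u1", ["age", "ctr_7d"])
```

Moving a feature to a different back-end then only changes its registry entry, not the calling code.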
Monitoring
Integrated metrics are exported to Grafana dashboards for health‑checking the feature service.
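The source does not list which metrics are exported. As a hypothetical example of what a health dashboard might plot, the sketch below aggregates feature-fetch latencies over one reporting interval into QPS, p50/p99 latency, and error rate. How the numbers reach Grafana (for example through a time-series database) is left out, and the `LatencyStats` name is made up.

```python
import math


class LatencyStats:
    """Collects feature-fetch latencies for one reporting interval (hypothetical)."""

    def __init__(self) -> None:
        self.samples_ms: list[float] = []
        self.errors = 0

    def record(self, latency_ms: float, ok: bool = True) -> None:
        self.samples_ms.append(latency_ms)
        if not ok:
            self.errors += 1

    def snapshot(self, interval_seconds: float) -> dict[str, float]:
        """Summarize the interval and reset, as an exporter might before each push."""
        n = len(self.samples_ms)
        ordered = sorted(self.samples_ms)

        def pct(p: float) -> float:
            # Nearest-rank percentile: smallest sample with >= p% of samples at or below it.
            return ordered[max(0, math.ceil(p * n / 100) - 1)] if n else 0.0

        summary = {
            "qps": n / interval_seconds,
            "p50_ms": pct(50),
            "p99_ms": pct(99),
            "error_rate": self.errors / n if n else 0.0,
        }
        self.samples_ms, self.errors = [], 0
        return summary
```

Tail percentiles such as p99 are usually more informative than averages for a low-latency service, since a small fraction of slow fetches can still breach a request deadline.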
Business Adoption
Advertising, recommendation, and risk‑control teams have integrated the Feature Center, achieving 0.4‑3× faster feature iteration and ~50% reduction in feature‑fetch latency.
Future Plans
Feature sharing to eliminate duplicate computation.
Real‑time feature quality monitoring.
Feature hotness calculation to aid importance assessment.
iQIYI Technical Product Team