iQIYI Magic Mirror: Evolution of a Big Data Analysis Platform
The article details how iQIYI's Magic Mirror platform evolved from a simple single‑table reporting tool to a multi‑engine, self‑service big data analysis system that improves data access speed, reduces operational costs, and supports comprehensive business analytics across the company.
As the internet industry rapidly expanded, fixed reporting could no longer meet diverse business data needs, making data engineers a bottleneck; iQIYI responded by creating the Magic Mirror platform to empower users with self‑service data analysis capabilities.
The platform progressed through three stages. Magic Mirror 1.0 (2015) offered pingback management and basic Hive‑based single‑table calculations. Magic Mirror 2.0 (2019) introduced data‑warehouse table registration, template‑driven basic, association, and retention analyses, custom SQL, and a distributed Gear workflow that replaced single‑point execution. Magic Mirror 3.0 (2022) added the Pilot engine with multi‑engine support (Spark SQL, Hive, Trino, Impala), unified storage (Hive, Iceberg, local), and a layered data architecture (ODS, DWD, MID, DIM, AL).
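To make the idea of a template‑driven retention analysis concrete, here is a minimal Python sketch of how a template might turn a few user‑chosen parameters into SQL. It is not Magic Mirror's implementation: the class `RetentionTemplate`, its fields, and the table and column names in the usage note are all hypothetical, and the generated SQL assumes a date‑partitioned event table with one row per user action.

```python
from dataclasses import dataclass
from datetime import date, timedelta


@dataclass
class RetentionTemplate:
    """Hypothetical template: the user fills in fields, the platform renders SQL."""
    table: str        # a registered warehouse table (name is an assumption)
    user_col: str     # column identifying the user
    date_col: str     # partition/date column stored as 'YYYY-MM-DD'
    start_date: date  # cohort day
    day_offset: int   # N in "day-N retention"

    def render(self) -> str:
        # Compute the return day in Python to avoid engine-specific date functions.
        return_date = self.start_date + timedelta(days=self.day_offset)
        return (
            f"SELECT COUNT(DISTINCT a.{self.user_col}) AS base_users,\n"
            f"       COUNT(DISTINCT b.{self.user_col}) AS retained_users\n"
            f"FROM {self.table} a\n"
            f"LEFT JOIN {self.table} b\n"
            f"  ON a.{self.user_col} = b.{self.user_col}\n"
            f" AND b.{self.date_col} = '{return_date.isoformat()}'\n"
            f"WHERE a.{self.date_col} = '{self.start_date.isoformat()}'"
        )
```

For example, `RetentionTemplate("dwd.user_play_events", "user_id", "dt", date(2022, 1, 1), 7).render()` produces a query that counts the users active on 2022-01-01 and how many of them were active again on 2022-01-08. A production template would also need to validate identifiers before interpolating them into SQL.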
The current architecture separates storage, data, service, and engine layers. The storage layer supports Hive, Iceberg, and local files. The data layer unifies a warehouse and data marts across multiple layers. The service layer provides template and custom‑SQL services, subscription, monitoring, and logging. The Pilot engine handles syntax parsing, query interception, intelligent routing, and multi‑engine execution, cutting average job time from ~20 minutes to ~6 minutes, a reduction of roughly 70%.
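The interception and routing steps attributed to the Pilot engine can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not Pilot's actual logic: the unsafe‑statement rule set, the `Query` type, the `estimated_scan_gb` field, and the size thresholds are all invented here, and a real router would more plausibly work from a parsed syntax tree and live cluster load.

```python
import re
from dataclasses import dataclass

# Hypothetical rule set: statement types a self-service user may not run.
UNSAFE_PATTERN = re.compile(r"^\s*(drop|truncate|delete|alter|grant)\b", re.IGNORECASE)


@dataclass
class Query:
    sql: str
    estimated_scan_gb: float  # assumed to be derived from table metadata


def intercept(query: Query) -> None:
    """Reject statements that match the (assumed) unsafe-operation rules."""
    if UNSAFE_PATTERN.match(query.sql):
        keyword = query.sql.split()[0].upper()
        raise PermissionError(f"blocked unsafe statement: {keyword}")


def route(query: Query) -> str:
    """Pick an engine by estimated scan size; thresholds are illustrative only."""
    intercept(query)
    if query.estimated_scan_gb < 10:
        return "impala"      # small interactive queries
    if query.estimated_scan_gb < 500:
        return "trino"       # medium ad hoc queries
    return "spark_sql"       # large batch-style queries
```

The design point is that users submit one query and never choose an engine themselves: small scans go to a low‑latency engine, large ones to a batch engine, and unsafe statements are stopped before any engine sees them.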
Functional enhancements include self‑service basic computation that abstracts multi‑table analysis, a unified metric system, advanced dimension handling, and visual dashboards (trend, pie, bar charts) that replace plain tables, enabling richer analysis boards.
Business impact is significant: the platform now serves all iQIYI business lines, increases daily active users, reduces data retrieval latency from days to minutes, saves costs by decommissioning hundreds of servers, and improves data security by monitoring and intercepting unsafe SQL operations.
Future plans focus on expanding query‑engine support and adding more intelligent, automated analysis templates to further reduce manual configuration effort.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.