
The Evolution of iQIYI's Big Data Analytics Platform

This article chronicles iQIYI’s journey from a simple Hive‑based data pipeline to the multi‑engine “Tongtian Tower” platform. It covers the Magic Mirror system, the Gear workflow manager, BabelBD, the Monet visual analytics tool, and the integrated BI ecosystem that now supports hundreds of millions of daily users.

DataFunTalk

iQIYI’s data platform serves a massive user base: daily active users approach 300 million, devices number over 30 billion, and more than 300 TB of user behavior logs are processed each day. This scale imposes stringent requirements on data operations and development.

1. The Beginning Era

Initially, logs were transferred via rsync into Hive and processed by Hive SQL driven from shell scripts. The results were imported into MySQL, and a Java layer served the reports. This manual pipeline caused long data‑delivery cycles and a heavy developer workload.

2. The Magic Mirror Era

The Magic Mirror system introduced the Accio Log collector to upload logs from Pingback servers to HDFS, and the Transfiguration framework to parse and split logs for storage. Users could self‑service data extraction without waiting for development schedules. However, rapid business growth led to massive log volumes that overloaded Hadoop clusters, and script‑based development became unsustainable.

3. The Tongtian Tower Era

The Tongtian Tower unified all data, compute resources, and service frameworks across iQIYI. Offline processing relies on Hive and Spark; streaming uses Spark Streaming and Flink; OLAP queries run on Impala and Kylin. Storage includes HDFS, HBase, and Kudu (real‑time), while operational databases are MySQL and MongoDB. A dedicated development platform manages workflows, data lineage, cross‑DC synchronization, and data‑warehouse components such as ingestion management, metric‑dimension management, and model management.

4. Workflow Management and Development Evolution

Workflow orchestration progressed from simple Crontab scripts to a custom Shell framework, then to LinkedIn’s Azkaban (single‑node), followed by the internally built Gear system, and finally BabelBD, which offers a drag‑and‑drop interface that abstracts away configuration complexity, allowing developers to focus on core SQL logic.
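
What every scheduler in this progression has to get right is running tasks in dependency order. The following is a minimal sketch of that idea using Python's standard `graphlib` module; the task names are hypothetical, and this is not how Gear or BabelBD is implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task maps to the set of tasks it depends on.
workflow = {
    "ingest_logs": set(),
    "parse_logs": {"ingest_logs"},
    "build_detail_table": {"parse_logs"},
    "build_aggregate_table": {"build_detail_table"},
    "export_report": {"build_aggregate_table"},
}


def execution_order(dag):
    """Return task names in an order that respects every dependency.

    TopologicalSorter raises graphlib.CycleError if the graph has a cycle.
    """
    return list(TopologicalSorter(dag).static_order())


order = execution_order(workflow)
for task in order:
    print("run", task)
```

A drag‑and‑drop interface can be thought of as a visual editor for a graph like `workflow`, with the scheduler deriving the run order from it.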

5. iQIYI BI Platform

The BI platform evolved from a Java‑Web MVC reporting system to a configurable reporting platform (Longyuan 2.0) and finally to a large‑scale BI system that abstracts report construction, supports self‑service analysis, and enforces business‑line and permission segregation.

6. Data Management and Done Service

To guarantee data availability, a Done‑file mechanism was introduced, later replaced by a dedicated Done service that avoids HDFS small‑file overload and provides reliable dependency checks for downstream jobs.
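
The dependency check a Done service performs can be sketched as a small in‑memory registry. The class and method names below are assumptions made for illustration, and a real service would persist markers in a database rather than a Python set.

```python
class DoneService:
    """Hypothetical in-memory registry of completed (dataset, partition) pairs."""

    def __init__(self):
        self._done = set()

    def mark_done(self, dataset, partition):
        # Called by an upstream job once its output is fully written.
        self._done.add((dataset, partition))

    def is_done(self, dataset, partition):
        return (dataset, partition) in self._done

    def missing(self, dependencies):
        # Return the dependencies that are not yet marked done, in input order.
        return [dep for dep in dependencies if dep not in self._done]


service = DoneService()
service.mark_done("dw.play_detail", "2020-01-01")

deps = [("dw.play_detail", "2020-01-01"), ("dw.user_dim", "2020-01-01")]
not_ready = service.missing(deps)
print("waiting on:", not_ready)
```

Keeping the markers in a service rather than as one small file per partition avoids the small‑file pressure on HDFS that the text describes.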

7. Data Warehouse Evolution

Initially, analytics consumed raw log tables directly, then moved to wide tables for convenience, and finally adopted a layered modeling approach (log, detail, aggregate, application layers) with hot/cold partitioning and HBase/Kylin storage to support high‑performance queries.
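
As a rough illustration of the step from the detail layer to the aggregate layer, the sketch below rolls hypothetical detail rows up by date and channel. The field names and sample values are invented for the example and do not come from iQIYI's schema.

```python
from collections import defaultdict

# Hypothetical detail-layer rows: one row per play event.
detail_rows = [
    {"dt": "2020-01-01", "channel": "tv", "play_seconds": 120},
    {"dt": "2020-01-01", "channel": "tv", "play_seconds": 60},
    {"dt": "2020-01-01", "channel": "movie", "play_seconds": 300},
]


def aggregate(rows):
    """Roll detail rows up to one record per (dt, channel)."""
    totals = defaultdict(lambda: {"plays": 0, "play_seconds": 0})
    for row in rows:
        key = (row["dt"], row["channel"])
        totals[key]["plays"] += 1
        totals[key]["play_seconds"] += row["play_seconds"]
    return dict(totals)


agg = aggregate(detail_rows)
print(agg)
```

Application‑layer reports can then read the much smaller aggregate records instead of rescanning the detail rows.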

8. Magic Mirror and Butcher’s Knife (BabelBD)

Magic Mirror provides a UI for self‑service SQL generation, while Butcher’s Knife offers a full‑featured SQL editor. Both route queries to the appropriate execution engine (Impala, Spark, etc.) and automatically fall back to another engine when the primary one cannot satisfy the request.
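
Routing a query to a preferred engine and moving to the next one on failure can be sketched as follows. The engine callables here are stand‑ins, not real Impala or Spark clients, and `EngineError` is a hypothetical exception type introduced for the example.

```python
class EngineError(Exception):
    """Hypothetical error raised when an engine cannot run a query."""


def run_with_fallback(sql, engines):
    """Try each (name, execute) pair in order and return the first success."""
    errors = []
    for name, execute in engines:
        try:
            return name, execute(sql)
        except EngineError as exc:
            errors.append(f"{name}: {exc}")
    raise EngineError("all engines failed: " + "; ".join(errors))


def fake_impala(sql):
    # Stand-in for a fast engine that rejects this query.
    raise EngineError("memory limit exceeded")


def fake_spark(sql):
    # Stand-in for a slower engine that accepts this query.
    return [("rows for", sql)]


engine_used, result = run_with_fallback(
    "SELECT 1", [("impala", fake_impala), ("spark", fake_spark)]
)
print(engine_used, result)
```

Ordering the list from fastest to most tolerant means interactive queries stay fast when possible and still finish when the first engine gives up.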

9. Monet Visual Analytics System

Monet enables drag‑and‑drop visual analysis, allowing users to build scenes by selecting dimensions and metrics, generate reports, and export data. It integrates with the BI layer and supports multi‑scene composition, automatically generating queries based on user selections.
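
Generating a query from a user's choice of dimensions and metrics might look like the sketch below. The function, table, and column names are hypothetical, and a production system would validate identifiers rather than interpolating them directly into SQL.

```python
def build_query(table, dimensions, metrics):
    """Build a GROUP BY query from dimension columns and metric expressions.

    metrics maps an output alias to an aggregate expression.
    """
    if not metrics:
        raise ValueError("at least one metric is required")
    select_parts = list(dimensions)
    select_parts += [f"{expr} AS {alias}" for alias, expr in metrics.items()]
    sql = f"SELECT {', '.join(select_parts)} FROM {table}"
    if dimensions:
        sql += f" GROUP BY {', '.join(dimensions)}"
    return sql


query = build_query(
    "dw.play_detail",
    ["dt", "channel"],
    {"plays": "COUNT(*)", "uv": "COUNT(DISTINCT device_id)"},
)
print(query)
```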

10. Overall iQIYI Big Data Analysis System

The ecosystem consists of BI reports, Monet analysis, Magic Mirror & Butcher’s Knife for offline data extraction, and various analysis tools (retention, funnel, path, profiling). All components are built on a micro‑service architecture within the enterprise cloud, ensuring scalability and reliability.
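
Of the analysis tools listed, funnel analysis is the simplest to sketch: count how many users reach each step of an ordered sequence. The event names and the strict in‑order matching rule below are assumptions for illustration, not a description of iQIYI's tool.

```python
def funnel_counts(events, steps):
    """Count users who reached each funnel step in order.

    events is an iterable of (user_id, event_name) pairs in time order.
    A user advances only when the event matches the next expected step.
    """
    progress = {}  # user_id -> number of steps completed so far
    for user, name in events:
        completed = progress.get(user, 0)
        if completed < len(steps) and name == steps[completed]:
            progress[user] = completed + 1
    return [
        sum(1 for completed in progress.values() if completed > i)
        for i in range(len(steps))
    ]


events = [
    ("u1", "open"), ("u2", "open"), ("u1", "search"),
    ("u1", "play"), ("u2", "play"), ("u3", "search"),
]
counts = funnel_counts(events, ["open", "search", "play"])
print(counts)
```

Here `u2` plays without searching and `u3` never opens the app, so neither advances past the step it skipped.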

In addition to the technical overview, the article includes author information, recruitment details for big‑data engineering roles, and community resources for further learning.
