Inside iQIYI’s Massive Hadoop Platform: Architecture, Ops, and the Gear Workflow Engine
iQIYI’s Hadoop platform, built since 2010, now spans over a thousand nodes and 60 PB storage, detailing its architectural evolution, operational management practices, encountered challenges, and the custom Gear workflow system that streamlines job scheduling, dependencies, and alerts for large‑scale data processing.
Overview
Today’s presentation covers five topics: the evolution of iQIYI’s Hadoop platform architecture, the Hadoop operations management system, difficulties encountered and their solutions, the internally developed Gear workflow management system, and future challenges.
1. Hadoop Platform Architecture Evolution
The first Hadoop cluster was built in August 2010, three months after iQIYI was founded, and was initially managed by the business team responsible for log collection and ETL. In June 2013 the cluster was handed over to the infrastructure department and deployed across multiple data centers.
In 2014 we introduced HA, Kerberos, YARN, and Spark. By May 2015 the platform entered the Hadoop 2.0 era, upgrading all Hadoop 1.0 clusters and becoming an early adopter of Spark on YARN in production.
Today the cluster exceeds a thousand nodes, stores roughly 60 PB of data with a daily growth of about 200 TB. It processes around 150 000 compute tasks per day, amounting to 40 million MapReduce tasks, serving dozens of business lines such as search, advertising, recommendation, user‑behavior analysis, log analysis, and reporting.
Architecture diagram highlights:
Green area – storage layer (Venus log collection via Flume into HDFS, plus HBase).
Red area – Hadoop 2.0 YARN layer providing offline and real‑time compute (Spark Streaming, Storm) and SQL interfaces (Hive, Spark SQL).
Blue area – internal management systems: Hadoop work platform and Gear workflow system.
Purple area – business services (video recommendation, search, data probe, precise advertising, membership).
2. Hadoop Operations Management System
Operations focus on two main aspects: proper server‑side preparation (cluster planning) and standardized usage by business teams.
2.1 Cluster Planning and Standardized Usage
Planning considers three factors:
Early procurement – gathering resource forecasts from users six months to a year in advance to align budgeting and ordering cycles.
Retirement of old machines – managing hardware lifecycle to avoid unexpected failures from out‑of‑warranty servers.
Data‑center considerations – rack placement, power, and network topology.
Standardized usage includes HDFS quotas per department (Name quota for file count, Space quota for storage), ACL‑based data permissions, and compute resource management via the Fair Scheduler.
2.2 Hadoop Work Platform
The platform stores cluster, server, configuration, and user information in a CMDB database and exposes APIs to the management modules, eliminating manual CLI operations. Ansible scripts drive automated tasks.
Data management provides a UI for browsing Hive tables and shared UDF libraries, reducing duplication. A QoS module offers real‑time service‑quality monitoring, allowing both operators and users to view cluster health and anomalies instantly.
3. Difficulties and Solutions
Pseudo high‑availability issue : Switch failures created single points of failure despite HDFS rack awareness. Solution – treat switch IDs as rack identifiers, expanding the logical rack range and tolerating switch bugs.
Linux kernel bugs : High‑load Cgroup bugs caused crashes. Applied YARN‑2809 patch and upgraded to a newer kernel (CentOS 7.2) to eliminate the problem.
JobTracker performance bottleneck : Scaling beyond a few hundred nodes caused excessive scheduling latency. Modified FairScheduler source to reduce scheduling time to under 5 ms, supporting up to 600 nodes.
HDFS balancer slowdown on heterogeneous storage : Optimized lock scope and balancing algorithm to handle large disk size disparities.
YARN ResourceManager bugs : Contributed ~20 patches to the Hadoop community, many adopted by CDH releases.
4. Gear Workflow Management System
Gear addresses the limitations of cron by providing job management, scheduling, dependency handling, alert subscription, and retry mechanisms. It builds on a customized Oozie engine and uses YAML configuration to replace verbose XML.
Access methods:
Web UI for job submission and monitoring.
GitLab + CI pipelines that push workflow definitions directly to Gear.
Java SDK for programmatic workflow creation and submission.
Features include multi‑machine task execution with load balancing, alert channels (email, SMS, internal chat), custom HTTP callbacks on failures, and support for multiple task machines with per‑machine concurrency limits.
5. Future Challenges
Key challenges ahead are reducing storage costs (cold storage, erasure coding in Hadoop 3.0) and delivering more real‑time analytics while improving automation to achieve near‑zero‑touch operations (auto‑recovery, auto‑ticketing).
Data connectivity goals include standardizing data formats, establishing lineage via Gear, and eliminating unnecessary redundancy.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
