
Evolution of iQIYI's Event Tracking System and Its Data Processing Pipeline

This article explains why event tracking matters for data collection, describes the five‑stage evolution of iQIYI's tracking system, analyzes the challenges of the self‑service phase, presents the middle‑platform improvements, explains the migration strategy, and details the downstream data lake, real‑time stream, and data‑warehouse processing workflows.

DataFunSummit

Introduction – Event tracking (埋点) is essential for collecting structured user‑behavior data, which enables accurate reporting, user profiling, and personalized recommendations. The article covers five topics: the significance of tracking, the evolution of iQIYI's tracking system, key delivery moments and events, subsequent data processing, and ongoing improvements.

1. Importance of Tracking – Tracking captures concise, structured events (for example, a record of an employee's commute) rather than raw, unstructured data such as video, which reduces redundancy and simplifies analysis. Its two core elements are when an event is delivered and what information it records.
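
To make the two core elements concrete, here is a minimal sketch of a structured tracking event. The class name `TrackingEvent` and all field names are hypothetical and chosen only for illustration; they are not iQIYI's actual schema.

```python
from dataclasses import dataclass, field, asdict
import json


@dataclass
class TrackingEvent:
    # "When": the type of event and the moment it is delivered.
    event_type: str      # hypothetical values: "start", "playback", "click"
    timestamp_ms: int
    # "What": the information recorded alongside the event.
    user_id: str
    params: dict = field(default_factory=dict)

    def to_json(self) -> str:
        # Serialize to a compact, structured record suitable for delivery.
        return json.dumps(asdict(self), sort_keys=True)


event = TrackingEvent(
    event_type="playback",
    timestamp_ms=1_700_000_000_000,
    user_id="u123",
    params={"video_id": "v42", "position_s": 0},
)
print(event.to_json())
```

A record like this is a few hundred bytes and can be queried directly, which is the contrast with raw video that the section draws.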

2. Evolution of iQIYI's Tracking System – The system progressed through five phases: First Generation (2010) for web PV and playback, Longyuan 4.0 (2012) for standardized start and playback, Magic Mirror (2016) for high‑flexibility tracking, Pingback 1.0 (2018) for core event standardization, and Pingback 2.0 (2019) for field standardization and fine‑grained coordinates.

3. Self‑Service (Magic Mirror) Stage – Advantages included business‑level granularity, a shared core‑field library, user autonomy, and preliminary field management. Problems emerged: ambiguous delivery timing, field duplication, high downstream consumption cost, and storage/performance issues due to massive Hive tables.

4. Middle‑Platform Improvements – Five optimization areas were introduced: (1) Field library standardization and de‑duplication, (2) Dictionary value consolidation, (3) Page coordinate management via a tree structure, (4) Event specification unifying core events (start, playback, display click) and refining sub‑events (read, cast, pay, download), and (5) QoS metrics and custom events for performance monitoring and business‑specific needs.
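
Managing page coordinates as a tree could look roughly like the sketch below. It assumes a coordinate is a path such as page → block → position; the class names and the example path segments are hypothetical, not taken from the talk.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CoordNode:
    name: str
    children: Dict[str, "CoordNode"] = field(default_factory=dict)


class CoordinateTree:
    """Registry of valid page coordinates, stored as a tree of path segments."""

    def __init__(self) -> None:
        self.root = CoordNode("root")

    def register(self, path: List[str]) -> None:
        # Walk down the tree, creating any missing nodes along the path.
        node = self.root
        for part in path:
            node = node.children.setdefault(part, CoordNode(part))

    def is_registered(self, path: List[str]) -> bool:
        # A path is valid only if every segment exists under its parent.
        node = self.root
        for part in path:
            if part not in node.children:
                return False
            node = node.children[part]
        return True


tree = CoordinateTree()
tree.register(["home_page", "banner_block", "slot_1"])
tree.register(["home_page", "banner_block", "slot_2"])

print(tree.is_registered(["home_page", "banner_block", "slot_1"]))
print(tree.is_registered(["home_page", "feed_block"]))
```

Because every coordinate has to hang off a registered parent, an event that reports an unknown page or block can be rejected or flagged at validation time instead of surfacing later as a duplicate or misspelled field.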

5. Migration Strategy from Old to New Tracking – The approach includes dual delivery (old + new) during transition, validating key metrics (PV, UV, duration) on both datasets, analyzing metric differences for business fixes, and finally switching data‑warehouse versions so downstream systems are unaware of the change.
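
The validation step during dual delivery can be sketched as follows. This is an illustrative simplification: it treats every event as one page view, and the function names, field names, and the 1% tolerance are assumptions rather than values from the talk.

```python
from typing import Dict, List, Tuple


def compute_metrics(events: List[dict]) -> Dict[str, float]:
    # PV: number of events (simplification: one event = one page view).
    # UV: number of distinct users. Duration: total seconds reported.
    return {
        "pv": len(events),
        "uv": len({e["user_id"] for e in events}),
        "duration_s": sum(e.get("duration_s", 0) for e in events),
    }


def compare_metrics(
    old: Dict[str, float], new: Dict[str, float], tolerance: float = 0.01
) -> Dict[str, Tuple[float, float, float]]:
    # Return only metrics whose relative difference exceeds the tolerance.
    diffs = {}
    for name, old_value in old.items():
        new_value = new.get(name, 0)
        base = max(abs(old_value), 1e-9)  # avoid division by zero
        rel = abs(new_value - old_value) / base
        if rel > tolerance:
            diffs[name] = (old_value, new_value, rel)
    return diffs


old_events = [
    {"user_id": "a", "duration_s": 10},
    {"user_id": "b", "duration_s": 20},
    {"user_id": "a", "duration_s": 30},
]
new_events = [
    {"user_id": "a", "duration_s": 10},
    {"user_id": "b", "duration_s": 20},
    {"user_id": "a", "duration_s": 24},
]

diffs = compare_metrics(compute_metrics(old_events), compute_metrics(new_events))
for name, (old_v, new_v, rel) in diffs.items():
    print(f"{name}: old={old_v} new={new_v} relative_diff={rel:.2%}")
```

Only the metrics that diverge are reported, so they can be handed to the owning business team for a fix before the data‑warehouse version is switched.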

6. Obstacles in Tracking Iteration – The main obstacles are the high cost of switching for business teams, delayed visibility of benefits, difficulty validating data, and long‑running data merges caused by version‑based switching.

7. Subsequent Data Processing – Quality assurance involves capture‑tool testing, gray‑release metric comparison, and post‑release monitoring. Data flows into a real‑time Kafka stream for low‑latency scenarios, a data lake for non‑core events, and traditional offline pipelines (ODS → DWD). The data‑warehouse layer performs field renaming, dirty‑value handling, dimension correction, merging of old and new schemas, aggregation table generation, and fallback mechanisms.
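
As a rough sketch of what the data‑warehouse layer does per record, the example below renames legacy fields, nulls out dirty values, and tags each row with its schema version so old and new data can sit in one table. The field mapping, the set of dirty values, and the function names are hypothetical.

```python
from typing import Any, Dict

# Hypothetical mapping from legacy field names to standardized names.
LEGACY_FIELD_MAP = {"uid": "user_id", "evt": "event_type", "ts": "timestamp_ms"}

# Hypothetical placeholder strings that should be treated as missing values.
DIRTY_VALUES = {"", "null", "NULL", "undefined", "-"}


def clean_value(value: Any) -> Any:
    # Strip whitespace from strings and map known placeholders to None.
    if isinstance(value, str):
        stripped = value.strip()
        return None if stripped in DIRTY_VALUES else stripped
    return value


def normalize(record: Dict[str, Any], schema_version: str) -> Dict[str, Any]:
    # Rename fields only for old-schema records, then clean every value.
    out: Dict[str, Any] = {}
    for key, value in record.items():
        name = LEGACY_FIELD_MAP.get(key, key) if schema_version == "old" else key
        out[name] = clean_value(value)
    out["schema_version"] = schema_version
    return out


old_row = normalize(
    {"uid": " u1 ", "evt": "playback", "ts": 1000, "ref": "null"}, "old"
)
new_row = normalize(
    {"user_id": "u2", "event_type": "playback", "timestamp_ms": 2000}, "new"
)

# Both rows now share field names and can be merged into one dataset.
merged = [old_row, new_row]
for row in merged:
    print(row)
```

Keeping the `schema_version` column is one way to let downstream queries stay unchanged while still making it possible to trace a row back to the tracking version that produced it.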

8. Ongoing Improvements – Current work focuses on integrating data‑development platforms for upstream‑downstream lineage, production change notifications, downstream consumption detection to reduce resource usage, and field lineage tracking for statistical scope changes.
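
Downstream consumption detection and field lineage both reduce to lookups over a mapping from consumers to the fields they read. The sketch below assumes such a mapping is already available from the data‑development platform; the job names, field names, and function names are hypothetical.

```python
from typing import Dict, List, Set


def unconsumed_fields(produced: Set[str], consumption: Dict[str, Set[str]]) -> Set[str]:
    # Fields that are produced upstream but read by no downstream job.
    consumed: Set[str] = set().union(*consumption.values())
    return produced - consumed


def affected_jobs(field_name: str, consumption: Dict[str, Set[str]]) -> List[str]:
    # Downstream jobs to notify when the definition of a field changes.
    return sorted(job for job, fields in consumption.items() if field_name in fields)


produced = {"user_id", "event_type", "legacy_ref"}
consumption = {
    "daily_uv_report": {"user_id"},
    "playback_dashboard": {"user_id", "event_type"},
}

print(unconsumed_fields(produced, consumption))
print(affected_jobs("event_type", consumption))
```

Fields that nothing reads are candidates for dropping to save storage and compute, while the second lookup gives the list of owners to notify before a statistical scope change ships.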

Overall, the presentation provides a comprehensive view of iQIYI's tracking system lifecycle, technical challenges, and the engineering solutions adopted to ensure reliable, scalable data collection and processing.

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
