iQIYI Technical Product Team
Author

The technical product team of iQIYI

402 articles · 0 likes · 931 views · 0 comments
Recent Articles

Latest from iQIYI Technical Product Team

100 recent articles max
iQIYI Technical Product Team
Sep 15, 2023 · Big Data

Apache Spark at iQIYI: Current Status and Optimization

iQIYI now relies on Apache Spark as its main offline engine, processing over 200,000 tasks a day for ETL, data synchronization and analytics. Recent optimizations (dynamic resource allocation, adaptive query execution, compression, rebalance, Z‑order and resource governance) have cut compute usage by about 27% and storage by up to 76% while improving query speed. The work also completes a large‑scale migration from Hive and paves the way for Spark 3.4 and Iceberg support.

Apache Spark · Performance optimization · SQL Service
0 likes · 21 min read
iQIYI Technical Product Team
Aug 25, 2023 · Big Data

Venus Log Platform Architecture Evolution: From ELK to Data Lake

The Venus log platform at iQIYI migrated from an Elasticsearch‑Kibana architecture to an Iceberg‑based data lake queried with Trino, cutting storage and compute costs by over 70% and boosting stability by 85%, while efficiently handling billions of log entries a day in a write‑heavy, low‑query workload.

Elasticsearch · Iceberg · Trino
0 likes · 22 min read
iQIYI Technical Product Team
Aug 11, 2023 · Artificial Intelligence

Debugging Random OOM Issues in PyTorch Distributed Training on A100 Clusters

The iQIYI backend team traced random OOM crashes in PyTorch Distributed Data Parallel on an A100 cluster to a malformed DDP message injected by a security scan, which forced a near‑terabyte allocation; using jemalloc for diagnostics, they mitigated the issue by adjusting scan policies and collaborating with PyTorch to harden the protocol.

Distributed Training · Memory Debugging · OOM
0 likes · 9 min read
iQIYI Technical Product Team
Jul 28, 2023 · Operations

Distributed System Log Printing Optimization and Performance Evaluation

The study evaluates Log4j2 and Logback performance, recommends asynchronous Logback for high‑concurrency workloads, and demonstrates latency reductions in a production service. It also introduces a TraceContext‑based flag that shares logging state across microservices, cutting daily log volume by about 80% and reducing logging overhead across the distributed system.

Log4j2 · Performance Testing · Traceability
0 likes · 16 min read
iQIYI Technical Product Team
Jul 21, 2023 · Mobile Development

Optimizing Video Playback Startup Experience on iQIYI Mobile App

iQIYI’s mobile app reduces video startup latency by preloading data, pre‑decoding frames, pre‑creating player instances, streamlining initialization, optimizing DNS and CDN selection, and employing device‑aware decoding strategies. The result is near‑zero launch times and a better user experience, with further audio‑track and hardware collaborations planned.

Android · Mobile Streaming · network optimization
0 likes · 8 min read
iQIYI Technical Product Team
Jul 14, 2023 · Backend Development

Investigation and Optimization of Long GC Pauses Caused by Excessive FinalReference in Spring Cloud Gateway

The team discovered that frequent Logback file rotations created thousands of FileOutputStream objects whose finalize‑based FinalReference instances flooded the G1GC reference‑processing phase, causing 13‑second pauses, and resolved the issue by enabling parallel reference processing, enlarging log rotation size, and adjusting GC initiation thresholds.

FinalReference · GC pause · JVM
0 likes · 13 min read
iQIYI Technical Product Team
Jun 30, 2023 · Big Data

Advertising Data Lake Architecture and Real-time Optimizations

By replacing the costly Lambda architecture with a unified data‑lake built on Iceberg and Flink CDC, the advertising team achieved minute‑level latency, strong consistency, and lower storage expenses, cutting end‑to‑end processing times from hours to a few minutes across budgeting, warehousing, OLAP and ETL workloads.

Advertising · Flink · Iceberg
0 likes · 13 min read
iQIYI Technical Product Team
Jun 9, 2023 · Big Data

Accelerating iQIYI Big Data Platform: Migrating from Hive to Spark SQL

iQIYI accelerated its big‑data platform by migrating the OLAP layer from Hive to Spark SQL, achieving a 67% speedup, 50% CPU reduction and 44% memory savings, while automating the conversion of tens of thousands of tasks and delivering faster analytics for advertising, BI, membership and user‑growth services.

Hive · Performance optimization · Spark SQL
0 likes · 18 min read
iQIYI Technical Product Team
May 26, 2023 · Mobile Development

How We Cut Feed Lag in iQIYI Kids App: A Deep Dive into Mobile Performance Optimization

This case study details the performance bottlenecks of the iQIYI Kids feed on low‑end devices and presents a series of engineering solutions, including async card rendering, preloading strategies, image pre‑decoding, and cache optimizations, that reduced scroll hitch time to 1.4 ms and dramatically improved the user experience.

Concurrency · Performance optimization · feed
0 likes · 9 min read
iQIYI Technical Product Team
May 12, 2023 · Operations

Performance Troubleshooting and Optimization of Prometheus Monitoring Queries

The article explains that high metric cardinality in Prometheus causes long query times and timeouts, and demonstrates how using recording rules to pre‑compute aggregates dramatically reduces cardinality and latency, while recommending scrape interval tuning and metric design best practices to keep charts responsive.

Prometheus · Query Optimization · Recording Rules
0 likes · 10 min read