
Alluxio Integration and Optimization for Multi‑AZ Big Data Analytics at iQIYI

iQIYI integrates Alluxio with its QBFS multi‑AZ unified scheduling system. By automatically caching hot tables and applying table‑level policies, page‑level storage, and AZ‑aware worker selection, the combined system cuts cross‑zone traffic, halves query latency, and delivers up to a 20× I/O speedup and a 3× overall performance boost.

iQIYI Technical Product Team

Alluxio is an open‑source distributed data orchestration system that sits between storage and compute, providing distributed caching and global data access to accelerate cross‑cluster big‑data analytics and AI training scenarios. It offers a unified client API and a global namespace, allowing applications to connect to multiple storage systems through a single interface, thus reducing data access latency and storage compatibility issues.

At iQIYI, Alluxio’s distributed cache is used to reduce cross‑Availability‑Zone (AZ) data transfer among multiple big‑data clusters, saving dedicated network bandwidth. It also addresses query latency growth caused by the separation of storage and compute in OLAP architectures, improving analytical performance.

This article introduces the integration scheme and management strategies of Alluxio caching in these two scenarios, along with encountered challenges and solutions.

Alluxio in iQIYI’s Multi‑AZ Unified Scheduling Architecture

iQIYI’s data is spread across clusters in multiple AZs. A unified scheduling architecture (see the referenced diagram) enables seamless cross‑cluster data access and compute scheduling, significantly lowering storage and compute costs.

Within this architecture, data on different clusters is routed through QBFS (iQIYI Bigdata FileSystem), a self‑developed layer that provides a unified namespace. Spark, Flink, and other compute frameworks interact with QBFS without needing to know the physical location of the data. This transparency, however, generates massive cross‑AZ traffic and brings new challenges such as latency, performance fluctuation, and dedicated‑line bandwidth consumption.

Alluxio is integrated with QBFS to build a cross‑AZ QBFS‑Alluxio cache system that automatically loads hot data based on data temperature, reducing cross‑AZ transfer and saving bandwidth.

QBFS‑Alluxio Cache Architecture

QBFS is a virtual file system that abstracts underlying storage details and exposes a unified namespace to compute engines. It supports multiple storage types (HDFS, private‑cloud object storage, public‑cloud object storage) and integrates with Alluxio for cache acceleration, hierarchical storage, and cross‑cluster namespace.

The cache system consists of four key components:

QBFS client/server: Compute engines use the QBFS client to interact with UFS or Alluxio; the server’s mount configuration dynamically decides which storage or cache service to access.

Hotness analysis service: Determines which tables should be cached based on access frequency.

Cache health‑check service: Periodically checks Alluxio cluster availability and informs the QBFS server for graceful degradation.

Cache management service: Provides mount/unmount interfaces, ensures data consistency, and actively cleans expired caches.

Transparent Access for Compute Engines

The integration enables compute engines to access Alluxio transparently through the QBFS client, which wraps the Alluxio client with an HCFS‑compatible interface. An Alluxio‑QBFS plugin adapts Alluxio to the QBFS protocol, ensuring consistent mount semantics and centralizing configuration management on the QBFS server.

The request flow is:

File request: the compute engine obtains a file path from the MetaStore and issues a request via the QBFS client.

Routing decision: QBFSFileSystem selects either the cache proxy or the persistent proxy based on routing rules and cluster status.

Cache hit: if the cache rule matches, the request is redirected to Alluxio, which serves the data.

UFS access: if the cache misses, Alluxio forwards the request to the underlying UFS through the QBFS client.
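The routing decision in steps 2–4 can be sketched as follows. This is a minimal illustration, not iQIYI's implementation; the class name `QBFSFileSystem` comes from the article, but its constructor, the rule representation, and the proxy labels are assumptions.

```python
# Illustrative sketch of the QBFS client routing decision described above.
# The rule store and health flag would be fed by the cache management and
# health-check services; here they are plain Python values.

class QBFSFileSystem:
    def __init__(self, cache_rules, cluster_healthy):
        self.cache_rules = cache_rules          # paths of tables mounted with cache
        self.cluster_healthy = cluster_healthy  # updated by the health-check service

    def route(self, path):
        """Pick the cache proxy (Alluxio) or the persistent proxy (UFS)."""
        if self.cluster_healthy and any(path.startswith(p) for p in self.cache_rules):
            return "alluxio"   # cache rule matched: redirect to Alluxio
        return "ufs"           # otherwise go straight to the underlying store

# Usage: requests under a cached table go to Alluxio, everything else to UFS.
fs = QBFSFileSystem({"/warehouse/ads.db/clicks"}, cluster_healthy=True)
hot = fs.route("/warehouse/ads.db/clicks/dt=2024-01-01/part-0.parquet")
cold = fs.route("/warehouse/logs.db/raw/part-0.parquet")
```

When the health check marks the cluster unhealthy, every request falls through to the persistent proxy, which is exactly the cluster‑level degradation described later.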

Cache Mounting and Eviction Strategies

Since ETL and OLAP workloads operate at the table level (e.g., Hive or Iceberg tables), cache control is also set at the table granularity. Hotness analysis aggregates file‑level repeat‑access rates to the table level to decide which tables to cache and which to evict, ensuring high cache hit rates and performance.

The repeat‑access rate is computed from audit logs (openFile and listStatus counts) combined with file size to approximate read traffic.
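A hedged sketch of that aggregation, assuming audit events arrive as (path, operation) pairs; the weighting of repeat accesses by file size is the idea from the text, while the exact function shape and names are illustrative.

```python
from collections import defaultdict

def table_read_traffic(audit_events, file_sizes, file_to_table):
    """Approximate per-table repeat-read traffic from audit logs.

    audit_events: iterable of (path, op) with op in {'openFile', 'listStatus'}
    file_sizes: path -> size in bytes (directories may be absent)
    file_to_table: path -> owning Hive/Iceberg table
    """
    access_count = defaultdict(int)
    for path, op in audit_events:
        if op in ("openFile", "listStatus"):
            access_count[path] += 1

    traffic = defaultdict(int)
    for path, n in access_count.items():
        repeats = max(n - 1, 0)  # only repeated reads benefit from a cache
        traffic[file_to_table[path]] += repeats * file_sizes.get(path, 0)
    return dict(traffic)
```

Tables whose approximate repeat traffic crosses a threshold become mount candidates; tables falling below it become eviction candidates.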

Based on the analysis, three actions are taken:

Mount tables with cache and set a TTL (time‑to‑live) to limit the range of partitions that can use the cache.

Proactively evict tables whose hotness declines.

Perform partition‑level cleanup of expired cache daily, avoiding the default LRU eviction that blocks requests.
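The daily partition‑level cleanup in the last step can be sketched like this, assuming date‑keyed partitions and a per‑table TTL in days; the function and parameter names are hypothetical.

```python
from datetime import date, timedelta

def expired_partitions(partitions, ttl_days, today=None):
    """Return partition paths older than the table's cache TTL.

    partitions: partition path -> partition date.
    Partitions returned here would be freed from Alluxio proactively,
    instead of waiting for LRU eviction that blocks foreground requests.
    """
    today = today or date.today()
    cutoff = today - timedelta(days=ttl_days)
    return [path for path, d in partitions.items() if d < cutoff]
```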

Table‑Level Dynamic Cache Configuration

Different table types receive customized Alluxio client settings:

Hive tables (daily updates): write‑through caching is disabled; data is cached only on read.

Iceberg tables (5‑minute updates): write‑through caching is enabled to pre‑warm data.

Online business tables: hot data is pinned in the cache to prevent eviction.
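As a sketch, the three profiles might map to Alluxio client properties roughly as below. The `readtype`/`writetype` property names follow Alluxio's documented client options, but this exact mapping, and the `pin_in_cache` flag for online tables, are assumptions rather than iQIYI's actual configuration.

```python
# Illustrative per-table-type cache profiles; in Alluxio, THROUGH writes skip
# the cache, CACHE_THROUGH writes to cache and UFS (pre-warming), and CACHE
# reads populate the cache on miss.
TABLE_CACHE_PROFILES = {
    "hive": {  # daily updates: cache on read only
        "alluxio.user.file.writetype.default": "THROUGH",
        "alluxio.user.file.readtype.default": "CACHE",
    },
    "iceberg": {  # 5-minute updates: pre-warm via write-through caching
        "alluxio.user.file.writetype.default": "CACHE_THROUGH",
        "alluxio.user.file.readtype.default": "CACHE",
    },
    "online": {  # hypothetical flag: pin hot data so it is never evicted
        "pin_in_cache": True,
    },
}
```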

Cache Degradation Mechanisms

Two degradation strategies are implemented in the QBFS client to mitigate Alluxio failures:

Cluster‑level degradation: When the Alluxio cluster is unavailable (e.g., master failure, worker loss), the client disables cache mounts for the affected cluster and routes all requests directly to UFS.

Single‑request degradation: If a specific request fails on the cache layer while the cluster is otherwise healthy, the read request falls back to the persistent layer; write failures are not degraded and require operator intervention.

A failure‑rate‑based cooling policy limits retry attempts and gradually restores cache usage once the cluster recovers.
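One plausible shape for such a cooling policy is sketched below: track recent outcomes in a sliding window and stop using the cache for a cooldown interval once the failure rate crosses a threshold. The class name, window size, and thresholds are illustrative, not the QBFS client's actual values.

```python
import time

class CacheCooldown:
    """Hypothetical failure-rate-based cooling policy for cache requests."""

    def __init__(self, threshold=0.5, window=20, cooldown_s=60):
        self.threshold = threshold      # failure rate that triggers cooling
        self.window = window            # number of recent requests to consider
        self.cooldown_s = cooldown_s    # how long to avoid the cache
        self.results = []               # recent outcomes (True = success)
        self.cooling_until = 0.0

    def record(self, success):
        """Record one cache request outcome and update the cooling state."""
        self.results.append(success)
        self.results = self.results[-self.window:]
        failure_rate = self.results.count(False) / len(self.results)
        if failure_rate > self.threshold:
            self.cooling_until = time.time() + self.cooldown_s

    def cache_allowed(self):
        """Whether the client should attempt the cache layer right now."""
        return time.time() >= self.cooling_until
```

Once the cooldown expires, requests probe the cache again; sustained successes keep the failure rate low, restoring normal cache usage.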

Cross‑AZ Deployment and Optimizations

To avoid inconsistent metadata across multiple Alluxio clusters, a single large Alluxio cluster spanning all AZs is deployed. Workers are distributed across AZs and co‑located with compute nodes, leveraging idle disk space.

StrictAzPolicy Worker Selection

Existing worker selection policies (LocalFirst, CapacityBasedDeterministicHashPolicy, etc.) ignore AZ distribution, causing cross‑AZ reads even when a local replica exists. The new StrictAzPolicy forces reads to prefer workers in the same AZ, reducing latency and bandwidth consumption.
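The core idea can be sketched in a few lines: restrict the candidate pool to same‑AZ workers and fall back to the full set only when none exist. The function name and worker representation are illustrative; iQIYI's StrictAzPolicy is a Java worker‑location policy inside Alluxio.

```python
import random

def strict_az_select(workers, client_az):
    """AZ-aware worker selection sketch.

    workers: list of dicts like {"host": ..., "az": ...}.
    Prefer workers in the client's AZ; only fall back to remote AZs
    when no same-AZ worker is available.
    """
    local = [w for w in workers if w["az"] == client_az]
    pool = local or workers   # strict preference for same-AZ workers
    return random.choice(pool)
```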

PageStorage to Reduce Read Amplification

Alluxio 2.8 introduced Page‑level caching (minimum 1 MB) as an alternative to the default Block‑level cache (64 MB). Page storage loads only the required pages from the underlying file system, dramatically reducing read amplification for small reads such as Parquet footers.

TPC‑H benchmarks with a 100 GB dataset showed a 33 % reduction in read amplification when using 1 MB pages, and overall query latency improvements.
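The arithmetic behind the read‑amplification win is simple: a cache must fetch whole blocks or pages covering the requested byte range, so a small read near the end of a large file pulls in far more data at 64 MB block granularity than at 1 MB page granularity. The sketch below illustrates this for a Parquet‑footer‑sized read; the numbers are an illustrative example, not the benchmark's measurements.

```python
def bytes_fetched(read_offset, read_len, granularity):
    """Bytes pulled from the UFS to satisfy a read at a given cache granularity.

    The cache fetches every whole page/block overlapping
    [read_offset, read_offset + read_len).
    """
    first = read_offset // granularity
    last = (read_offset + read_len - 1) // granularity
    return (last - first + 1) * granularity

MB = 1024 * 1024
# A 64 KB footer read at the tail of a 200 MB Parquet file:
block_cost = bytes_fetched(200 * MB - 64 * 1024, 64 * 1024, 64 * MB)  # one 64 MB block
page_cost = bytes_fetched(200 * MB - 64 * 1024, 64 * 1024, 1 * MB)    # one 1 MB page
```

Here the block cache fetches 64 MB to serve 64 KB, while the page cache fetches only 1 MB, a 64× reduction for this single read.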

PageStorage Performance Optimizations

Before optimization, each page read instantiated a PagedUfsBlockReader and performed many unnecessary ByteBuffer.allocateDirect allocations. The optimization delays buffer allocation until an actual UFS read is required, reducing memory churn and improving query performance.

After the change, Trino TPC‑H queries on 5 MB pages showed noticeable latency reductions (see the performance chart).
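The lazy‑allocation pattern behind that fix can be sketched as follows. This is a simplified Python analogue, not Alluxio's Java code: the real fix concerns `PagedUfsBlockReader` and `ByteBuffer.allocateDirect`, and the class and method names below are illustrative.

```python
class LazyUfsReader:
    """Allocate the read buffer only when a UFS read actually happens.

    Before the fix, a buffer was allocated per page read even when the
    page was served from cache and the buffer was never used.
    """

    def __init__(self, page_size):
        self.page_size = page_size
        self._buf = None   # deferred: no allocation until first UFS read

    def _buffer(self):
        if self._buf is None:
            self._buf = bytearray(self.page_size)  # allocate once, on demand
        return self._buf

    def read_from_ufs(self, ufs_read):
        """ufs_read fills the buffer and returns the byte count."""
        buf = self._buffer()
        n = ufs_read(buf)
        return bytes(buf[:n])
```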

Real‑World Impact at iQIYI

In a near‑real‑time OLAP scenario, Alluxio workers are co‑deployed with Trino workers, and a soft‑affinity scheduler improves local cache reads. Tests showed that larger files (e.g., 500 MB) achieve roughly twice the read throughput of small files (600 KB), while read latencies for both remain in the millisecond range.

Alluxio cache adoption across advertising data, log analysis, and CDN monitoring workloads yielded up to 20× I/O speedup for intensive queries and an average 3× performance boost.

Advertising Data Query Acceleration

After migrating both real‑time (Kafka → Kudu) and offline (Hive) data to Iceberg on a unified HDFS store, Alluxio caching reduced P99 query latency by 50 % and accelerated the end‑to‑end pipeline by nearly 5×.

Log Analysis Acceleration

Moving logs from Elasticsearch to an Iceberg lake and querying via Trino introduced latency spikes (P99 > 30 s). Introducing Alluxio reduced query latency to under 10 s and stabilized performance.

Future Plans

Upcoming work includes improving hotness analysis accuracy by collecting precise read traffic from QBFS clients and adding caller context to Alluxio for better traceability. Exploration of public‑cloud cache acceleration and AI large‑model scenarios is also underway.

Tags: performance, Big Data, Data Lake, Alluxio, Cache Optimization, multi-AZ