
Using Alluxio to Mitigate HDFS Maintenance Impact on Real-Time Jobs in Ctrip's Big Data Platform

The article explains how Ctrip's big‑data platform introduced Alluxio to isolate real‑time Spark Streaming jobs from HDFS NameNode maintenance, reduce NameNode pressure, improve Spark SQL performance, and provide a unified storage layer across multiple HDFS clusters.

Ctrip Technology

Ctrip, a leading Chinese travel service provider, operates a massive data platform that stores over 50 PB of data, grows by about 400 TB per day, and runs more than 300,000 jobs per day. Real-time Spark Streaming jobs depend on HDFS, so HDFS NameNode maintenance often caused those jobs to fail, and the NameNode itself was under heavy metadata pressure.

To address these challenges, Ctrip first built an isolated Hadoop cluster for streaming workloads and later introduced Alluxio (version 1.4), a memory-speed distributed file system that can mount multiple underlying storage systems. In this deployment it mounts two separate HDFS clusters (HDFS-1 and HDFS-2).

Alluxio provides a unified API: Spark Streaming writes directly to Alluxio, and Alluxio transparently forwards the data to the appropriate HDFS cluster. An HDFS path is mounted into the Alluxio namespace with:

alluxio fs mount /path/on/alluxio hdfs://namenode:port/path/on/hdfs
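As a rough illustration of what mounting two clusters into one namespace implies, the sketch below models a mount table and resolves an Alluxio path to an under-storage URI by longest-prefix match. It is a toy model, not Alluxio's implementation; the mount points, NameNode host names, and the `resolve` helper are all hypothetical.

```python
# Toy mount table: Alluxio mount point -> under-storage URI.
# Mount points and NameNode addresses are hypothetical placeholders.
MOUNTS = {
    "/streaming": "hdfs://hdfs1-namenode:8020/streaming",  # stands in for HDFS-1
    "/warehouse": "hdfs://hdfs2-namenode:8020/warehouse",  # stands in for HDFS-2
}


def resolve(alluxio_path, mounts=MOUNTS):
    """Map an Alluxio path to its under-storage URI by longest-prefix match."""
    best = None
    for mount_point in mounts:
        # A path is covered if it is the mount point itself or lies beneath it.
        inside = (alluxio_path == mount_point
                  or alluxio_path.startswith(mount_point + "/"))
        if inside and (best is None or len(mount_point) > len(best)):
            best = mount_point
    if best is None:
        raise ValueError(f"no mount point covers {alluxio_path!r}")
    # Keep the part of the path below the mount point and append it to the URI.
    return mounts[best] + alluxio_path[len(best):]


if __name__ == "__main__":
    print(resolve("/streaming/job1/part-00000"))
    print(resolve("/warehouse/orders"))
```

A job that writes under `/streaming` would land on the first cluster and one that writes under `/warehouse` on the second, while both address a single namespace.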

Alluxio supports three write strategies: MUST_CACHE writes only to Alluxio, CACHE_THROUGH writes to Alluxio and synchronously to HDFS, and THROUGH writes only to HDFS. Users choose among them according to how critical the data is.
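The trade-off between the three strategies can be sketched as a small lookup plus a selection rule. This is only an illustration: the `destinations` and `choose_write_type` helpers and the two flags are hypothetical, and the policy shown is an assumption, not the rule Ctrip applied.

```python
from enum import Enum


class WriteType(Enum):
    MUST_CACHE = "MUST_CACHE"        # Alluxio only
    CACHE_THROUGH = "CACHE_THROUGH"  # Alluxio and HDFS, synchronously
    THROUGH = "THROUGH"              # HDFS only


def destinations(write_type):
    """Return the set of storage layers a write lands in."""
    return {
        WriteType.MUST_CACHE: {"alluxio"},
        WriteType.CACHE_THROUGH: {"alluxio", "hdfs"},
        WriteType.THROUGH: {"hdfs"},
    }[write_type]


def choose_write_type(must_be_durable, read_again_soon):
    """Hypothetical policy (an assumption, not Ctrip's): pick a write type."""
    if not must_be_durable:
        # Data that can be regenerated may live in Alluxio alone.
        return WriteType.MUST_CACHE
    # Durable data always reaches HDFS; cache it too only if it will be re-read.
    return WriteType.CACHE_THROUGH if read_again_soon else WriteType.THROUGH


if __name__ == "__main__":
    for flags in [(False, True), (True, True), (True, False)]:
        wt = choose_write_type(*flags)
        print(flags, wt.value, sorted(destinations(wt)))
```

In a real deployment the choice is made through Alluxio client configuration rather than application code like this; the exact property name should be checked against the documentation for the version in use.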

Additional features such as TTL (Time‑To‑Live) with Free and Delete actions help control Alluxio memory usage; expired data can be automatically removed without affecting the underlying HDFS files. The team also contributed back to the Alluxio community by adding folder‑level TTL support, Free‑action TTL deletion, and consistency‑check utilities.
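The difference between the two TTL actions can be sketched with a toy model of the Alluxio namespace. It follows the description above, in which expiry leaves the HDFS copy untouched; whether Delete also removes the persisted copy in a given Alluxio version is not modeled here. The `FileEntry` class and `expire` function are hypothetical names for illustration.

```python
from dataclasses import dataclass


@dataclass
class FileEntry:
    created_at: float    # creation time, in seconds
    ttl: float           # time-to-live, in seconds
    action: str          # "FREE" or "DELETE"
    cached: bool = True  # whether the data currently occupies Alluxio memory


def expire(namespace, now):
    """Apply TTL actions in place; return the paths whose TTL has elapsed."""
    expired = [p for p, e in namespace.items() if now - e.created_at >= e.ttl]
    for path in expired:
        entry = namespace[path]
        if entry.action == "FREE":
            # Free: release the memory but keep the entry in the namespace.
            entry.cached = False
        elif entry.action == "DELETE":
            # Delete: drop the entry from the (toy) Alluxio namespace.
            del namespace[path]
        else:
            raise ValueError(f"unknown TTL action {entry.action!r}")
    return expired


if __name__ == "__main__":
    ns = {
        "/hot/a": FileEntry(created_at=0, ttl=60, action="FREE"),
        "/tmp/b": FileEntry(created_at=0, ttl=60, action="DELETE"),
        "/hot/c": FileEntry(created_at=0, ttl=600, action="FREE"),
    }
    print(expire(ns, now=120), sorted(ns))
```

Under this model a freed file remains visible and could be reloaded from HDFS on the next read, whereas a deleted file disappears from the Alluxio namespace.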

Performance tests showed that loading hot data from Alluxio into Spark SQL improved execution speed by roughly 30%, while the overall architecture reduced NameNode load and eliminated streaming job failures during HDFS maintenance windows.
