How JuiceFS Cut HDFS Load by 26% and Boosted Presto Query Speed by 13%
This case study details how integrating JuiceFS with Presto reduced HDFS cluster load by about 26%, achieved over 90% cache hit rate for ad‑hoc queries, and lowered average query latency by roughly 13%, while simplifying operations and improving system stability.
Background
QuTouTiao's big‑data platform runs an HDFS cluster of nearly a thousand nodes that stores hot data from the past few months and ingests hundreds of terabytes per day. Daily ETL jobs and ad‑hoc queries both depend heavily on this cluster, keeping its load persistently high and causing symptoms such as Flink checkpoint failures and Spark executor loss.
Solution Design
Ad‑hoc queries run on the Presto engine. The JuiceFS Hadoop SDK integrates into Presto without code changes; it automatically analyzes each query and copies frequently accessed data from HDFS into JuiceFS. Subsequent queries read from the JuiceFS cache instead of HDFS, which eliminates those HDFS requests and reduces cluster pressure.
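To make the integration path more concrete, the sketch below shows how a Hadoop‑compatible client such as the JuiceFS SDK is typically wired in through Hadoop Configuration properties and then used through the standard FileSystem API, which is also how Presto workers read table data. The property names, metadata address, and file paths are illustrative assumptions based on the public JuiceFS Hadoop SDK documentation, not the configuration used in this deployment.

```java
// Illustrative sketch only: wiring a Hadoop-compatible client (here JuiceFS)
// into the stack via Configuration properties, then reading through the
// ordinary FileSystem API. Property values below are placeholders.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class JuiceFsReadSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Register the JuiceFS implementation for the jfs:// scheme
        // (names follow the JuiceFS Hadoop SDK docs; verify against your release).
        conf.set("fs.jfs.impl", "io.juicefs.JuiceFileSystem");
        conf.set("juicefs.meta", "redis://meta-host:6379/1"); // metadata service address (example)
        conf.set("juicefs.cache-dir", "/data/jfs-cache");     // local cache dir on the worker (example)
        conf.set("juicefs.cache-size", "102400");             // local cache size in MiB (example)

        // A Presto worker reads table data through the same FileSystem API
        // whether the bytes come from HDFS or from the JuiceFS cache.
        try (FileSystem fs = FileSystem.get(new java.net.URI("jfs://myjfs/"), conf);
             FSDataInputStream in = fs.open(new Path("/warehouse/tbl/part-00000"))) {
            byte[] buf = new byte[8192];
            int n = in.read(buf);
            System.out.println("read " + n + " bytes");
        }
    }
}
```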
Because the Presto cluster runs on Kubernetes and scales elastically, cached data has to persist outside the short‑lived workers. Standing up a separate HDFS cluster or another dedicated cache would be expensive, which makes object storage (OSS) the most cost‑effective backend for the cache.
JuiceFS Metadata Service
The metadata service manages file names, directory structures, sizes, and timestamps. It runs as a distributed cluster using the Raft consensus protocol to ensure strong consistency and high availability.
JuiceFS Hadoop SDK
The SDK is a client library that can be integrated into any Hadoop‑ecosystem component; here it is integrated into the Presto workers. It can either replace HDFS as the underlying storage or act as a cache in front of it. In this solution it works as a cache: data is copied transparently from HDFS to JuiceFS without modifying the Hive Metastore, cached data is served directly to ad‑hoc queries, and the SDK keeps HDFS and JuiceFS consistent by comparing file modification times (mtime).
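As a rough illustration of the mtime comparison described above, the following sketch checks whether a cached copy is still valid by comparing the source file's current modification time in HDFS with the mtime recorded when the copy was made. The paths and the recordedMtime value are placeholders, and JuiceFS's actual bookkeeping is internal to the SDK.

```java
// Minimal sketch of the consistency idea: a cached copy is only served if the
// HDFS source has not been rewritten since the copy was made.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class MtimeCheckSketch {
    /** True if the HDFS source still has the mtime captured at copy time. */
    static boolean cacheIsFresh(FileSystem hdfs, Path src, long recordedMtime) throws java.io.IOException {
        FileStatus status = hdfs.getFileStatus(src);
        return status.getModificationTime() == recordedMtime;
    }

    public static void main(String[] args) throws Exception {
        FileSystem hdfs = FileSystem.get(new java.net.URI("hdfs://namenode:8020"), new Configuration());
        Path src = new Path("/warehouse/tbl/part-00000");
        long recordedMtime = 1_650_000_000_000L; // mtime captured when the file was cached (placeholder)
        if (cacheIsFresh(hdfs, src, recordedMtime)) {
            System.out.println("serve from JuiceFS cache");
        } else {
            System.out.println("fall back to HDFS and refresh the cached copy");
        }
    }
}
```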
Cache Cleanup
To keep the cache from growing without bound, JuiceFS can purge files based on access time (atime). However, many file systems, HDFS included, update atime only coarsely (by default at most once per hour, controlled by dfs.namenode.accesstime.precision), so atime alone is an unreliable signal. The cleanup therefore combines atime, mtime, and file size to decide which cached files to evict, without discarding recently accessed data.
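The sketch below is one plausible way to combine these signals; it is not JuiceFS's actual eviction code. A file is treated as cold only if both its atime and mtime are older than a threshold (guarding against the coarse atime granularity), and larger cold files are evicted first so that space is reclaimed quickly.

```java
// Illustrative eviction policy: "cold" means both atime and mtime are stale,
// and among cold files the largest are evicted first until enough space is freed.
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class CacheEvictionSketch {
    record CachedFile(String path, long atimeMs, long mtimeMs, long sizeBytes) {}

    static List<CachedFile> pickVictims(List<CachedFile> files, long nowMs, long coldAfterMs, long bytesToFree) {
        List<CachedFile> cold = files.stream()
                // keep anything touched or modified recently (atime alone is too coarse)
                .filter(f -> nowMs - f.atimeMs() > coldAfterMs && nowMs - f.mtimeMs() > coldAfterMs)
                // evict the largest cold files first
                .sorted(Comparator.comparingLong(CachedFile::sizeBytes).reversed())
                .collect(Collectors.toList());

        long freed = 0;
        int i = 0;
        for (; i < cold.size() && freed < bytesToFree; i++) {
            freed += cold.get(i).sizeBytes();
        }
        return cold.subList(0, i);
    }

    public static void main(String[] args) {
        long now = System.currentTimeMillis();
        List<CachedFile> files = List.of(
                new CachedFile("/cache/a", now - 7_200_000, now - 86_400_000, 500L << 20),
                new CachedFile("/cache/b", now - 60_000,    now - 86_400_000, 200L << 20));
        // only /cache/a is cold (untouched for more than an hour) and gets evicted
        System.out.println(pickVictims(files, now, 3_600_000, 100L << 20));
    }
}
```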
Test Plan
The overall effect on stability, performance, and HDFS load was evaluated across multiple testing phases, with each phase collecting and validating its own metrics so that results could be compared across phases.
Test Results
1. HDFS Cluster Load
Two test phases were run, one with JuiceFS enabled and one with it disabled. In the enabled phase, ten randomly selected HDFS DataNodes showed an average daily disk read I/O of about 3.5 TB, versus about 4.8 TB with JuiceFS disabled, a reduction of roughly 26%.
Read I/O from JuiceFS represents HDFS load that was avoided, while write I/O reflects data copied from HDFS into the cache on a miss. Measured read I/O is roughly ten times write I/O, so about ten of every eleven ad‑hoc reads, over 90%, are served from the cache without touching HDFS.
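A quick back‑of‑the‑envelope check of that claim, with purely illustrative numbers: if the cache serves roughly ten bytes for every byte it has to copy in from HDFS, the hit rate works out to about 10/11, just over 90%.

```java
// Illustrative numbers only: estimate cache hit rate from the ratio of
// cache reads (hits) to cache-fill writes (misses).
public class HitRateSketch {
    public static void main(String[] args) {
        double cacheReadTb = 10.0; // data served from the JuiceFS cache (hits)
        double cacheFillTb = 1.0;  // data copied from HDFS into the cache (misses)
        double hitRate = cacheReadTb / (cacheReadTb + cacheFillTb);
        System.out.printf("estimated cache hit rate: %.1f%%%n", hitRate * 100); // ~90.9%
    }
}
```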
2. Average Query Latency
With 50% of query traffic routed to the JuiceFS‑enabled cluster, its average query latency was about 13% lower than that of the baseline cluster.
Test Summary
Without any changes to business‑side configuration, the JuiceFS solution transparently reduced HDFS load: more than 90% of Presto ad‑hoc reads bypassed HDFS and average query latency improved by about 13%, exceeding the original performance targets and stabilizing big‑data components that had previously been flaky.
Future Outlook
Further increase JuiceFS cache hit rate to lower HDFS load.
Expand local cache disk space on Presto workers to improve hit ratio and mitigate tail latency.
Integrate Spark clusters with JuiceFS to cover more ad‑hoc scenarios.
Migrate HDFS fully to JuiceFS for storage‑compute separation, reducing operational costs and improving resource utilization.