
Case Study of Using JuiceFS for Cold Data Storage at Ctrip: Architecture, Performance Evaluation, and Optimization

This article presents Ctrip's experience migrating over 2 PB of cold data to JuiceFS, detailing the system's architecture, metadata engine selection, extensive performance testing, fault‑tolerance analysis, and operational optimizations that reduced storage and maintenance costs while supporting future petabyte‑scale workloads.

Ctrip Technology

Abstract: Ctrip manages more than 10 PB of cold data (backups, media, logs) on local disks and GlusterFS, which suffer from slow directory listings, inflexible scaling, and high maintenance costs. By adopting public-cloud object storage with JuiceFS, Ctrip achieved significant cost reductions and improved operational efficiency.

1. JuiceFS Architecture and POC Testing

JuiceFS separates metadata from data blocks, exposing a POSIX interface via FUSE. Data is written to object storage, while metadata (file name, size, permissions, timestamps, directory structure) is stored in a metadata engine such as TiKV or the proprietary enterprise engine. This design ensures fast metadata operations (e.g., ls) regardless of object-storage latency.

Metadata engine candidates were evaluated; TiKV and the enterprise engine were selected for their ability to store terabytes of metadata and scale horizontally. Performance tests using go‑ycsb on a three‑node cluster (each with 2 × 20‑core CPUs, 128 GB RAM, SSD storage, 25 Gbps network) showed:

Write throughput peaked above 30 k TPS as client threads increased.

Read throughput approached 70 k QPS per node.

TiKV demonstrated sub‑10 ms P99 latency, satisfying cold‑data requirements.

Additional POC tests examined the impact of file size, directory depth, and directory size on IOPS and ls latency, confirming that larger files improve throughput and that directory depth or file count has negligible effect on performance.
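A measurement like the directory-size test above can be reproduced against any POSIX mount with a short script. The sketch below is illustrative, not Ctrip's actual test harness; pass a path on a JuiceFS mount as `base_dir` to measure the file system under test rather than the local disk backing the default temp directory.

```python
import os
import tempfile
import time

def time_listing(file_count, base_dir=None):
    """Create a directory with `file_count` empty files and time one full listing.

    Point base_dir at a JuiceFS mount to exercise that file system instead
    of the local disk that backs the default temporary directory.
    """
    with tempfile.TemporaryDirectory(dir=base_dir) as d:
        for i in range(file_count):
            open(os.path.join(d, f"f{i}"), "w").close()
        start = time.perf_counter()
        entries = list(os.scandir(d))          # the operation behind `ls`
        elapsed = time.perf_counter() - start
        assert len(entries) == file_count
        return elapsed

# Compare listing latency across directory sizes.
for n in (100, 1000):
    print(f"{n:>6} files: {time_listing(n) * 1000:.2f} ms")
```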

2. JuiceFS Principles

During a write, JuiceFS buffers incoming data in 128 KB units, assembles them into 4 MB blocks, and groups contiguous blocks into slices. When a slice reaches the 64 MB chunk size, it is flushed to object storage via parallel PUT operations. Small files are flushed as a single slice. An optional write-back mode stores data locally before asynchronous upload, trading durability for speed.
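The block/slice bookkeeping above can be sketched as follows. The 4 MB block and 64 MB slice/chunk sizes come from the article; the function itself is a simplified model (the real client buffers data and uploads blocks in parallel rather than computing the layout up front):

```python
BLOCK_SIZE = 4 * 1024 * 1024    # 4 MB block, the unit of each PUT/GET
CHUNK_SIZE = 64 * 1024 * 1024   # 64 MB chunk: the ceiling at which a slice is sealed

def split_write(size):
    """Return the slice/block layout for a contiguous write of `size` bytes.

    Each inner list holds the block sizes of one slice; a slice is sealed
    (and flushed with parallel PUTs) once it reaches CHUNK_SIZE.
    """
    slices, current, filled = [], [], 0
    while size > 0:
        block = min(BLOCK_SIZE, size, CHUNK_SIZE - filled)
        current.append(block)
        filled += block
        size -= block
        if filled == CHUNK_SIZE:      # slice full: seal it and start a new one
            slices.append(current)
            current, filled = [], 0
    if current:                       # a small file ends up as one short slice
        slices.append(current)
    return slices

# A 10 MB write becomes a single slice of three blocks: 4 MB + 4 MB + 2 MB.
```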

Read operations look up block locations in the metadata engine, issue parallel GET requests to object storage, and cache the retrieved 4 MB blocks locally. Prefetching can be tuned (default prefetch=1) to improve sequential read performance, while random reads may benefit from disabling the cache.
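The interaction between the local block cache and prefetching can be illustrated with a toy model. This is not the JuiceFS client's implementation; the `fetch` callable stands in for a real GET, and the model only shows why sequential reads issue fewer object-storage round trips:

```python
class BlockCache:
    """Toy model of the read path: fetch fixed-size blocks, cache them
    locally, and warm `prefetch` blocks ahead of each access."""

    def __init__(self, fetch, prefetch=1):
        self.fetch = fetch        # callable: block index -> bytes (stands in for a GET)
        self.prefetch = prefetch  # blocks to warm ahead of the current one
        self.cache = {}
        self.gets = 0             # simulated object-storage requests issued

    def _load(self, idx):
        if idx not in self.cache:
            self.gets += 1
            self.cache[idx] = self.fetch(idx)

    def read_block(self, idx):
        self._load(idx)
        for ahead in range(1, self.prefetch + 1):
            self._load(idx + ahead)   # prefetch for sequential readers
        return self.cache[idx]

cache = BlockCache(fetch=lambda i: b"x" * 4, prefetch=1)
cache.read_block(0)  # GET for block 0, plus a prefetch GET for block 1
cache.read_block(1)  # block 1 already cached; only block 2 is prefetched
```

For a purely random workload the prefetched blocks are wasted requests, which is why the article suggests disabling the cache in that case.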

3. Fault Handling and Performance Optimizations

TiKV CPU saturation: High kv_scan load from client clean-trash tasks caused CPU spikes. Mitigation included monitoring metadata-engine calls, offloading background tasks to dedicated components, and upgrading the client to a version with distributed locks and a no-bgjob flag.

TiKV data leakage: Missing MVCC GC caused region and store size growth. Resolved by implementing an external GC worker and upgrading TiKV to 5.0.6.

CSI volume cleanup: Stale data remained in OSS after PVC deletion because the JuiceFS mount pod was offline. Solution: keep a dedicated mount pod for background cleanup or run cleanup tasks on an alternative node.

High client memory usage: Analysis showed memory was held as Private_Dirty by the JuiceFS process, primarily due to meta-backup operations. Disabling the default backup-meta flag and delegating backups to a separate service reduced memory consumption.
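The Private_Dirty figure in such an analysis comes from summing the per-mapping fields in /proc/&lt;pid&gt;/smaps. A minimal sketch of that measurement (the parsing helper is ours, not a JuiceFS tool):

```python
import re

def private_dirty_kb(smaps_text):
    """Sum all Private_Dirty fields (reported in kB) from /proc/<pid>/smaps output."""
    return sum(int(kb) for kb in
               re.findall(r"^Private_Dirty:\s+(\d+) kB", smaps_text, re.MULTILINE))

# On a live system, with pid pointing at the JuiceFS client process:
#   with open(f"/proc/{pid}/smaps") as f:
#       print(private_dirty_kb(f.read()), "kB held as Private_Dirty")

sample = "Private_Clean:     128 kB\nPrivate_Dirty:    2048 kB\nPrivate_Dirty:     512 kB\n"
assert private_dirty_kb(sample) == 2560
```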

Further architectural refinements include separating session/trash handling for large‑scale workloads, consolidating meta‑backup to a single admin client, implementing bandwidth throttling, and deploying multiple metadata clusters for workload isolation.
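Bandwidth throttling of the kind mentioned above is commonly built on a token bucket. A minimal single-threaded sketch with illustrative rates (not Ctrip's values, nor JuiceFS's built-in limiter):

```python
import time

class TokenBucket:
    """Minimal token-bucket limiter: callers learn how long to wait before
    their byte budget for the next transfer is available."""

    def __init__(self, rate_bytes_per_s, burst_bytes):
        self.rate = rate_bytes_per_s
        self.capacity = burst_bytes
        self.tokens = burst_bytes          # start with a full burst allowance
        self.last = time.monotonic()

    def consume(self, nbytes):
        """Reserve `nbytes`; return the seconds the caller should sleep."""
        now = time.monotonic()
        # Refill tokens accrued since the last call, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes              # may go negative: debt to be slept off
        return max(0.0, -self.tokens / self.rate)

bucket = TokenBucket(rate_bytes_per_s=10 * 1024 * 1024, burst_bytes=1024 * 1024)
delay = bucket.consume(4 * 1024 * 1024)    # a 4 MB upload against a 1 MB burst
# A real uploader would time.sleep(delay) here before issuing the PUT.
```

Allowing the token count to go negative keeps the long-run rate at the configured limit while still letting a single oversized transfer proceed after the computed delay.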

4. Summary and Outlook

Migrating cold data to JuiceFS on public‑cloud object storage enabled storage‑compute separation for Elasticsearch, eliminated replica‑driven memory pressure, and dramatically improved ls performance. Over 2 PB of data (Elasticsearch and DBA backups) have been moved, with plans to scale to >10 PB, explore ClickHouse integration, and replace HDFS in cloud environments.

Tags: cloud native, performance testing, distributed file system, JuiceFS, TiKV, cold data storage
Written by

Ctrip Technology

Official Ctrip Technology account, sharing and discussing growth.
