GooseFS: Distributed Caching System for Storage-Compute Separation Architecture
GooseFS is Tencent Cloud's distributed caching system for storage-compute separation. It sits between compute frameworks and underlying storage (COS, CHDFS, COSN) and accelerates big-data and AI workloads by 2-10x. Its main features are transparent acceleration, a master-worker architecture, Raft-based high availability, tiered caching, and metadata optimizations. The reported results are up to 50% cost savings and 29% faster compute jobs.
GooseFS is a distributed caching system developed by Tencent Cloud and a critical component of the storage-compute separation architecture. It bridges upper-layer computing frameworks and underlying storage systems (such as COS, CHDFS, and COSN), accelerating big data and AI workloads by 2-10x.
The article covers three main business scenarios: ETL computing (including data warehousing and thematic computation), BI analysis, and AI computing. It also explains why storage-compute separation has become the preferred architecture: in traditional coupled systems, storage and compute loads are hard to balance, which increases operational cost and complexity.
The GooseFS architecture has four key components: (1) the Master, which manages metadata and supports high availability through either ZooKeeper or a self-managed Raft group; (2) the Worker, which handles data caching, eviction, loading, and tiered storage across MEM/SSD/HDD; (3) the SDK and Proxy, which support the HDFS, FUSE, and S3 protocols with transparent acceleration; and (4) the data scheduling service, which runs data prefetching and persistence tasks.
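The Worker's tiered storage can be pictured with a small sketch. The code below is an illustration only, not GooseFS code: the class name `TieredCache`, its methods, and the policy (new and recently read blocks go to the fastest tier, the least recently used block is demoted one tier on overflow) are assumptions chosen to show the idea of MEM/SSD/HDD tiering with eviction.

```python
from collections import OrderedDict


class TieredCache:
    """Toy worker-side cache (hypothetical): blocks enter the fastest tier
    and the least recently used block is demoted when a tier overflows."""

    def __init__(self, capacities):
        # capacities: tier name -> max number of blocks, fastest tier first,
        # e.g. {"MEM": 2, "SSD": 4, "HDD": 8}.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities.items()]

    def put(self, block_id, data):
        self._remove(block_id)           # avoid duplicates across tiers
        self._insert(0, block_id, data)  # new blocks start in the fastest tier

    def get(self, block_id):
        for _name, _cap, blocks in self.tiers:
            if block_id in blocks:
                data = blocks.pop(block_id)
                self._insert(0, block_id, data)  # promote on hit
                return data
        return None  # miss: a real worker would load from underlying storage

    def tier_of(self, block_id):
        for name, _cap, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None

    def _remove(self, block_id):
        for _name, _cap, blocks in self.tiers:
            blocks.pop(block_id, None)

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            return  # fell off the slowest tier: evicted from the cache
        _name, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        if len(blocks) > cap:
            # Demote the least recently used block to the next tier.
            victim_id, victim = blocks.popitem(last=False)
            self._insert(level + 1, victim_id, victim)
```

With one slot per tier, writing four blocks pushes the oldest one out of the cache entirely, and reading a block from HDD moves it back to MEM. A production cache would track bytes rather than block counts and would move data asynchronously.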
The optimization section details three major improvements. The first is transparent acceleration, which integrates the cache without requiring changes to business applications. The second is a set of system stability fixes: resolving a Master memory leak (blockLoc data was not deleted after file deletion), migrating HA from ZooKeeper to Raft (reducing journal flush latency from 20ms to 7ms), managing metadata scale with RocksDB, and adding active metadata synchronization. The third is Master metadata performance optimization, where an LRU caching strategy improved rename performance by 5x and worker heartbeat performance by 6x.
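As a rough illustration of the LRU caching strategy for Master metadata, the sketch below places a bounded in-memory LRU cache in front of a slower metadata store. It is not the GooseFS implementation: the class `LruMetadataCache`, its methods, and the plain `dict` standing in for an on-disk store such as RocksDB are all assumptions made for the example.

```python
from collections import OrderedDict


class LruMetadataCache:
    """Hypothetical write-through LRU cache in front of a metadata store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # stand-in for an on-disk store
        self.capacity = capacity            # max number of cached entries
        self.cache = OrderedDict()          # oldest entry first
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.cache:
            self.cache.move_to_end(path)    # mark as most recently used
            self.hits += 1
            return self.cache[path]
        self.misses += 1
        inode = self.backing_store.get(path)
        if inode is not None:
            self._admit(path, inode)
        return inode

    def put(self, path, inode):
        self.backing_store[path] = inode    # write through to the store
        self._admit(path, inode)

    def rename(self, src, dst):
        inode = self.get(src)
        if inode is None:
            raise FileNotFoundError(src)
        # Drop the old path from both the store and the cache so no stale
        # entry is left behind, then register the new path.
        del self.backing_store[src]
        self.cache.pop(src, None)
        self.put(dst, inode)

    def _admit(self, path, inode):
        self.cache[path] = inode
        self.cache.move_to_end(path)
        while len(self.cache) > self.capacity:
            self.cache.popitem(last=False)  # evict least recently used
```

Hot paths are served from memory, while cold entries fall back to the backing store. That is the general reason an LRU layer can speed up metadata-heavy operations such as rename. The 5x and 6x figures above come from the article, not from this sketch.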
The reported results are a 50% cost reduction, a 29% improvement in compute job performance, and GooseFS serving 60% of read bandwidth.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.