GooseFS: Distributed Caching System for Storage-Compute Separation Architecture

GooseFS, Tencent Cloud’s distributed caching system for storage‑compute separation, links compute frameworks to underlying storage (COS, CHDFS, COSN) and boosts big‑data and AI workloads by 2‑10× through transparent acceleration, robust master‑worker architecture, Raft‑based HA, tiered caching, and metadata optimizations, delivering up to 50% cost savings and 29% faster compute jobs.

Tencent Cloud Developer

Feb 28, 2022

GooseFS is a distributed caching system developed by Tencent Cloud and a critical component of its storage-compute separation architecture. It sits between upper-layer computing frameworks and the underlying storage systems (such as COS, CHDFS, and COSN), and accelerates big data and AI workloads by 2-10x.

The article covers three main business scenarios: ETL computing (including data warehousing and thematic computation), BI analysis, and AI computing. It also explains why storage-compute separation has become the preferred architecture: in traditional coupled systems, storage and compute loads are hard to balance, which raises operational cost and complexity.

The GooseFS architecture includes four key components: (1) the Master, which manages metadata and supports high availability through either ZooKeeper or Raft-based self-governance; (2) the Worker, which handles data caching, eviction, loading, and tiered storage across MEM/SSD/HDD; (3) the SDK and Proxy, which support the HDFS, FUSE, and S3 protocols with transparent acceleration; and (4) the data scheduling service, which runs data prefetching and persistence tasks.
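To make the Worker's tiered storage more concrete, here is a minimal Python sketch of a cache that admits blocks into the fastest tier and demotes the least recently used block to the next tier when a tier is full. It is only an illustration under stated assumptions: the class `TieredCache`, its methods, and the block-count capacities are hypothetical and are not GooseFS APIs, and the source does not say which eviction or promotion policy the Worker actually uses.

```python
from collections import OrderedDict

_MISSING = object()  # sentinel so that cached values may legitimately be None


class TieredCache:
    """Toy worker cache (hypothetical): new blocks enter the fastest tier,
    and the least recently used block is demoted when a tier overflows."""

    def __init__(self, capacities):
        # capacities: mapping of tier name -> max number of blocks,
        # listed fastest tier first, e.g. {"MEM": 2, "SSD": 4, "HDD": 8}.
        self.tiers = [(name, cap, OrderedDict()) for name, cap in capacities.items()]

    def put(self, block_id, data):
        self._remove(block_id)          # avoid duplicates across tiers
        self._insert(0, block_id, data)

    def get(self, block_id):
        data = self._remove(block_id)
        if data is _MISSING:
            return None                 # cache miss
        self._insert(0, block_id, data)  # promote to the fastest tier on a hit
        return data

    def tier_of(self, block_id):
        for name, _, blocks in self.tiers:
            if block_id in blocks:
                return name
        return None

    def _insert(self, level, block_id, data):
        if level >= len(self.tiers):
            return                      # fell off the slowest tier: evicted
        _, cap, blocks = self.tiers[level]
        blocks[block_id] = data
        if len(blocks) > cap:
            victim_id, victim = blocks.popitem(last=False)  # oldest entry
            self._insert(level + 1, victim_id, victim)

    def _remove(self, block_id):
        for _, _, blocks in self.tiers:
            if block_id in blocks:
                return blocks.pop(block_id)
        return _MISSING
```

With one slot per tier, inserting blocks `a`, `b`, `c` in order leaves `c` in MEM, `b` in SSD, and `a` in HDD; a fourth insert pushes `a` out of the cache entirely. A real worker would track bytes rather than block counts and would move data asynchronously.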

The optimization section details three major improvements. First, transparent acceleration lets applications use the cache without the business layer being aware of it. Second, system stability work fixed a Master memory leak (blockLoc data was not deleted after file deletion), migrated HA from ZooKeeper to Raft (reducing journal flush latency from 20 ms to 7 ms), managed metadata at scale with RocksDB, and added active metadata synchronization. Third, Master metadata performance optimization introduced an LRU caching strategy that improved rename performance by 5x and worker heartbeat performance by 6x.
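The LRU caching strategy mentioned above can be sketched as a small in-memory cache placed in front of a slower metadata store. This is a hedged illustration, not the GooseFS implementation: the class `LruMetadataCache`, its `rename` method, and the plain dictionary standing in for an on-disk store such as RocksDB are all assumptions made for the example.

```python
from collections import OrderedDict


class LruMetadataCache:
    """Hypothetical LRU cache in front of a slower metadata store."""

    def __init__(self, backing_store, capacity):
        self.backing_store = backing_store  # dict standing in for an on-disk store
        self.capacity = capacity
        self.entries = OrderedDict()        # path -> metadata, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, path):
        if path in self.entries:
            self.entries.move_to_end(path)  # mark as most recently used
            self.hits += 1
            return self.entries[path]
        self.misses += 1
        meta = self.backing_store[path]     # raises KeyError for unknown paths
        self._admit(path, meta)
        return meta

    def rename(self, src, dst):
        # Update the store, then keep the cache consistent with it.
        meta = self.backing_store.pop(src)
        self.backing_store[dst] = meta
        self.entries.pop(src, None)
        self._admit(dst, meta)

    def _admit(self, path, meta):
        self.entries[path] = meta
        self.entries.move_to_end(path)
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
```

The point of the design is that hot paths are served from memory and only cold paths reach the slower store, while operations such as rename must update both so the cache never serves a stale path.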

Results show 50% cost reduction, 29% compute job performance improvement, and GooseFS handling 60% of read bandwidth.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
