How Fluid + JuiceFSRuntime Powers Scalable Cloud‑Native Quantitative Research
This article explains how Metabit Trading built a cloud‑native quantitative research platform using Fluid and JuiceFSRuntime to achieve elastic compute, high‑throughput data caching, and cost‑effective scaling for AI‑driven trading strategies.
Background
Advances in machine learning, cloud computing, and cloud‑native technologies enable quantitative finance teams to ingest both structured market data and low‑signal unstructured data (research reports, news, social media) for AI‑driven strategy research.
Challenges for Machine‑Learning‑Based Quant Research
Traditional quant pipelines handle only price, volume, and return series. Adding unstructured data introduces noise and produces bursty, highly concurrent workloads that strain limited compute resources, so the platform needs elastic data caching and fine‑grained access control.
Platform Requirements
Elastic handling of sudden high‑volume tasks.
Elastic data‑cache throughput for hot market data (hundreds of Gbps).
Linear scalability of capacity and throughput.
Data‑affinity scheduling to reuse local caches.
IP protection with isolated data access.
Intermediate‑result caching for feature pipelines.
Support for multiple file systems (OSS, CPFS, NAS, JuiceFS).
Solution Overview
Metabit adopted Fluid (a CNCF sandbox project) together with JuiceFSRuntime. Fluid models data usage as a Dataset rather than a generic PersistentVolumeClaim, which allows tuning per access pattern (read‑only, read‑write, small‑file) and managing the data lifecycle. JuiceFSRuntime provides a distributed, POSIX‑compatible cache that integrates with Fluid’s autoscaling, portability, observability, and affinity scheduling.
Architecture
Fluid creates a Dataset that describes the data access pattern. JuiceFSRuntime implements the caching layer, exposing a POSIX interface while using JuiceFS as the cloud‑storage backend. The whole stack runs on Kubernetes, leveraging native scheduling and resource management.
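As a minimal sketch of how a research job might consume such a Dataset: once a Dataset is bound to its runtime, Fluid exposes it as a PersistentVolumeClaim with the same name in the same namespace, so a pod mounts it like any other volume. The pod name, image, command, and mount path below are hypothetical placeholders, not taken from Metabit's setup.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: research-job-example          # hypothetical name
spec:
  restartPolicy: Never
  containers:
    - name: trainer
      image: python:3.11-slim         # placeholder image
      command: ["ls", "-l", "/data"]  # placeholder workload: list the cached data
      volumeMounts:
        - name: research-data
          mountPath: /data            # hypothetical mount path
          readOnly: true
  volumes:
    - name: research-data
      persistentVolumeClaim:
        # Fluid creates a PVC named after the Dataset.
        claimName: metabit-juice-research
```

Because the pod only references a PVC, the application code needs no Fluid‑specific or JuiceFS‑specific client; it reads ordinary files under the mount path.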
Key Features of the Dataset Abstraction
Performance tuning per access pattern: read‑only for model training, read‑write for feature generation.
Data isolation: Kubernetes namespaces map each Dataset to a distinct JuiceFS sub‑directory.
Cache sharing: Public datasets are cached once and reused across teams.
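The namespace‑to‑sub‑directory mapping could look roughly like the following sketch, which assumes one Dataset per team namespace and uses the path part of the `juicefs://` mount point to select the sub‑directory. The namespace `team-a`, the Dataset name, and the `/team-a` path are hypothetical; the article does not show Metabit's actual layout.

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: team-a-features               # hypothetical Dataset name
  namespace: team-a                   # hypothetical team namespace
spec:
  accessModes:
    - ReadWriteMany                   # read-write, e.g. for feature generation
  mounts:
    - name: metabit-juice-research    # JuiceFS volume name
      mountPoint: juicefs:///team-a   # only this sub-directory is visible to the team
      # encryptOptions (token, access-key, secret-key) omitted for brevity;
      # they would reference a Secret that exists in this namespace.
```

A JuiceFSRuntime with the same name and namespace would still be needed to back this Dataset. Pods in other namespaces cannot mount the resulting PVC, which is what confines each team to its own sub‑directory.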
Runtime Configuration and Elastic Scaling
Resources for JuiceFSRuntime (CPU, memory, network, worker count) are tuned per Dataset. The runtime supports manual, automatic, and scheduled scaling policies.
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: metabit-juice-research
spec:
  mounts:
    - name: metabit-juice-research
      mountPoint: juicefs:///
      options:
        metacache: ""
        cache-group: "research-groups"
      # Credentials are read from a Kubernetes Secret instead of being inlined.
      encryptOptions:
        - name: token
          valueFrom:
            secretKeyRef:
              name: juicefs-secret
              key: token
        - name: access-key
          valueFrom:
            secretKeyRef:
              name: juicefs-secret
              key: access-key
        - name: secret-key
          valueFrom:
            secretKeyRef:
              name: juicefs-secret
              key: secret-key
  # Restrict cache placement to the listed instance types.
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: node.kubernetes.io/instance-type
              operator: In
              values:
                - ecs.g7.8xlarge
                - ecs.g7.16xlarge
  # Allow scheduling onto nodes tainted for cache traffic.
  tolerations:
    - key: jfs_transmittion
      operator: Exists
      effect: NoSchedule
---
apiVersion: data.fluid.io/v1alpha1
kind: JuiceFSRuntime
metadata:
  name: metabit-juice-research
spec:
  replicas: 5                # number of cache workers
  tieredstore:
    levels:
      - mediumtype: MEM      # in-memory cache tier
        path: /dev/shm
        quota: 40960
        low: "0.1"
  worker:
    nodeSelector:
      nodeType: cacheNode
    options:
      cache-size: "409600"
      free-space-ratio: "0.15"

Scaling policies are expressed with a CronHorizontalPodAutoscaler:
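apiVersion: autoscaling.alibabacloud.com/v1beta1
kind: CronHorizontalPodAutoscaler
metadata:
  name: research-weekly
  namespace: default
spec:
  # The autoscaler targets the cache runtime itself, not an application Deployment.
  scaleTargetRef:
    apiVersion: data.fluid.io/v1alpha1
    kind: JuiceFSRuntime
    name: metabit-juice-research
  # Schedules use six cron fields: second, minute, hour, day of month, month, day of week.
  jobs:
    - name: "scale-down"
      schedule: "0 0 7 ? * 1"      # 07:00:00 on day-of-week 1
      targetSize: 10
    - name: "scale-up"
      schedule: "0 0 18 ? * 5-6"   # 18:00:00 on days-of-week 5 and 6
      targetSize: 20

Performance Evaluation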
The test used 20 ecs.g7.8xlarge nodes (25 Gbps each) as cache workers and measured latency under varying pod concurrency. With only a few pods, Fluid showed little benefit. With 100 concurrent pods, Fluid reduced average latency by more than 40% compared with a traditional distributed storage setup, which shortened task completion time and lowered ECS cost.
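For scale, the nominal aggregate bandwidth of this cache tier can be worked out from the figures above. This is only an upper bound, assuming 25 Gbps is each node's nominal network bandwidth and ignoring protocol overhead and contention:

```latex
20 \times 25\ \text{Gbps} = 500\ \text{Gbps} = 62.5\ \text{GB/s}
```

That ceiling is in line with the "hundreds of Gbps" throughput listed in the platform requirements.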
Conclusion
Production use of Fluid + JuiceFSRuntime demonstrates that cloud‑native elastic data caching satisfies the high‑throughput, elastic, and data‑affinity requirements of AI‑driven quantitative research. The approach delivers higher performance, cost savings, and a flexible, observable platform that scales with workload demand.
References
Fluid project repository:
https://github.com/fluid-cloudnative/fluid
Alibaba Cloud Native
