How Fluid Accelerates Data‑Intensive Serverless Workloads on Alibaba ASK
This guide explains how Fluid, a Kubernetes‑native data orchestration engine, can be deployed on Alibaba Serverless Kubernetes (ASK) to cache and pre‑warm large datasets from OSS, enabling elastic bandwidth, reducing latency, and cutting costs for data‑intensive serverless applications.
Background and Motivation
Data is essential for modern internet services, and data‑intensive workloads such as AI inference, big‑data analytics, and OLAP need high‑performance access to large datasets. In a serverless environment, 100 concurrent pods that each pull a 30 GB AI model from OSS must share the bucket's bandwidth (limited to 10 Gbps), so the download alone would take about 2,400 seconds and cost roughly ¥1,920 in compute time. This illustrates how inefficient raw serverless data access can be.
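The transfer-time figure above can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: it assumes the pods share the 10 Gbps OSS limit quoted in the results section and that 1 GB equals 8 Gb. The per-pod-second rate is simply what the two quoted figures imply, not a published price.

```shell
#!/bin/sh
# Figures taken from the article.
pods=100
size_gb=30
quoted_cost_yuan=1920
# Assumption: all pods share the 10 Gbps OSS limit cited in the results section.
bandwidth_gbps=10

# Total data in gigabits divided by the shared bandwidth.
total_gbit=$(( pods * size_gb * 8 ))
transfer_secs=$(( total_gbit / bandwidth_gbps ))
echo "transfer time: ${transfer_secs} s"

# Rate implied by the quoted cost: cost / (pods x seconds).
implied_rate=$(awk -v c="$quoted_cost_yuan" -v p="$pods" -v s="$transfer_secs" \
  'BEGIN { printf "%.3f", c / (p * s) }')
echo "implied rate: ${implied_rate} yuan per pod-second"
```

Because every pod waits on the same shared pipe, adding pods lengthens the wait for all of them, and the compute bill grows with both the pod count and the wait.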
Introducing Fluid
Fluid is a Kubernetes‑native distributed dataset orchestration and acceleration engine designed to address data‑access latency in serverless scenarios. It is a cloud‑native solution whose resource usage starts at zero and returns to zero once everything is released, while improving data‑access efficiency in between.
Deploying Fluid on Alibaba ASK
Before running the examples, provision an ASK cluster and configure kubectl with the appropriate kubeconfig. Install Fluid through the ack-fluid Helm chart, in the fluid-system namespace, from the Alibaba Cloud Container Service console. Then check that the components are up:
kubectl get pod -n fluid-system
Typical output shows the dataset-controller and fluid-webhook pods in the Running state.
Dataset and Runtime Configuration
Create a secret containing OSS credentials:
kubectl create secret generic oss-access-key \
  --from-literal=fs.oss.accessKeyId=<access_key_id> \
  --from-literal=fs.oss.accessKeySecret=<access_key_secret>
Define a Dataset CR that points to an OSS bucket and a JindoRuntime CR that configures five cache workers, each using 40 GiB of memory:
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  mounts:
    - mountPoint: oss://fluid-demo
      name: demo
      path: /
      options:
        fs.oss.endpoint: oss-cn-beijing-internal.aliyuncs.com
      encryptOptions:
        - name: fs.oss.accessKeyId
          valueFrom:
            secretKeyRef:
              name: oss-access-key
              key: fs.oss.accessKeyId
        - name: fs.oss.accessKeySecret
          valueFrom:
            secretKeyRef:
              name: oss-access-key
              key: fs.oss.accessKeySecret
---
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: demo-dataset
spec:
  replicas: 5
  podMetadata:
    annotations:
      k8s.aliyun.com/eci-use-specs: ecs.d1ne.6xlarge
      k8s.aliyun.com/eci-image-cache: "true"
  tieredstore:
    levels:
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 40Gi
        high: "0.99"
        low: "0.99"
Apply the resources:
kubectl create -f dataset.yaml
Cache Pre‑warming with DataLoad
Create a DataLoad CR to warm the entire dataset and replicate the cache five times:
apiVersion: data.fluid.io/v1alpha1
kind: DataLoad
metadata:
  name: demo-dataset-warmup
spec:
  dataset:
    name: demo-dataset
    namespace: default
  loadMetadata: true
  target:
    - path: /
      replicas: 5
Apply the resource:
kubectl create -f dataload.yaml
Monitor until the DataLoad reaches the Complete phase (≈2 min 20 s). After completion, the dataset shows 100 % cache utilization.
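Waiting for the DataLoad can be scripted rather than checked by hand. The helper below is a hypothetical sketch, not part of Fluid: the function name wait_for_dataload and its timeout and interval arguments are made up for illustration, and it assumes the DataLoad reports its phase at .status.phase, with Failed as the terminal error phase.

```shell
#!/bin/sh
# Hypothetical helper: poll a DataLoad until it reaches a terminal phase.
# Assumes the phase is exposed at .status.phase ("Complete" or "Failed").
wait_for_dataload() {
  name="$1"
  timeout_secs="${2:-600}"   # give up after this many seconds
  interval_secs="${3:-5}"    # pause between polls
  elapsed=0
  while [ "$elapsed" -lt "$timeout_secs" ]; do
    phase=$(kubectl get dataload "$name" -o jsonpath='{.status.phase}' 2>/dev/null)
    case "$phase" in
      Complete)
        echo "DataLoad $name complete after about ${elapsed}s"
        return 0 ;;
      Failed)
        echo "DataLoad $name failed" >&2
        return 1 ;;
    esac
    sleep "$interval_secs"
    elapsed=$(( elapsed + interval_secs ))
  done
  echo "timed out waiting for DataLoad $name" >&2
  return 2
}
```

For the example in this guide, the call would be wait_for_dataload demo-dataset-warmup 600. The return code separates success (0), failure (1), and timeout (2), so a pipeline can stop before launching the job against a cold cache.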
To confirm, query the dataset:
kubectl get dataset demo-dataset
Running a Parallel Data‑Access Job
Launch 100 pods through an Argo Workflow, each computing the MD5 checksum of the 30 GB file stored in the cached dataset:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: parallelism-fluid-
spec:
  entrypoint: parallelism-fluid
  parallelism: 100
  podSpecPatch: '{"terminationGracePeriodSeconds": 0}'
  podMetadata:
    labels:
      alibabacloud.com/fluid-sidecar-target: eci
    annotations:
      k8s.aliyun.com/eci-image-cache: "true"
      k8s.aliyun.com/eci-use-specs: ecs.g6e.4xlarge
  templates:
    - name: parallelism-fluid
      steps:
        - - name: domd5sum
            template: md5sum
            withSequence:
              start: "1"
              end: "100"
    - name: md5sum
      container:
        image: alpine:latest
        command: ["/bin/sh", "-c", "md5sum /data/largefile-30G"]
        volumeMounts:
          - name: data-vol
            mountPath: /data
      volumes:
        - name: data-vol
          persistentVolumeClaim:
            claimName: demo-dataset
Submit the workflow:
argo submit workflow.yaml
All pods complete in about 5 min 58 s, far below the roughly 2,400 seconds estimated earlier for direct OSS access.
Resource Cleanup
When the cache is no longer needed, delete the dataset to reclaim the cache pods:
kubectl delete dataset demo-dataset
The control‑plane deployments can also be scaled down to zero replicas:
kubectl get deployments.apps -n fluid-system | awk 'NR>1 {print $1}' | xargs kubectl scale deployments -n fluid-system --replicas=0
Scale them back up when Fluid is needed again:
kubectl scale -n fluid-system deployment dataset-controller --replicas=1
kubectl scale -n fluid-system deployment fluid-webhook --replicas=1
Performance and Cost Results
Benchmarks show that Fluid’s data offloading provides significantly higher effective bandwidth than direct OSS access (10 Gbps limit) and reduces compute cost to roughly one‑sixth to one‑eighth of the baseline. Increasing cache worker nodes further improves both bandwidth and cost efficiency.
Figure 1 compares effective data‑access bandwidth between OSS and Fluid.
*Effective data‑access bandwidth = (number of serverless pods × data per pod) / total execution time
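As a rough illustration of the formula, the numbers from the walkthrough (100 pods, 30 GB per pod, about 5 min 58 s in total) can be plugged in directly. This is a sketch under stated assumptions: it takes 1 GB as 8 Gb, treats the walkthrough's wall-clock time as the total execution time, and uses the 2,400-second estimate from the motivation section as the direct-OSS baseline. The benchmark runs behind the figures may have used different settings.

```shell
#!/bin/sh
# Values from the walkthrough in this article.
pods=100
data_per_pod_gb=30
fluid_secs=358     # 5 min 58 s
# Assumption: direct-OSS baseline estimated in the motivation section.
oss_secs=2400

# Effective bandwidth = (pods x data per pod) / total execution time, in Gbps.
effective_gbps=$(awk -v p="$pods" -v d="$data_per_pod_gb" -v t="$fluid_secs" \
  'BEGIN { printf "%.1f", p * d * 8 / t }')

# Ratio of the baseline duration to the Fluid duration.
speedup=$(awk -v a="$oss_secs" -v b="$fluid_secs" 'BEGIN { printf "%.1f", a / b }')

echo "effective bandwidth with Fluid: ${effective_gbps} Gbps"   # 67.0
echo "speedup over direct OSS access: ${speedup}x"              # 6.7
```

The resulting ratio of about 6.7 sits within the one‑sixth to one‑eighth cost range reported above, which is what one would expect if compute cost scales with job duration.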
Figure 2 shows cost reduction when using Fluid versus direct OSS access.
Figure 3 illustrates the shorter task duration achieved with Fluid.
Conclusion
The step‑by‑step example demonstrates that Fluid simplifies data access in ASK, provides elastic bandwidth, and dramatically lowers costs for large‑scale data‑intensive serverless workloads.
References
How to create an ASK cluster: https://help.aliyun.com/document_detail/86377.html
Alibaba Cloud AI Suite details: https://help.aliyun.com/document_detail/201994.html
Fluid project GitHub: https://github.com/fluid-cloudnative/fluid
Alibaba Cloud Native