
Best Practices for Data Acceleration, Stability, and Consistency with Alibaba Cloud ACK Fluid

This guide details how to use Alibaba Cloud ACK Fluid to accelerate data access, improve system stability, and ensure cache consistency across AI, big‑data, and analytics workloads by selecting appropriate ECS instances, cache media, scheduling affinity, and runtime configurations.

Alibaba Cloud Infrastructure

In the era of large models, rapid advances in AIGC and LLM technologies create significant data‑processing challenges, especially for training, inference, and big‑data analysis. Simply adding a cache layer does not guarantee performance gains; careful configuration is required.

Performance optimization best practices include selecting suitable ECS instance types and cache media (memory, local HDD, or local SSD) for Fluid's distributed cache, calculating the required cache capacity and bandwidth, and configuring tiered storage levels. The following runtime spec fragment configures a memory-based tiered store:

```yaml
spec:
  tieredstore:
    levels:
      - mediumtype: MEM
        volumeType: emptyDir
        path: /dev/shm
        quota: 30Gi # per-Worker cache capacity
        high: "0.95"
        low: "0.7"
```
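The capacity and bandwidth calculation mentioned above can be approximated with a short script. This is a simplified model under stated assumptions, not Fluid's published formula: the whole dataset should fit in cache, each Worker is usable only up to quota × high watermark, and aggregate read bandwidth grows linearly with the Worker count. The function names and input figures are hypothetical.

```python
import math


def workers_needed(dataset_gib: float, quota_gib: float, high: float) -> int:
    """Hypothetical sizing rule: Workers required to cache the whole dataset.

    Assumes a Worker fills its quota only up to the high watermark before
    eviction starts, so its usable capacity is quota_gib * high.
    """
    usable_per_worker = quota_gib * high
    return math.ceil(dataset_gib / usable_per_worker)


def aggregate_bandwidth_gbps(workers: int, per_worker_gbps: float) -> float:
    """Hypothetical upper bound on total cache read bandwidth.

    Assumes bandwidth scales linearly with the Worker count and that each
    Worker is limited by its ECS instance's network bandwidth.
    """
    return workers * per_worker_gbps


# Illustrative inputs: a 500 GiB dataset, the 30Gi quota and 0.95 high
# watermark from the YAML above, and 10 Gbit/s of network bandwidth per node.
n = workers_needed(500, 30, 0.95)
print(n, aggregate_bandwidth_gbps(n, 10))  # 18 Workers, 180 Gbit/s upper bound
```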

For SSD‑based storage, adjust mediumtype to SSD and set volumeType to hostPath with appropriate paths and quotas.

```yaml
spec:
  tieredstore:
    levels:
      - mediumtype: SSD
        volumeType: hostPath
        path: /mnt/disk1
        quota: 100Gi
        high: "0.95"
        low: "0.7"
```
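In both examples, high and low act as usage watermarks relative to quota. The sketch below shows the common watermark-eviction pattern, assuming that once usage exceeds quota × high, the least recently used blocks are evicted until usage falls to quota × low. Fluid's cache engines implement their own eviction internally; the WatermarkCache class and its LRU policy are hypothetical illustrations.

```python
from collections import OrderedDict


class WatermarkCache:
    """Hypothetical LRU cache with high/low watermark eviction."""

    def __init__(self, quota: int, high: float, low: float) -> None:
        self.quota = quota
        self.high_bytes = quota * high
        self.low_bytes = quota * low
        self.used = 0
        self.blocks: "OrderedDict[str, int]" = OrderedDict()  # block id -> size

    def read(self, block_id: str) -> bool:
        """Return True on a cache hit and mark the block most recently used."""
        if block_id in self.blocks:
            self.blocks.move_to_end(block_id)
            return True
        return False

    def put(self, block_id: str, size: int) -> None:
        """Cache a block, then evict LRU blocks if usage crossed the high mark."""
        if block_id in self.blocks:
            self.used -= self.blocks.pop(block_id)
        self.blocks[block_id] = size
        self.used += size
        if self.used > self.high_bytes:
            # Evict down to the low watermark, not just below the high one,
            # so eviction does not run again on every subsequent write.
            while self.used > self.low_bytes and self.blocks:
                _, evicted = self.blocks.popitem(last=False)
                self.used -= evicted


cache = WatermarkCache(quota=100, high=0.95, low=0.7)
for i in range(10):
    cache.put(f"b{i}", 10)  # the 10th block pushes usage past 95
print(cache.used, list(cache.blocks))
```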

When multiple local disks are used, list them as a comma-separated value in path (e.g., /mnt/disk1,/mnt/disk2); the quota is then split across the disks.
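Assuming the quota is divided evenly among the listed paths, the per-disk share can be computed as below. The split_quota helper is hypothetical; if you need uneven per-disk quotas, check your runtime version's documentation rather than relying on this even-split assumption.

```python
def split_quota(paths: str, quota_gib: float) -> dict:
    """Hypothetical helper: per-disk share of one tiered-store level's quota.

    `paths` is the comma-separated form used in the `path` field,
    for example "/mnt/disk1,/mnt/disk2". Assumes an even split.
    """
    disks = [p.strip() for p in paths.split(",") if p.strip()]
    if not disks:
        raise ValueError("at least one path is required")
    share = quota_gib / len(disks)
    return {disk: share for disk in disks}


# With the 100Gi SSD quota above and two disks, each disk holds 50 GiB.
print(split_quota("/mnt/disk1,/mnt/disk2", 100))
```

Under this assumption, each disk needs at least its share of free space.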

Scheduling affinity keeps cache Workers in the same availability zone as the application Pods, avoiding cross-zone latency. The following Dataset configuration restricts cache Workers to a single zone:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo-dataset
spec:
  nodeAffinity:
    required:
      nodeSelectorTerms:
        - matchExpressions:
            - key: topology.kubernetes.io/zone
              operator: In
              values:
                - "<zone-id>" # e.g. cn-beijing-i
```
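To check which nodes such a rule admits, the sketch below evaluates a nodeSelectorTerms list against a node's labels using standard Kubernetes semantics: terms are ORed, expressions within a term are ANDed, and an empty term matches nothing. Only matchExpressions with the In, NotIn, Exists, and DoesNotExist operators are handled (not matchFields, Gt, or Lt); the function names and the second zone value are illustrative.

```python
from typing import Dict, List


def expression_matches(expr: dict, labels: Dict[str, str]) -> bool:
    """Evaluate one matchExpressions entry against a node's labels."""
    key, op = expr["key"], expr["operator"]
    values = expr.get("values", [])
    if op == "In":
        return key in labels and labels[key] in values
    if op == "NotIn":
        return key not in labels or labels[key] not in values
    if op == "Exists":
        return key in labels
    if op == "DoesNotExist":
        return key not in labels
    raise ValueError(f"operator not handled in this sketch: {op}")


def node_matches(node_selector_terms: List[dict], labels: Dict[str, str]) -> bool:
    """Terms are ORed; expressions within one term are ANDed.

    An empty term matches no node, as in Kubernetes.
    """
    for term in node_selector_terms:
        exprs = term.get("matchExpressions", [])
        if exprs and all(expression_matches(e, labels) for e in exprs):
            return True
    return False


zone_terms = [{
    "matchExpressions": [{
        "key": "topology.kubernetes.io/zone",
        "operator": "In",
        "values": ["cn-beijing-i"],
    }]
}]
print(node_matches(zone_terms, {"topology.kubernetes.io/zone": "cn-beijing-i"}))  # True
print(node_matches(zone_terms, {"topology.kubernetes.io/zone": "cn-beijing-h"}))  # False
```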

Stability best practices recommend persisting cache master metadata on ESSD volumes, configuring sufficient memory limits for FUSE Pods, and enabling the FUSE self‑healing feature so that applications do not need to restart when the FUSE process crashes.

Example JindoRuntime master persistence:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: JindoRuntime
metadata:
  name: sd-dataset
spec:
  volumes:
    - name: meta-vol
      persistentVolumeClaim:
        claimName: demo-jindo-master-meta
  master:
    resources:
      requests:
        memory: 4Gi
      limits:
        memory: 8Gi
    volumeMounts:
      - name: meta-vol
        mountPath: /root/jindofs-meta
    properties:
      namespace.meta-dir: "/root/jindofs-meta"
```
FUSE resource configuration (set a high memory limit, or leave it unset, so that the FUSE process is not OOM-killed under load):

```yaml
spec:
  fuse:
    resources:
      requests:
        memory: 8Gi
      # limits:
      #   memory:
```

Cache read/write consistency strategies depend on workload patterns. For read-only datasets, the default Fluid configuration is sufficient. For read-write scenarios, either create separate Datasets for the read and write paths, or set the Dataset access mode to ReadWriteMany. Example read-write Dataset:

```yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: model-ckpt
spec:
  accessModes: ["ReadWriteMany"]
```

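When using JindoRuntime, tune the FUSE attribute timeouts to balance consistency and performance. The timeouts are in seconds: longer values reduce metadata requests to the FUSE process, but a client may see stale file attributes or directory entries for up to that long after another client changes them.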

```yaml
spec:
  fuse:
    args:
      - -oauto_cache
      - -oattr_timeout=30
      - -oentry_timeout=30
      - -onegative_timeout=30
    properties:
      fs.jindofsx.meta.cache.enable: "false"
```
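The trade-off behind attr_timeout is that cached metadata may be reused for up to that many seconds without asking the FUSE process. The toy model below illustrates the effect with an explicit clock: a reader that cached a file's size keeps seeing the old size until the timeout expires. It is a simplified model of a kernel attribute cache, not JindoFuse code; the AttrCache class and the file path are hypothetical.

```python
from typing import Callable, Dict, Tuple


class AttrCache:
    """Toy model of a FUSE-style attribute cache with a fixed timeout."""

    def __init__(self, timeout_s: float, backend: Callable[[str], int]) -> None:
        self.timeout_s = timeout_s
        self.backend = backend  # returns the authoritative size of a path
        self.entries: Dict[str, Tuple[int, float]] = {}  # path -> (size, fetched_at)

    def get_size(self, path: str, now: float) -> int:
        cached = self.entries.get(path)
        if cached is not None and now - cached[1] < self.timeout_s:
            return cached[0]  # served from cache: may be stale
        size = self.backend(path)  # miss or expired entry: ask the backend
        self.entries[path] = (size, now)
        return size


sizes = {"/data/ckpt.bin": 100}
cache = AttrCache(timeout_s=30, backend=lambda p: sizes[p])
cache.get_size("/data/ckpt.bin", now=0)   # 100, fetched from the backend
sizes["/data/ckpt.bin"] = 200             # another client rewrites the file
cache.get_size("/data/ckpt.bin", now=10)  # still 100: cached and stale
cache.get_size("/data/ckpt.bin", now=31)  # 200: timeout expired, refetched
```

With a timeout of 0, every lookup goes to the backend, which is the strongest consistency at the highest metadata cost.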

Overall, by selecting appropriate ECS specs, cache media, affinity rules, persistence settings, and runtime parameters, users can achieve optimal data acceleration, high availability, and suitable consistency guarantees for AI training, inference, and big‑data analytics workloads on Alibaba Cloud ACK Fluid.
