Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies
This article explains how to build a flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay. It covers Ray's core components and AI libraries, deployment options on VMs and Kubernetes, code examples for data processing and model serving, and advanced scheduling and quota‑management techniques.
Machine‑learning infrastructure (ML Infra) is crucial for production workloads such as data processing, model training, and inference, each with different resource requirements; a flexible ML Infra enables these workloads to run efficiently in production.
Alibaba Cloud Container Service for Kubernetes (ACK) provides a managed solution that simplifies the creation of Ray clusters and integrates with Alibaba Cloud scheduling, storage, logging, and monitoring.
01. Ray
Ray, which originated at UC Berkeley's RISELab, is a distributed computing framework offering flexible APIs for rapid AI application development. Its stack consists of three layers: Ray Core, the Ray AI libraries, and the deployment layer.
1. Ray Core
Ray Core provides the fundamental API (Task, Actor, Object) similar to Spark Core or Hadoop MapReduce. Functions or classes can be turned into remote executables using the @ray.remote decorator.
2. Ray AI Lib
The Ray ecosystem includes Ray Data, Ray Train, Ray Tune, and Ray Serve, which wrap Ray Core to enable efficient distributed execution for data processing, training, hyper‑parameter tuning, and model serving.
Ray Data supports operations such as map, filter, and column manipulation on structured and unstructured data, automatically partitioning datasets for parallel execution and handling CSV, Parquet, Pandas, and database sources.
Example of using Ray Data to read a CSV from S3, transform it, and write the result:
```python
import ray

# Load a CSV dataset directly from S3
ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
ds.show(limit=1)

from typing import Dict
import numpy as np

# Define a transformation to compute a "petal area" attribute
def transform_batch(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    vec_a = batch["petal length (cm)"]
    vec_b = batch["petal width (cm)"]
    batch["petal area (cm^2)"] = vec_a * vec_b
    return batch

# Apply the transformation to our dataset
transformed_ds = ds.map_batches(transform_batch)
print(transformed_ds.materialize())
print(transformed_ds.take_batch(batch_size=3))

import os
transformed_ds.write_parquet("/tmp/iris")
print(os.listdir("/tmp/iris"))
```

Ray Serve is a scalable model‑serving library that abstracts away the underlying framework, allowing deployment of PyTorch, TensorFlow, Keras, Scikit‑Learn, or custom Python logic with features such as streaming responses, dynamic batching, and multi‑GPU support.
Simple Ray Serve example that deploys a two‑replica translation model:
```python
# File name: serve_quickstart.py
from starlette.requests import Request

import ray
from ray import serve

from transformers import pipeline


@serve.deployment(num_replicas=2, ray_actor_options={"num_cpus": 0.2, "num_gpus": 0})
class Translator:
    def __init__(self):
        # Load model
        self.model = pipeline("translation_en_to_fr", model="t5-small")

    def translate(self, text: str) -> str:
        # Run inference
        model_output = self.model(text)
        # Post-process output
        translation = model_output[0]["translation_text"]
        return translation

    async def __call__(self, http_request: Request) -> str:
        english_text: str = await http_request.json()
        return self.translate(english_text)


translator_app = Translator.bind()
```

Ray Train and Ray Tune further reduce the complexity of model training and hyper‑parameter tuning, completing the end‑to‑end ML lifecycle.
3. Ray Deployment
Ray can be deployed on virtual machines (using ray start) or on Kubernetes via KubeRay, which manages the lifecycle of Ray clusters.
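On VMs the workflow is roughly as follows (the head‑node address is a placeholder):

```shell
# On the head node: start Ray and expose the cluster port
ray start --head --port=6379

# On each worker node: join the cluster at the head's address
ray start --address=<head-node-ip>:6379

# Tear down the local Ray processes when finished
ray stop
```

KubeRay automates these steps on Kubernetes, so the per‑node bootstrap above is only needed for VM deployments.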
02. KubeRay
KubeRay provides three custom resources on Kubernetes: RayCluster (a persistent Ray cluster), RayJob (a one‑time job), and RayService (a long‑running service). KubeRay integrates seamlessly with ACK, offering enhanced security, zero‑maintenance autoscaling, high availability across zones, and built‑in observability.
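For illustration, a minimal RayCluster manifest with in‑tree autoscaling enabled might look like the following sketch (the image tag, group name, and resource sizes are placeholders; field names follow the KubeRay CRD):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-autoscaler
spec:
  enableInTreeAutoscaling: true
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
  workerGroupSpecs:
  - groupName: default-group
    replicas: 1
    minReplicas: 0
    maxReplicas: 5
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
```

With `enableInTreeAutoscaling: true`, KubeRay sidecars an autoscaler onto the head pod that grows or shrinks the worker group between `minReplicas` and `maxReplicas` based on pending Ray tasks and actors.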
03. Ray on ACK
ACK hosts a managed KubeRay Operator, providing security hardening, automatic VPA‑driven scaling, multi‑zone high availability, and log collection for the operator.
Advanced scheduling on ACK allows customers to combine different compute types (ECS, ACS, etc.) using ResourcePolicy to define fallback and priority rules. Example ResourcePolicy:
```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: resourcepolicy-example
  namespace: default
spec:
  selector:
    key1: value1
  units:
  - resource: ecs
  - resource: eci
```

Quota and queue support via Kube Queue enables per‑team or per‑project resource guarantees and limits, automatically routing RayJob submissions to appropriate queues based on namespace and priority.
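As a sketch of the RayJob side of that routing (the `kube-queue/queue-name` label key and all names below are illustrative; check the ack-kube-queue documentation for the exact key your cluster expects):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
  namespace: team-a
  labels:
    # Illustrative queue-routing label; the exact key is defined by
    # the queue controller installed in the cluster
    kube-queue/queue-name: team-a-queue
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
```

The job stays queued until its team's quota has room, at which point the controller admits it and KubeRay provisions the cluster and runs the entrypoint.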
ACK also integrates Prometheus, Alibaba Cloud Log Service (SLS), and a Ray HistoryServer, allowing persistent monitoring and log access for both active and completed Ray clusters.