Deploying and Managing Ray on Alibaba Cloud ACK with KubeRay: Architecture, Code Samples, and Scheduling Strategies
This article explains how to build a flexible machine‑learning infrastructure on Alibaba Cloud ACK using Ray and KubeRay. It covers Ray's core components and AI libraries, deployment options on VMs and Kubernetes, code examples for data processing and model serving, and advanced scheduling and quota‑management techniques.
Machine‑learning infrastructure (ML Infra) is crucial for production workloads such as data processing, model training, and inference, each with different resource requirements; a flexible ML Infra enables these workloads to run efficiently in production.
Alibaba Cloud Container Service for Kubernetes (ACK) provides a managed solution that simplifies the creation of Ray clusters and integrates with Alibaba Cloud scheduling, storage, logging, and monitoring.
01. Ray
Ray, which originated at UC Berkeley's RISELab, is a distributed computing framework offering flexible APIs for rapid AI application development. Its stack consists of three layers: Ray Core, the Ray AI libraries, and the deployment layer.
1. Ray Core
Ray Core provides the fundamental API (Task, Actor, Object) similar to Spark Core or Hadoop MapReduce. Functions or classes can be turned into remote executables using the @ray.remote decorator.
2. Ray AI Lib
The Ray ecosystem includes Ray Data, Ray Train, Ray Tune, and Ray Serve, which wrap Ray Core to enable efficient distributed execution for data processing, training, hyper‑parameter tuning, and model serving.
Ray Data supports operations such as map, filter, and column manipulation on structured and unstructured data, automatically partitioning datasets for parallel execution and handling CSV, Parquet, Pandas, and database sources.
Example of using Ray Data to read a CSV from S3, transform it, and write the result:
```python
import ray

# Load a CSV dataset directly from S3
ds = ray.data.read_csv("s3://anonymous@air-example-data/iris.csv")
ds.show(limit=1)

from typing import Dict
import numpy as np

# Define a transformation to compute a "petal area" attribute
def transform_batch(batch: Dict[str, np.ndarray]) -> Dict[str, np.ndarray]:
    vec_a = batch["petal length (cm)"]
    vec_b = batch["petal width (cm)"]
    batch["petal area (cm^2)"] = vec_a * vec_b
    return batch

# Apply the transformation to our dataset
transformed_ds = ds.map_batches(transform_batch)
print(transformed_ds.materialize())
print(transformed_ds.take_batch(batch_size=3))

import os
transformed_ds.write_parquet("/tmp/iris")
print(os.listdir("/tmp/iris"))
```

Ray Serve is a scalable model‑serving library that abstracts away the underlying framework, allowing deployment of PyTorch, TensorFlow, Keras, Scikit‑Learn, or custom Python logic with features such as streaming responses, dynamic batching, and multi‑GPU support.
Simple Ray Serve example that deploys a two‑replica translation model:
```python
# File name: serve_quickstart.py
from starlette.requests import Request

import ray
from ray import serve

from transformers import pipeline


@serve.deployment(num_replicas=2, ray_actor_options={"num_cpus": 0.2, "num_gpus": 0})
class Translator:
    def __init__(self):
        # Load model
        self.model = pipeline("translation_en_to_fr", model="t5-small")

    def translate(self, text: str) -> str:
        # Run inference
        model_output = self.model(text)
        # Post-process output
        translation = model_output[0]["translation_text"]
        return translation

    async def __call__(self, http_request: Request) -> str:
        english_text: str = await http_request.json()
        return self.translate(english_text)


translator_app = Translator.bind()
```

Ray Train and Ray Tune further reduce the complexity of model training and hyper‑parameter tuning, completing the end‑to‑end ML lifecycle.
3. Ray Deployment
Ray can be deployed on virtual machines (using ray start) or on Kubernetes via KubeRay, which manages the lifecycle of Ray clusters.
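On VMs the workflow is roughly as follows (the head‑node address is a placeholder):

```shell
# On the head node: start Ray and expose the cluster port
ray start --head --port=6379

# On each worker node: join the cluster at the head's address
ray start --address=<head-node-ip>:6379

# Tear down the local Ray processes when finished
ray stop
```

KubeRay automates these steps on Kubernetes, so the per‑node bootstrap above is only needed for VM deployments.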
02. KubeRay
KubeRay provides three custom resources on Kubernetes: RayCluster (a persistent Ray cluster), RayJob (a one‑time job), and RayService (a long‑running service). KubeRay integrates seamlessly with ACK, offering enhanced security, zero‑maintenance autoscaling, high availability across zones, and built‑in observability.
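For illustration, a minimal RayCluster manifest with in‑tree autoscaling enabled might look like the following sketch (the image tag, group name, and resource sizes are placeholders; field names follow the KubeRay CRD):

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: raycluster-autoscaler
spec:
  enableInTreeAutoscaling: true
  headGroupSpec:
    rayStartParams:
      num-cpus: "0"
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
  workerGroupSpecs:
  - groupName: default-group
    replicas: 1
    minReplicas: 0
    maxReplicas: 5
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
          resources:
            limits:
              cpu: "1"
              memory: 2Gi
```

With `enableInTreeAutoscaling: true`, KubeRay sidecars an autoscaler onto the head pod that grows or shrinks the worker group between `minReplicas` and `maxReplicas` based on pending Ray tasks and actors.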
03. Ray on ACK
ACK hosts a managed KubeRay Operator, providing security hardening, automatic VPA‑driven scaling, multi‑zone high availability, and log collection for the operator.
Advanced scheduling on ACK allows customers to combine different compute types (ECS, ACS, etc.) using ResourcePolicy to define fallback and priority rules. Example ResourcePolicy:
```yaml
apiVersion: scheduling.alibabacloud.com/v1alpha1
kind: ResourcePolicy
metadata:
  name: resourcepolicy-example
  namespace: default
spec:
  selector:
    key1: value1
  units:
  - resource: ecs
  - resource: eci
```

Quota and queue support via Kube Queue enables per‑team or per‑project resource guarantees and limits, automatically routing RayJob submissions to appropriate queues based on namespace and priority.
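As a sketch of the RayJob side of that routing (the `kube-queue/queue-name` label key and all names below are illustrative; check the ack-kube-queue documentation for the exact key your cluster expects):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: rayjob-sample
  namespace: team-a
  labels:
    # Illustrative queue-routing label; the exact key is defined by
    # the queue controller installed in the cluster
    kube-queue/queue-name: team-a-queue
spec:
  entrypoint: python /home/ray/samples/sample_code.py
  rayClusterSpec:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
```

The job stays queued until its team's quota has room, at which point the controller admits it and KubeRay provisions the cluster and runs the entrypoint.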
ACK also integrates Prometheus, Alibaba Cloud Log Service (SLS), and a Ray HistoryServer, allowing persistent monitoring and log access for both active and completed Ray clusters.