How ByteDance Scales AI Workloads with Ray, KubeRay, and Kueue
This article explains why Ray is popular among AI researchers, how ByteDance hosts Ray applications with KubeRay, and how Kueue manages and schedules RayJob workloads. It covers Ray's architecture, KubeRay's components, real-world use cases, and job-scheduling strategies.
What is Ray
Ray originated from UC Berkeley's RISELab as a general-purpose distributed programming framework that helps users quickly parallelize their programs. Ray Core provides low-level distributed primitives, remote functions (tasks) and remote classes (actors), while the higher-level Ray AIR offers AI-specific libraries.
Ray's GitHub repository now has over 27K stars, and its creators founded Anyscale to steward the open-source community and build commercial products. At Ray Summit 2023, companies including OpenAI, Uber, Amazon, ByteDance, and Ant Financial reported using Ray. Anyscale also offers commercial LLM products built on Ray, emphasizing cost efficiency and ease of use.
Ray’s ecosystem breaks the traditional AI pipeline silos (Spark for data, Torch DDP/MPI for training, deployment services for inference) by allowing data processing, model training, and serving to be expressed within a single framework.
ByteDance KubeRay + Ray Application Practice
KubeRay Introduction
KubeRay is an open-source toolkit for deploying and integrating Ray on Kubernetes, led by ByteDance's engineering team with contributions from Anyscale, Ant Financial, Microsoft, and others. It has become the de-facto standard for running Ray on Kubernetes.
Without it, deploying Ray directly on physical machines requires manual IP/port configuration and complex scaling, and forgoes Kubernetes-native capabilities such as monitoring, alerting, Ingress, and HPA/VPA.
RayCluster
RayCluster is a custom resource definition (CRD) that builds and manages a Ray cluster. It provides pod recovery, cluster‑level hot updates, and integrates with the Ray autoscaler for dynamic scaling based on load, reducing cost while maintaining high availability.
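A minimal RayCluster manifest might look like the following. The field names follow the KubeRay CRD; the image tag, group name, and resource sizes are illustrative.

```yaml
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: demo-cluster
spec:
  enableInTreeAutoscaling: true      # let the Ray autoscaler add/remove workers
  headGroupSpec:
    rayStartParams:
      dashboard-host: "0.0.0.0"
    template:
      spec:
        containers:
          - name: ray-head
            image: rayproject/ray:2.9.0
            resources:
              limits:
                cpu: "2"
                memory: 4Gi
  workerGroupSpecs:
    - groupName: workers
      replicas: 2
      minReplicas: 0
      maxReplicas: 10                # autoscaler scales within these bounds
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-worker
              image: rayproject/ray:2.9.0
```

The `minReplicas`/`maxReplicas` bounds are what the autoscaler integration works against; pod recovery and hot updates are handled by the operator reconciling this spec.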
RayJob
RayJob is a CRD for submitting and tracking jobs on a companion Ray cluster. It supports batch scheduling, creates or reuses clusters, updates job status, and cleans up clusters after completion. ByteDance added timeout handling and node‑count waiting features.
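A RayJob sketch, again with illustrative values (ByteDance's internal extensions such as node-count waiting are not part of the open-source CRD shown here):

```yaml
apiVersion: ray.io/v1
kind: RayJob
metadata:
  name: demo-job
spec:
  entrypoint: python train.py        # command submitted to the Ray cluster
  shutdownAfterJobFinishes: true     # tear the cluster down on completion
  activeDeadlineSeconds: 3600        # job timeout, available in recent KubeRay releases
  rayClusterSpec:                    # cluster created for (or reused by) this job
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
```

Setting `shutdownAfterJobFinishes: true` gives the "create, run, clean up" lifecycle described above; omitting it keeps the cluster alive for debugging.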
RayService
RayService deploys Ray Serve applications to a cloud‑native environment, exposing the serve agent via a Service for seamless traffic routing and supporting hot updates through rolling cluster updates.
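For completeness, a RayService sketch: the CRD wraps a Serve config (`serveConfigV2`) plus a cluster spec, and the operator performs rolling cluster updates for zero-downtime upgrades. The application name and import path below are hypothetical.

```yaml
apiVersion: ray.io/v1
kind: RayService
metadata:
  name: demo-service
spec:
  serveConfigV2: |
    applications:
      - name: app
        import_path: my_app:deployment   # hypothetical "module:variable" Serve app
  rayClusterConfig:
    headGroupSpec:
      rayStartParams: {}
      template:
        spec:
          containers:
            - name: ray-head
              image: rayproject/ray:2.9.0
```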
Ray Hosting at ByteDance
All internal Ray clusters are managed by KubeRay. ByteDance extends the open‑source version to support large‑scale job submission, persistent clusters for debugging, and single‑job RayJob hosting. The platform also provides authentication, history server, notebook integration, and other surrounding capabilities.
ByteDance workloads span graph computing, offline inference, large‑model training, and parallel computation across both offline and online scenarios.
Scenario Cases
Graph Computing
Ray Core is used to refactor ByteDance's internal graph engine: each graph operator runs as a Ray actor and communicates with its peers by rank, MPI-style. Ray's distributed capabilities and KubeRay's orchestration provide end-to-end fault tolerance, automatically restarting failed workers and restoring checkpoints from persistent storage.
Large‑Scale Offline Inference
Ray Data's streaming execution is employed for massive offline inference jobs that require high throughput and resource utilization but tolerate higher latency. Compared with Spark, Ray offers more flexible programming, enabling pipeline parallelism and model parallelism, along with actor-pool scaling and end-to-end fault tolerance.
Kueue Managing / Scheduling RayJob
Kueue is a Kubernetes-native job management and scheduling framework that provides queue-based scheduling with priority, preemption, and quota support. It natively handles the Kubernetes batch Job, RayJob, and TFJob workload types.
Kueue's architecture includes ResourceFlavor (an abstraction over node types), ClusterQueue (a cluster-wide resource pool), LocalQueue (a namespaced queue bound to a ClusterQueue), and Cohort (a group of ClusterQueues that can borrow unused quota from one another). Administrators define these resources, and users submit jobs to a specific LocalQueue. Jobs wait in a pending state until quota and priority conditions are met, optionally triggering cluster autoscaling.
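A minimal admin setup might wire these objects together as follows; names and quota numbers are illustrative.

```yaml
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cq
spec:
  cohort: research                  # ClusterQueues in one cohort can borrow quota
  namespaceSelector: {}
  resourceGroups:
    - coveredResources: ["cpu", "memory"]
      flavors:
        - name: default-flavor
          resources:
            - name: cpu
              nominalQuota: 40
            - name: memory
              nominalQuota: 160Gi
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: team-a
spec:
  clusterQueue: team-a-cq
```

A user then enqueues a RayJob by labeling it `kueue.x-k8s.io/queue-name: team-a-queue`; Kueue holds it pending until the ClusterQueue has quota, and higher-priority jobs can preempt lower-priority ones.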
A KubeCon demo showed RayJob preemption and recovery across queues with different priorities.
Volcano Engine Developer Services
The Volcano Engine Developer Community (Volcano Engine's TOD community) connects the platform with developers, offering cutting-edge technical content and diverse events, nurturing a vibrant developer culture, and co-building an open-source ecosystem.