
Getting Started with GPU Remote Invocation Using rCUDA

This article introduces GPU remote invocation, explains rCUDA's architecture, walks through installing the server and client, demonstrates running CUDA samples on a GPU‑less node, and shows how to deploy rCUDA on Kubernetes with example DaemonSet and Job manifests.


rCUDA Overview

rCUDA (remote CUDA) implements the CUDA runtime API in a client‑server architecture: a GPU‑less host links against the rCUDA client library, which forwards CUDA calls over TCP/IP or InfiniBand to a daemon running on a remote node with a GPU and a CUDA 8.0 environment. The most recent public release (v16.11.04.02) implements the complete CUDA 8.0 runtime API; the project is no longer actively maintained.

Demo Setup

Repository with the pre‑built rCUDA binaries and CUDA 8.0 libraries:

https://github.com/lengrongfu/study-demo/tree/main/gpu/rcuda

rCUDA Server

export LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64
./rCUDAd -h
# shows usage and version (v16.11.04.02)
./rCUDAd -i   # start daemon in interactive mode

rCUDA Client

On a node without a GPU:

export LD_LIBRARY_PATH=/root/rCUDAv16.11.04.02-CUDA8.0/lib   # rCUDA client libraries shadow libcudart
cd Samples/1_Utilities/deviceQuery
make EXTRA_NVCCFLAGS=--cudart=shared   # link the shared CUDA runtime so rCUDA's libcudart is loaded
export RCUDA_DEVICE_0=10.20.2.102:0    # remote host IP and GPU index
export RCUDA_DEVICE_COUNT=1
./deviceQuery

The program prints the properties of the remote GPU, confirming successful remote invocation.
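The RCUDA_DEVICE_N variables follow a small addressing convention: a host, an optional @port, and a GPU index after the colon. A minimal POSIX‑shell sketch of that parsing (the helper name is ours; 8308 is the TCP port the server manifest in this article exposes, assumed here to be the default):

```shell
# parse_rcuda_device: split an RCUDA_DEVICE_N value of the form
# host[@port]:gpu_index into its parts (helper name is ours).
parse_rcuda_device() {
  value="$1"
  hostpart="${value%:*}"        # everything before the last ':'
  gpu="${value##*:}"            # GPU index after the last ':'
  case "$hostpart" in
    *@*) host="${hostpart%@*}"; port="${hostpart#*@}" ;;
    *)   host="$hostpart";      port=8308 ;;  # assume rCUDA's default port
  esac
  echo "host=$host port=$port gpu=$gpu"
}

parse_rcuda_device "10.20.2.102:0"        # → host=10.20.2.102 port=8308 gpu=0
parse_rcuda_device "192.168.0.1@8308:0"   # → host=192.168.0.1 port=8308 gpu=0
```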

rCUDA on Kubernetes (Cloud‑Native Deployment)

A DaemonSet can run an rCUDA server on each GPU‑enabled node. The manifest requests one GPU (nvidia.com/gpu: '1'), enables host networking, and runs the server binary in interactive mode.

kind: DaemonSet
apiVersion: apps/v1
metadata:
  name: rcuda-server
  namespace: default
spec:
  selector:
    matchLabels:
      app: rcuda-server
  template:
    metadata:
      labels:
        app: rcuda-server
    spec:
      hostNetwork: true
      containers:
      - name: container-1
        image: docker.io/lengrongfu/rcuda-server:v0.0.1
        ports:
        - name: http
          containerPort: 8308
          protocol: TCP
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
            nvidia.com/gpu: '1'
      restartPolicy: Always

The Dockerfile for the server image builds on an Ubuntu CUDA 8.0 base image, downloads the rCUDA tarball, extracts it, and sets PATH and LD_LIBRARY_PATH.

FROM nagayosi/ubuntu_gpu_cuda8:latest
RUN apt-get update && apt-get install -y wget
RUN wget -c http://juniorprincewang.github.io/img/rCUDA/rCUDAv16.11.04.02-CUDA8.0-linux64.tgz
RUN tar -zxf rCUDAv16.11.04.02-CUDA8.0-linux64.tgz
ENV PATH=/usr/local/cuda-8.0/bin:$PATH
ENV LD_LIBRARY_PATH=/usr/local/cuda-8.0/lib64:$LD_LIBRARY_PATH
WORKDIR /home/nagayosi/rCUDAv16.11.04.02-CUDA8.0/bin
CMD ["./rCUDAd","-i"]

A Kubernetes Job can act as an rCUDA client. Setting the environment variable RCUDA_DEVICE_0 to the server's address in host@port:gpu_index form (e.g., 192.168.0.1@8308:0) directs the CUDA sample to the remote GPU; because the server DaemonSet uses host networking, the address is the GPU node's own IP.

kind: Job
apiVersion: batch/v1
metadata:
  name: rcuda-client
  namespace: default
spec:
  template:
    metadata:
      labels:
        app: rcuda-client
    spec:
      containers:
      - name: container-1
        image: docker.io/lengrongfu/rcuda-demo:v0.0.2
        command:
        - cuda-sample/1_Utilities/deviceQuery/deviceQuery
        env:
        - name: RCUDA_DEVICE_0
          value: 192.168.0.1@8308:0
        resources:
          limits:
            cpu: 250m
            memory: 512Mi
          requests:
            cpu: 250m
            memory: 512Mi
      restartPolicy: Never

Running the DaemonSet and Job demonstrates remote GPU access within a Kubernetes cluster.
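When a client should see several remote GPUs, each one gets its own RCUDA_DEVICE_N entry plus a matching RCUDA_DEVICE_COUNT. A small sketch that generates those export lines from a list of endpoints (the helper name and the combination of endpoints are hypothetical):

```shell
# gen_rcuda_env: print an export line for each server endpoint passed as an
# argument, then the matching RCUDA_DEVICE_COUNT (helper name is ours).
gen_rcuda_env() {
  i=0
  for endpoint in "$@"; do
    echo "export RCUDA_DEVICE_$i=$endpoint"
    i=$((i + 1))
  done
  echo "export RCUDA_DEVICE_COUNT=$i"
}

# Example: the two endpoint formats shown in this article.
gen_rcuda_env "10.20.2.102:0" "192.168.0.1@8308:0"
```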

Reference

[1] cuda‑samples: https://github.com/zchee/cuda-sample.git

Written by Infra Learning Club

Infra Learning Club shares study notes, cutting-edge technology, and career discussions.