Infra Learning Club
Feb 15, 2025 · Cloud Native

Advanced Guide: Real‑Time GPU Process Migration in Kubernetes with CRIU

This article explains how os-criu provides transparent, OS-level GPU checkpoint/restore and compares its performance with NVIDIA's cuda-checkpoint. It then walks through building and installing the PhOS framework, demonstrates migrating a Llama2-13b-chat workload in Docker, and discusses current limitations and plans for Kubernetes integration.

CRIU · Docker · GPU