Infra Learning Club
Feb 15, 2025 · Cloud Native
Advanced Guide: Real‑Time GPU Process Migration in Kubernetes with CRIU
This article explains how os‑criu provides transparent, OS‑level GPU checkpoint/restore, compares its performance with NVIDIA's cuda‑checkpoint, walks through building and installing the PhOS framework, demonstrates migration of a Llama2‑13b‑chat workload in Docker, and discusses current limitations and future Kubernetes integration plans.
CRIU · Docker · GPU
