Why NVIDIA Chose Go for Its GPU Cloud Platform: Inside the AI Infrastructure Rewrite
NVIDIA quietly rewrote its AI cloud platform using Go, open‑sourcing NVCF, AICR, and AIStore, where Go accounts for over 80% of the code, enabling a three‑plane architecture, scale‑to‑zero via NATS JetStream, and a cloud‑native stack that balances performance, maintainability, and rapid iteration.
NVCF: Open‑Source GPU Function Platform
NVIDIA released the full source of its GPU function platform (NVCF) under Apache 2.0 in April 2026. The repository shows Go comprising 88.5% of the code, driving the control plane, invocation plane, and compute plane of the platform.
Three‑Plane Architecture – Go as the Glue
The system is split into three independent, scalable planes connected by NATS JetStream :
Control Plane : runs on a dedicated Kubernetes cluster, handling function lifecycle, autoscaling, and key management. Core services include function‑autoscaler (Rust), helm‑reval (Go), OpenBao for secret encryption, and Cassandra for distributed locks.
Invocation Plane : the request path where Go dominates. Components such as http‑invocation (Rust/Axum), llm‑gateway (Go), grpc‑proxy (Go), ratelimiter (Go), and nats‑auth‑callout (Go) handle HTTP/gRPC routing, token‑aware rate limiting, and NATS authentication.
Compute Plane : each GPU cluster runs an NVCA operator that registers with the control plane, consumes NATS messages, and manages pod lifecycles.
Full Request Lifecycle
A request follows eight steps from POST /v2/nvcf/pexec/functions/{id} to response, involving rate‑limit checks, NATS publishing, NVCA queue consumption, CR creation, pod orchestration, gRPC callbacks, and final response delivery.
Scale‑to‑Zero via Persistent NATS Buffer
Traditional solutions like Knative suffer timeouts during scale‑up. NVCF solves this by treating NATS JetStream as a persistent request buffer, ensuring zero request loss and graceful cold‑start handling. The article compares NVCF with Knative across request buffering, cold‑start behavior, multi‑cluster routing, and workload suitability.
AICR: AI Cluster Runtime
AICR encapsulates validated, reproducible recipes for GPU‑accelerated Kubernetes clusters. It locks down driver, operator, kernel, and system configurations into versioned recipes that can be rendered as Helm charts or ArgoCD manifests. The CLI, built as a single static Go binary, supports snapshotting, recipe generation, validation, and bundling.
AIStore: Distributed Storage for AI
AIStore (AIS) is a production‑grade, Go‑centric distributed storage stack with 1.8k Stars, 10 k+ commits, and 46 contributors. It offers multi‑cloud backend access, linear scalability, ETL offload, batch‑get for ML pipelines, and load‑aware throttling. Go’s concurrency model, single‑binary deployment, and mature ecosystem (Kubernetes operators, gRPC, NATS, Prometheus) are cited as primary reasons for its adoption.
Why Go? NVIDIA’s Technical Selection Logic
The three projects share common AI‑infrastructure requirements: fine‑grained GPU scheduling, massive concurrent request queuing, cross‑cluster coordination, heavy I/O for model files, and long‑running asynchronous tasks. Go’s goroutine and channel model satisfies these concurrency demands while keeping code clear.
Additionally, the cloud‑native ecosystem (Kubernetes, Docker, containerd, Prometheus, NATS, Helm) is largely written in Go, making Go the “native language” for seamless integration without cross‑language overhead.
Single‑binary static builds simplify deployment, CI/CD, and edge‑node operations. Compared with C++, Go delivers comparable low‑level performance with far lower maintenance complexity.
Rust + Go Complementary Division
Performance‑critical hot paths (e.g., http‑invocation, function‑autoscaler) are implemented in Rust, while control logic, gateways, proxies, and authentication—areas needing rapid iteration—are written in Go. This division lets each language play to its strengths.
Implications
For Go developers, the three open‑source repos provide real‑world examples of Kubernetes operators, gRPC services, NATS integration, and large‑scale distributed systems. AI platform engineers can run NVIDIA‑level scheduling on private GPU clusters, audit code, and customize autoscaling or authentication. Technical decision‑makers see NVIDIA’s shift from CUDA‑centric C++ to Go as a strong signal that Go is becoming a core language for AI‑era infrastructure.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
TonyBai
Tony Bai's tech world (tonybai.com). Not satisfied with just "knowing how", we strive for mastery. Focused on Go language internals, high-quality engineering practices, and cloud‑native architecture, exploring cutting‑edge intersections of Go and AI. Gophers who pursue technology are welcome—follow me and evolve with Go.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
