Why NVIDIA Chose Go for Its GPU Cloud Platform: Inside the AI Infrastructure Rewrite

NVIDIA has quietly rebuilt its AI cloud platform in Go and open‑sourced three projects, NVCF, AICR, and AIStore, in which Go accounts for over 80% of the code. The result is a three‑plane architecture, scale‑to‑zero via NATS JetStream, and a cloud‑native stack that balances performance, maintainability, and rapid iteration.

TonyBai

NVCF: Open‑Source GPU Function Platform

NVIDIA released the full source of its GPU function platform (NVCF) under Apache 2.0 in April 2026. The repository shows Go comprising 88.5% of the code, driving the control plane, invocation plane, and compute plane of the platform.

Three‑Plane Architecture – Go as the Glue

The system is split into three independent, scalable planes connected by NATS JetStream:

Control Plane: runs on a dedicated Kubernetes cluster, handling function lifecycle, autoscaling, and key management. Core services include function‑autoscaler (Rust), helm‑reval (Go), OpenBao for secret encryption, and Cassandra for distributed locks.

Invocation Plane: the request path where Go dominates. Components such as http‑invocation (Rust/Axum), llm‑gateway (Go), grpc‑proxy (Go), ratelimiter (Go), and nats‑auth‑callout (Go) handle HTTP/gRPC routing, token‑aware rate limiting, and NATS authentication.

Compute Plane: each GPU cluster runs an NVCA operator that registers with the control plane, consumes NATS messages, and manages pod lifecycles.
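
To make the invocation plane's "token‑aware rate limiting" more concrete, here is a minimal sketch of a token bucket whose unit is LLM tokens rather than requests. It assumes that is what "token‑aware" means here; the TokenBudget type, its fields, and the numbers are hypothetical and are not taken from the NVCF ratelimiter source.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// TokenBudget is a hypothetical token bucket measured in LLM tokens,
// not requests. It is an illustration, not NVCF's ratelimiter.
type TokenBudget struct {
	mu         sync.Mutex
	capacity   float64   // maximum tokens the bucket can hold
	refillRate float64   // tokens restored per second
	available  float64   // tokens currently available
	last       time.Time // time of the last refill
}

// NewTokenBudget returns a full bucket as of the given time.
func NewTokenBudget(capacity, refillRate float64, now time.Time) *TokenBudget {
	return &TokenBudget{capacity: capacity, refillRate: refillRate, available: capacity, last: now}
}

// Allow refills the bucket for the elapsed time, then admits the request
// only if its estimated token cost fits in the remaining budget.
func (b *TokenBudget) Allow(cost float64, now time.Time) bool {
	b.mu.Lock()
	defer b.mu.Unlock()

	if elapsed := now.Sub(b.last).Seconds(); elapsed > 0 {
		b.available += elapsed * b.refillRate
		if b.available > b.capacity {
			b.available = b.capacity
		}
		b.last = now
	}
	if cost > b.available {
		return false
	}
	b.available -= cost
	return true
}

// demo replays three 800-token requests against a 1000-token budget
// that refills at 100 tokens per second.
func demo() []bool {
	start := time.Unix(0, 0)
	b := NewTokenBudget(1000, 100, start)
	return []bool{
		b.Allow(800, start),                    // fits: 200 tokens remain
		b.Allow(800, start),                    // rejected: only 200 left
		b.Allow(800, start.Add(6*time.Second)), // fits after 600 tokens refill
	}
}

func main() {
	fmt.Println(demo()) // [true false true]
}
```

Charging by estimated token cost means one large prompt can consume the budget that many small requests would otherwise share, which a plain requests‑per‑second limit would not capture.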

Full Request Lifecycle

A request follows eight steps from POST /v2/nvcf/pexec/functions/{id} to response, involving rate‑limit checks, NATS publishing, NVCA queue consumption, CR creation, pod orchestration, gRPC callbacks, and final response delivery.

Scale‑to‑Zero via Persistent NATS Buffer

Traditional scale‑to‑zero solutions such as Knative can time out requests that arrive while instances are still scaling up. NVCF addresses this by treating NATS JetStream as a persistent request buffer: requests wait in the stream until a worker is ready, which the article credits with zero request loss and graceful cold‑start handling. The original article compares NVCF with Knative across request buffering, cold‑start behavior, multi‑cluster routing, and workload suitability.
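
The buffering idea can be sketched with the standard library alone. In this hedged illustration, an in‑memory buffered channel stands in for the JetStream stream (unlike JetStream, it is not persistent and would not survive a restart), and the request type and runBuffered function are hypothetical names. Requests are accepted while no worker exists, and a worker that starts late still processes all of them.

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// request is a hypothetical invocation payload.
type request struct{ id int }

// runBuffered enqueues n requests before any worker exists, starts a
// worker after a simulated cold start, and returns the IDs it processed.
func runBuffered(n int, coldStart time.Duration) []int {
	// Stand-in for a durable stream: an in-memory buffered channel.
	buffer := make(chan request, n)

	// Callers publish immediately; nothing is consuming yet (scaled to zero).
	for i := 1; i <= n; i++ {
		buffer <- request{id: i}
	}
	close(buffer) // no more requests in this demo

	var (
		wg        sync.WaitGroup
		processed []int
	)
	wg.Add(1)
	go func() {
		defer wg.Done()
		time.Sleep(coldStart) // simulated pod cold start
		for r := range buffer {
			processed = append(processed, r.id)
		}
	}()
	wg.Wait()
	return processed
}

func main() {
	got := runBuffered(5, 50*time.Millisecond)
	fmt.Println(got) // [1 2 3 4 5]
}
```

The point of the sketch is the ordering: acceptance of a request is decoupled from the availability of a worker, so a slow cold start delays responses instead of dropping them.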

AICR: AI Cluster Runtime

AICR encapsulates validated, reproducible recipes for GPU‑accelerated Kubernetes clusters. It locks down driver, operator, kernel, and system configurations into versioned recipes that can be rendered as Helm charts or ArgoCD manifests. The CLI, built as a single static Go binary, supports snapshotting, recipe generation, validation, and bundling.
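
As a rough illustration of what "locking down" a configuration can mean, the sketch below models a recipe as a struct and rejects any component that is missing or left floating on "latest". The Recipe type, its field names, and the version strings are all hypothetical; they are not drawn from the AICR source or its recipe format.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// Recipe is a hypothetical, simplified shape for a pinned cluster recipe.
// Field names are illustrative and not taken from AICR.
type Recipe struct {
	Version  string // version of the recipe itself
	Driver   string // GPU driver version
	Operator string // GPU operator version
	Kernel   string // node kernel version
}

// Validate rejects any component that is empty or set to "latest",
// since a reproducible recipe needs every version pinned.
func (r Recipe) Validate() error {
	fields := []struct{ name, value string }{
		{"version", r.Version},
		{"driver", r.Driver},
		{"operator", r.Operator},
		{"kernel", r.Kernel},
	}
	var unpinned []string
	for _, f := range fields {
		if f.value == "" || f.value == "latest" {
			unpinned = append(unpinned, f.name)
		}
	}
	if len(unpinned) > 0 {
		return errors.New("unpinned components: " + strings.Join(unpinned, ", "))
	}
	return nil
}

func main() {
	// Placeholder version strings, chosen only for the example.
	good := Recipe{Version: "1.0.0", Driver: "1.2.3", Operator: "4.5.6", Kernel: "7.8.9"}
	bad := Recipe{Version: "1.0.0", Driver: "latest"}

	fmt.Println(good.Validate()) // <nil>
	fmt.Println(bad.Validate())  // unpinned components: driver, operator, kernel
}
```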

AIStore: Distributed Storage for AI

AIStore (AIS) is a production‑grade, Go‑centric distributed storage stack with 1.8k stars, 10k+ commits, and 46 contributors. It offers multi‑cloud backend access, linear scalability, ETL offload, batch‑get for ML pipelines, and load‑aware throttling. Go’s concurrency model, single‑binary deployment, and mature ecosystem (Kubernetes operators, gRPC, NATS, Prometheus) are cited as primary reasons for its adoption.

Why Go? NVIDIA’s Technical Selection Logic

The three projects share common AI‑infrastructure requirements: fine‑grained GPU scheduling, massive concurrent request queuing, cross‑cluster coordination, heavy I/O for model files, and long‑running asynchronous tasks. Go’s goroutine and channel model satisfies these concurrency demands while keeping code clear.
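
For readers less familiar with the goroutine and channel model, the following is a generic worker‑pool sketch, not code from any of the three repositories. The processAll function is a hypothetical name, and squaring a number stands in for real work such as handling a queued request.

```go
package main

import (
	"fmt"
	"sync"
)

// processAll fans jobs out to a fixed number of worker goroutines and
// collects one result per job. Squaring is a placeholder for real work.
func processAll(jobs []int, workers int) map[int]int {
	type result struct{ job, out int }

	in := make(chan int)
	out := make(chan result)

	// Start the workers; each pulls jobs until the input channel closes.
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range in {
				out <- result{job: j, out: j * j}
			}
		}()
	}

	// Feed the jobs, then signal that no more are coming.
	go func() {
		for _, j := range jobs {
			in <- j
		}
		close(in)
	}()

	// Close the results channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	results := make(map[int]int, len(jobs))
	for r := range out {
		results[r.job] = r.out
	}
	return results
}

func main() {
	res := processAll([]int{1, 2, 3, 4}, 2)
	fmt.Println(len(res), res[3]) // 4 9
}
```

The concurrency control lives entirely in channel closes and a WaitGroup, with no explicit locks around the work itself, which is the kind of clarity the paragraph above attributes to Go.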

Additionally, the cloud‑native ecosystem (Kubernetes, Docker, containerd, Prometheus, NATS, Helm) is largely written in Go, making Go the “native language” for seamless integration without cross‑language overhead.

Single‑binary static builds simplify deployment, CI/CD, and edge‑node operations. Compared with C++, the article argues, Go delivers comparable performance for this kind of infrastructure code with far lower maintenance complexity.

Rust + Go Complementary Division

Performance‑critical hot paths (e.g., http‑invocation, function‑autoscaler) are implemented in Rust, while control logic, gateways, proxies, and authentication, the areas that need rapid iteration, are written in Go. This division lets each language play to its strengths.

Implications

For Go developers, the three open‑source repos provide real‑world examples of Kubernetes operators, gRPC services, NATS integration, and large‑scale distributed systems. AI platform engineers can run NVIDIA‑level scheduling on private GPU clusters, audit code, and customize autoscaling or authentication. Technical decision‑makers see NVIDIA’s shift from CUDA‑centric C++ to Go as a strong signal that Go is becoming a core language for AI‑era infrastructure.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

TonyBai

Tony Bai's tech world (tonybai.com). Not satisfied with just "knowing how", we strive for mastery. Focused on Go language internals, high-quality engineering practices, and cloud‑native architecture, exploring cutting‑edge intersections of Go and AI. Gophers who pursue technology are welcome—follow me and evolve with Go.