How Airbnb Dynamically Scales Kubernetes Clusters with Custom Autoscaler

Airbnb migrated its services to Kubernetes and, over four years, moved from manually scaled homogeneous clusters to automatically scaled heterogeneous clusters. Along the way it introduced a custom gRPC expander for the Cluster Autoscaler that supports strategies such as weighted priority and can be extended as a plug‑in, reducing costs and operational overhead.

MaGe Linux Operations

Airbnb migrated almost all of its online services to Kubernetes and now runs thousands of nodes across roughly a hundred clusters. It needed dynamic scaling to handle large fluctuations in traffic.

Airbnb's Kubernetes clusters

The evolution occurred in three stages:

Stage 1: Homogeneous clusters with manual scaling.

Stage 2: Multiple cluster types with independent scaling.

Stage 3: Heterogeneous clusters with automated scaling.

Stage 1: Homogeneous clusters, manual scaling

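Before Kubernetes, each service instance ran on its own machine; capacity was allocated manually and was rarely released once load dropped. The first Kubernetes clusters each used a single node type and configuration and were still scaled manually.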

Stage 2: Multiple cluster types, independent scaling

Different workloads required distinct configurations, which led to the abstraction of cluster types. At this stage Airbnb also introduced the Kubernetes Cluster Autoscaler, which adds nodes when pods are pending and removes underutilized nodes. This saved about 5% of cloud costs.
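The two behaviors described above, adding nodes for pending pods and removing underutilized nodes, can be sketched as follows. This is a simplified illustration, not the Cluster Autoscaler's implementation: it considers CPU only, assumes all new nodes have the same shape, and the names `Node`, `nodes_to_add`, and `removal_candidates` are hypothetical. The 0.5 default mirrors the Autoscaler's default scale-down utilization threshold; the real component also checks that the pods on a node can be rescheduled elsewhere before removing it.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    cpu_capacity: int   # allocatable CPU, in millicores
    cpu_requested: int  # sum of CPU requests of the pods on the node


def nodes_to_add(pending_pod_cpu: list[int], node_cpu: int) -> int:
    """Count new nodes needed for the pending pods (first-fit decreasing)."""
    free: list[int] = []  # remaining CPU on each node we plan to add
    for request in sorted(pending_pod_cpu, reverse=True):
        if request > node_cpu:
            continue  # pod can never fit this node shape; more nodes won't help
        for i, room in enumerate(free):
            if request <= room:
                free[i] = room - request
                break
        else:
            # No planned node has room, so plan one more.
            free.append(node_cpu - request)
    return len(free)


def removal_candidates(nodes: list[Node], threshold: float = 0.5) -> list[str]:
    """Names of nodes whose requested CPU is below the utilization threshold."""
    return [n.name for n in nodes if n.cpu_requested / n.cpu_capacity < threshold]
```

For example, four pending pods requesting 2000, 2000, 3000, and 1000 millicores pack onto two 4000-millicore nodes, while a node with 1000 of 4000 millicores requested would be flagged as a removal candidate.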

Stage 3: Heterogeneous clusters, automated scaling

With over 30 cluster types and 100 clusters, management became cumbersome. Consolidating workloads into heterogeneous clusters under a single control plane reduced testing overhead and improved utilization, and it made more sophisticated scaling strategies possible.

Cluster Autoscaler improvements

Custom gRPC expander

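The Cluster Autoscaler first filters its node groups by simulating the scheduling of pending pods. The groups that pass are handed to a component called the Expander, which decides which of them to scale. The default random expander picks among the candidates uniformly at random, and a built‑in priority expander follows a user‑specified priority list. Neither was sufficient for Airbnb's cost and instance‑type requirements, which called for strategies such as a weighted‑priority expander, so Airbnb added a new, custom expander type.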
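To make the weighted‑priority idea concrete, here is a minimal sketch of one possible selection rule: keep only the candidates in the highest priority tier, then choose among them at random in proportion to their weights. The source does not spell out Airbnb's exact rule, so the `ScaleUpOption` type, the "higher number wins" convention, and the group names are all assumptions made for illustration.

```python
import random
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScaleUpOption:
    node_group: str
    priority: int   # hypothetical convention: higher number = preferred tier
    weight: float   # hypothetical: relative share within the tier


def pick_weighted_priority(
    options: list[ScaleUpOption], rng: Optional[random.Random] = None
) -> Optional[ScaleUpOption]:
    """Pick one option: best priority tier first, then weighted random."""
    if not options:
        return None
    rng = rng or random.Random()
    top = max(o.priority for o in options)
    tier = [o for o in options if o.priority == top]
    weights = [o.weight for o in tier]
    if sum(weights) <= 0:
        # random.choices rejects all-zero weights; fall back to uniform choice.
        return rng.choice(tier)
    return rng.choices(tier, weights=weights, k=1)[0]
```

With options `group-a` (priority 10, weight 3), `group-b` (priority 10, weight 1), and `group-c` (priority 1), the function never returns `group-c` and returns `group-a` about three times as often as `group-b`.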

The solution separates the expansion logic from the core Autoscaler via a plug‑in gRPC expander consisting of a client built into the Autoscaler and an external gRPC server that returns the best options.

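syntax = "proto3";

import "k8s.io/api/core/v1/generated.proto";

// Implemented by the external server; the Autoscaler acts as the client.
service Expander {
  // Receives the candidate scale-up options and returns the preferred ones.
  rpc BestOptions (BestOptionsRequest) returns (BestOptionsResponse) {}
}

message BestOptionsRequest {
  repeated Option options = 1;
  // Keyed by the nodeGroupId used in the options.
  map<string, k8s.io.api.core.v1.Node> nodeInfoMap = 2;
}

message BestOptionsResponse {
  repeated Option options = 1;
}

message Option {
  // ID that uniquely identifies the nodeGroup
  string nodeGroupId = 1;
  int32 nodeCount = 2;
  string debug = 3;
  repeated k8s.io.api.core.v1.Pod pod = 4;
}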
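The server side of this interface can hold arbitrary business logic. The sketch below shows what a cost‑aware `BestOptions` handler might look like. To stay self‑contained it models the messages as plain dataclasses instead of generated gRPC stubs, and the price table, instance‑type names, and the rule "return the cheapest options" are hypothetical, not Airbnb's logic. The only real identifier assumed beyond the message fields is the well‑known Kubernetes label `node.kubernetes.io/instance-type`.

```python
from dataclasses import dataclass, field


@dataclass
class Option:
    node_group_id: str
    node_count: int
    debug: str = ""
    pods: list[str] = field(default_factory=list)


@dataclass
class BestOptionsRequest:
    options: list[Option]
    node_info_map: dict[str, dict]  # nodeGroupId -> Node object as a plain dict


@dataclass
class BestOptionsResponse:
    options: list[Option]


INSTANCE_TYPE_LABEL = "node.kubernetes.io/instance-type"
# Hypothetical hourly prices per instance type, for illustration only.
HOURLY_PRICE = {"type-small": 0.10, "type-large": 0.40}


def estimated_cost(option: Option, request: BestOptionsRequest) -> float:
    """Hourly cost of an option; unknown instance types cost infinity."""
    node = request.node_info_map.get(option.node_group_id, {})
    labels = node.get("metadata", {}).get("labels", {})
    price = HOURLY_PRICE.get(labels.get(INSTANCE_TYPE_LABEL))
    if price is None:
        return float("inf")
    return option.node_count * price


def best_options(request: BestOptionsRequest) -> BestOptionsResponse:
    """Return every option tied for the lowest estimated cost."""
    if not request.options:
        return BestOptionsResponse(options=[])
    costs = [estimated_cost(o, request) for o in request.options]
    cheapest = min(costs)
    kept = [o for o, c in zip(request.options, costs) if c == cheapest]
    return BestOptionsResponse(options=kept)
```

Because the handler only sees the request and returns a filtered list, logic like this can be changed and redeployed without rebuilding the Autoscaler itself, which is the separation the design aims for.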

The design meets three requirements: extensibility for other users, independent deployment for rapid business changes, and seamless integration with the Autoscaler ecosystem.

Airbnb has run this approach in production since 2022 without issues. The custom expander has also been upstreamed to the Cluster Autoscaler, with availability announced for version v1.24.0.

Conclusion

Over four years Airbnb advanced its Kubernetes cluster configuration, contributing custom Autoscaler extensions that enable cost‑aware, multi‑instance‑type scaling strategies while reducing operational overhead.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
