How We Transformed an FPS Game to Cloud‑Native with OpenKruiseGame in 2 Months
Facing tight deadlines, Yahaha Studios rebuilt the STRIDEN FPS game's server deployment, moving from a traditional Auto Scaling Group to a cloud‑native architecture built on OpenKruiseGame. The result: server startup in seconds, automated global scaling, lossless scaling, and significant cost reductions, along with a better player experience.
About Yahaha
Yahaha Studios is a metaverse platform that lowers the barrier to 3D content creation, allowing users to create, share, and experience 3D games without code.
Preface
STRIDEN is an upcoming FPS online game built on UE5, co‑developed with partner studio 5 Fortress, preparing for a Steam Early Access launch.
1. Why OpenKruiseGame?
We faced two major operational pain points with the previous Auto Scaling Group (ASG) solution:
Complex custom scripts were required to bring servers up and down without disrupting players.
Backend services ran on Kubernetes while game servers were on a separate management system, causing operational overhead.
We needed a unified platform to manage both stateless back‑ends and stateful game servers, leading us to Kubernetes and its cloud‑native ecosystem.
Kubernetes provides declarative APIs and controllers; treating game servers as workloads enables infrastructure‑as‑code.
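To make the declarative idea concrete, here is a minimal sketch of a reconcile step, assuming a toy model of a workload: the controller compares the declared replica count with what is running and emits the actions that close the gap. The `ServerSetState` type and the `gs-N` names are hypothetical and are not part of Kubernetes or OKG.

```python
from dataclasses import dataclass


@dataclass
class ServerSetState:
    """Hypothetical stand-in for a game server workload in one cluster."""
    desired_replicas: int
    running: list[str]  # server names, ordered oldest to newest


def reconcile(state: ServerSetState) -> list[str]:
    """Return the actions that move the actual state toward the desired one."""
    diff = state.desired_replicas - len(state.running)
    if diff > 0:
        # Too few servers: create the missing ones.
        start = len(state.running)
        return [f"create gs-{i}" for i in range(start, state.desired_replicas)]
    if diff < 0:
        # Too many servers: delete the surplus, newest first.
        return [f"delete {name}" for name in reversed(state.running[diff:])]
    return []  # already converged, nothing to do
```

A real controller runs this comparison continuously, so the operator only ever edits the desired state and never issues create or delete commands by hand.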
After evaluating community solutions, we chose OpenKruiseGame (OKG) for its functional adaptability, community support, and native integration with Alibaba Cloud.
Feature adaptability and flexibility: OKG is designed for stateful game workloads, offering advanced networking, QoS, and scaling strategies.
Community support and response speed: The OKG community provides rapid, professional assistance.
Localization and integration advantages: Originating from Alibaba Cloud, OKG integrates well with major cloud providers.
2. Challenges of Cloud‑Native Architecture
Transitioning STRIDEN from ASG to a cloud‑native stack introduced several concrete challenges.
Original Architecture
1. Cold start pain: minute‑level startup time
Creating an EC2 instance → downloading the game server image → extracting → downloading Steam dependencies → launching UE Game Server took about 3 minutes per server.
2. Public IP cost and complexity
High economic cost of assigning an Elastic IP to each server.
Operational complexity of managing thousands of IPs, firewalls, and security groups.
3. Manual “workshop” operations
Lack of automatic elastic scaling; ops had to manually create/destroy instances.
Inability to perform global synchronized releases; updates required per‑region manual steps.
4. “Blind” scaling down
Conservative scaling wastes resources; aggressive scaling disrupts players.
3. Cloud‑Native Solutions with OKG
1. Eliminate cold start with container images
We baked the entire game server, dependencies, and startup scripts into a container image, reducing startup from minutes to seconds.
2. Network standardization using OKG network model
OKG’s automatic network management lets us use the Kubernetes‑HostPort model with a SameAsHost feature to expose the same ports inside and outside the container.
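Because the host port is allocated at scheduling time, the game server has to discover its externally reachable address at runtime. The manifest in this section mounts the network-status annotation as a file at /etc/podinfo/network. The following is a minimal sketch of how a startup script might read it. The JSON field names (`externalAddresses`, `ip`, `ports`, `port`, `protocol`) are an assumption about the annotation's layout and should be checked against the OKG version in use; the function names are hypothetical.

```python
import json
from pathlib import Path

# Path taken from the volumeMounts/downwardAPI settings in the manifest.
NETWORK_STATUS_FILE = Path("/etc/podinfo/network")


def parse_external_endpoints(raw: str) -> list[tuple[str, int, str]]:
    """Extract (ip, port, protocol) tuples from a network-status JSON string.

    Field names are assumed, not confirmed by the article.
    """
    if not raw.strip():
        return []  # the annotation may be empty until the network is ready
    status = json.loads(raw)
    endpoints = []
    for address in status.get("externalAddresses") or []:
        for port in address.get("ports") or []:
            endpoints.append(
                (address.get("ip", ""), int(port["port"]), port.get("protocol", ""))
            )
    return endpoints


def read_external_endpoints(path: Path = NETWORK_STATUS_FILE) -> list[tuple[str, int, str]]:
    """Read the downward API file and parse it; returns [] if it is missing."""
    if not path.exists():
        return []
    return parse_external_endpoints(path.read_text())
```

A server would typically call `read_external_endpoints()` in a retry loop during startup and register the first endpoint with its matchmaking backend. The GameServerSet manifest for this setup follows.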
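apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
# ... (omitted for brevity)
spec:
  network:
    networkType: Kubernetes-HostPort
    networkConf:
      # Expose two UDP ports whose container port equals the allocated host port.
      - name: ContainerPorts
        value: "striden-server:SameAsHost/UDP,SameAsHost/UDP"
  gameServerTemplate:
    spec:
      containers:
        # Container name as referenced in ContainerPorts; image and other fields omitted.
        - name: striden-server
          volumeMounts:
            - name: podinfo
              mountPath: /etc/podinfo
      volumes:
        # Publish the network-status annotation as the file /etc/podinfo/network.
        - name: podinfo
          downwardAPI:
            items:
              - path: "network"
                fieldRef:
                  fieldPath: metadata.annotations['game.kruise.io/network-status']

3. Declarative API for automated operations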
GameServerSet keeps the declared number of servers running, much as a Deployment or ReplicaSet does, so each regional cluster converges on its desired server count automatically.
Declarative global releases via GitOps allow a single change to propagate to all clusters.
Intelligent scaling policies: terminate servers with playerCount == 0 first, keep a defined minimum number of servers with opsState == None available, and use lifecycle hooks for graceful deletion.
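The player-aware scale-down policy can be illustrated with a short sketch. This is a simplified model of the rule described above, not OKG's actual implementation: the `GameServer` type, its fields, and the `min_available` parameter are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class GameServer:
    """Hypothetical view of one game server as seen by the scaler."""
    name: str
    player_count: int
    ops_state: str = "None"


def pick_scale_down_candidates(
    servers: list[GameServer], surplus: int, min_available: int
) -> list[str]:
    """Choose up to `surplus` servers that can be removed without kicking players.

    Only empty servers are eligible, and at least `min_available` servers with
    ops_state == "None" are left running.
    """
    ready = [s for s in servers if s.ops_state == "None"]
    # Never dip below the configured floor of available servers.
    removable = max(0, len(ready) - min_available)
    empty = [s for s in ready if s.player_count == 0]
    return [s.name for s in empty[: min(surplus, removable)]]
```

With one busy and three empty servers, asking to remove three while keeping two available yields only two candidates, and the server hosting players is never among them.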
4. Results
| Metric | Traditional ASG | Cloud‑Native OKG | Improvement |
| --- | --- | --- | --- |
| Server startup time | ~3 minutes | <10 seconds | Second‑level startup |
| Scaling response | Minute‑level | Second‑level | Real‑time elasticity |
| Version release mode | Manual, per‑region | Declarative, global sync | Automation, high efficiency |
| Scaling down strategy | Blind guess | Player‑state aware | Lossless scaling, cost optimization |
| Ops labor cost | High | Low (IaC) | Significant reduction |
In summary, OKG provided a complete methodology and toolset that transformed STRIDEN’s operations from a passive, manual model to an automated, cloud‑native paradigm, delivering cost savings and a better player experience.
Conclusion and Outlook
The success of the STRIDEN migration demonstrates a repeatable path for modern online games to adopt cloud‑native technologies, paving the way for deeper AIOps integration and more flexible scheduling strategies.
