
How We Transformed an FPS Game to Cloud‑Native with OpenKruiseGame in 2 Months

Facing a tight deadline, Yahaha Studios migrated the server deployment of STRIDEN, an FPS game, from a traditional Auto Scaling Group (ASG) to a cloud‑native architecture built on OpenKruiseGame. The result: game servers that start in seconds, automated global scaling, lossless scale‑down, and significant cost reductions, along with a better player experience.

Alibaba Cloud Infrastructure

About Yahaha

Yahaha Studios is a metaverse platform that lowers the barrier to 3D content creation, allowing users to create, share, and experience 3D games without code.

Preface

STRIDEN is an upcoming online FPS game built on Unreal Engine 5 (UE5). It is co‑developed with partner studio 5 Fortress and is preparing for an Early Access launch on Steam.

1. Why OpenKruiseGame?

We faced two major operational pain points with the previous Auto Scaling Group (ASG) solution:

Bringing servers online and taking them offline without disrupting players (lossless scale‑up and scale‑down) required complex custom scripts.

Backend services ran on Kubernetes, while game servers ran on a separate management system, so the team had to operate two platforms.

We needed a unified platform to manage both stateless back‑ends and stateful game servers, leading us to Kubernetes and its cloud‑native ecosystem.

Kubernetes provides declarative APIs and controllers. By treating game servers as Kubernetes workloads, we could manage them as infrastructure as code, in the same way as our backend services.
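To make "declarative" concrete: a controller repeatedly compares the state you declared with what is actually running and plans whatever change closes the gap. The following is a minimal Python sketch of that idea for a replica count only; Plan and reconcile are hypothetical names, and this is an illustration of the pattern, not Kubernetes or OKG controller code.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Plan:
    """Actions needed to move the observed state toward the desired state."""
    to_create: int = 0
    to_delete: List[str] = field(default_factory=list)


def reconcile(desired: int, observed: List[str]) -> Plan:
    """Compare the declared replica count with the running servers."""
    running = len(observed)
    if running < desired:
        # Too few servers: create the missing ones.
        return Plan(to_create=desired - running)
    if running > desired:
        # Too many servers: naively drop the last names in sorted order.
        surplus = running - desired
        return Plan(to_delete=sorted(observed)[-surplus:])
    # Already converged: nothing to do.
    return Plan()


if __name__ == "__main__":
    print(reconcile(3, ["gs-0"]))                  # needs two more servers
    print(reconcile(1, ["gs-0", "gs-2", "gs-1"]))  # two servers to remove
```

Because the controller acts on the difference rather than on a one‑off command, running it again after a crash or a missed event still converges to the declared count. The naive deletion choice here is deliberately simplistic; deciding *which* servers are safe to remove is where game‑specific policies matter.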

After evaluating community solutions, we chose OpenKruiseGame (OKG) for its feature adaptability, community support, and native integration with Alibaba Cloud.

Feature adaptability and flexibility: OKG is designed for stateful game workloads, offering advanced networking, QoS, and scaling strategies.

Community support and response speed: The OKG community provides rapid, professional assistance.

Localization and integration advantages: Originating from Alibaba Cloud, OKG integrates well with major cloud providers.

2. Challenges of Cloud‑Native Architecture

Transitioning STRIDEN from ASG to a cloud‑native stack introduced several concrete challenges.

Original Architecture

1. Cold start pain: minute‑level startup time

Each server took about 3 minutes to come online: create an EC2 instance → download the game server image → extract it → download Steam dependencies → launch the UE game server.

2. Public IP cost and complexity

High economic cost of assigning an Elastic IP to each server.

Operational complexity of managing thousands of IPs, firewalls, and security groups.

3. Manual “workshop” operations

No automatic elastic scaling: the ops team had to create and destroy instances by hand.

No globally synchronized releases: every update required manual steps in each region.

4. “Blind” scaling down

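Without visibility into which servers still had players, scaling down was guesswork. Conservative scale‑down wasted resources, while aggressive scale‑down risked shutting down servers in the middle of a match.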

3. Cloud‑Native Solutions with OKG

1. Eliminate cold start with container images

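We baked the entire game server, its dependencies, and its startup scripts into a container image. The download‑and‑extract work now happens once at image build time instead of on every server launch, which cut startup from minutes to seconds.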

2. Network standardization using OKG network model

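OKG manages networking automatically. We use its Kubernetes‑HostPort network model with the SameAsHost option, so the game server listens inside the container on the same port number that is exposed on the host. OKG records the resulting network information in the game.kruise.io/network-status annotation, which the manifest below mounts into the container at /etc/podinfo/network through the Downward API.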

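```yaml
apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
# ... (metadata and other fields omitted for brevity)
spec:
  network:
    networkType: Kubernetes-HostPort
    networkConf:
      # Two UDP ports on container "striden-server"; SameAsHost keeps
      # the container port identical to the exposed host port.
      - name: ContainerPorts
        value: "striden-server:SameAsHost/UDP,SameAsHost/UDP"
  gameServerTemplate:
    spec:
      containers:
        - name: striden-server
          # image and other container fields omitted
          volumeMounts:
            # Make the network-status annotation readable as a file
            - name: podinfo
              mountPath: /etc/podinfo
      volumes:
        - name: podinfo
          downwardAPI:
            items:
              - path: "network"
                fieldRef:
                  fieldPath: metadata.annotations['game.kruise.io/network-status']
```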
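Inside the container, the game server can read the mounted annotation to learn which address and ports to advertise. The following is a minimal Python sketch. It assumes the annotation holds JSON with an externalAddresses list whose entries carry an ip and a ports list of name/protocol/port objects; verify that schema against your OKG version. parse_network_status and read_network_status are hypothetical helpers. In practice the equivalent logic could live in the server's startup script or engine code.

```python
import json
from typing import List, Tuple

# mountPath (/etc/podinfo) + item path ("network") from the manifest above.
NETWORK_FILE = "/etc/podinfo/network"

Endpoint = Tuple[str, int, str]  # (ip, port, protocol)


def parse_network_status(raw: str) -> List[Endpoint]:
    """Extract (ip, port, protocol) triples from the external addresses.

    Assumes OKG's network-status JSON shape; ports may be ints or strings.
    """
    if not raw.strip():
        # Annotation not written yet: the Downward API file is empty.
        return []
    status = json.loads(raw)
    endpoints: List[Endpoint] = []
    for addr in status.get("externalAddresses") or []:
        for port in addr.get("ports") or []:
            endpoints.append(
                (addr["ip"], int(port["port"]), port.get("protocol", "TCP"))
            )
    return endpoints


def read_network_status(path: str = NETWORK_FILE) -> List[Endpoint]:
    """Read and parse the mounted annotation file."""
    with open(path, encoding="utf-8") as f:
        return parse_network_status(f.read())
```

An empty result means the network is not ready yet, so the caller can poll the file until endpoints appear. Because the Downward API refreshes annotation files in place, re‑reading the file picks up later changes without restarting the container.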

3. Declarative API for automated operations

Like a Deployment or ReplicaSet, a GameServerSet continuously reconciles toward a declared replica count, so each region's cluster keeps the desired number of game servers running without manual intervention.

With GitOps, a release becomes a single declarative change in Git that propagates to every regional cluster.

Player‑aware scaling policies: when scaling down, remove servers with playerCount == 0 first; keep a minimum number of servers with opsState == None available; and use lifecycle hooks so that servers are deleted gracefully.
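The scale‑down policy above can be modeled in a few lines. This is a hypothetical Python sketch, not OKG's controller logic: GameServer, select_for_deletion, and min_available are illustrative names, and the ops_state strings are modeled on OKG's opsState values ("None", "Allocated", "WaitToBeDeleted", "Maintaining"). The sketch never selects a server that still has players, removes servers already flagged WaitToBeDeleted first, and stops removing idle "None" servers once only min_available of them remain.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GameServer:
    name: str
    player_count: int
    ops_state: str  # modeled on OKG opsState, e.g. "None", "Allocated"


# States eligible for removal; anything else (e.g. "Maintaining") is kept.
DELETABLE_STATES = ("None", "WaitToBeDeleted")


def select_for_deletion(
    servers: List[GameServer], count: int, min_available: int
) -> List[str]:
    """Choose up to `count` empty servers, keeping `min_available` in "None"."""
    # Never consider a server that still has players on it.
    candidates = [
        gs for gs in servers
        if gs.player_count == 0 and gs.ops_state in DELETABLE_STATES
    ]
    # WaitToBeDeleted sorts first (False < True); the sort is stable.
    candidates.sort(key=lambda gs: gs.ops_state != "WaitToBeDeleted")
    # How many "None" servers can go before the buffer drops below the floor.
    available = sum(1 for gs in servers if gs.ops_state == "None")
    spare = max(available - min_available, 0)

    chosen: List[str] = []
    for gs in candidates:
        if len(chosen) == count:
            break
        if gs.ops_state == "None":
            if spare == 0:
                continue  # keep this idle server as part of the warm buffer
            spare -= 1
        chosen.append(gs.name)
    return chosen


if __name__ == "__main__":
    fleet = [
        GameServer("gs-0", 5, "Allocated"),
        GameServer("gs-1", 0, "None"),
        GameServer("gs-2", 0, "WaitToBeDeleted"),
        GameServer("gs-3", 0, "None"),
    ]
    print(select_for_deletion(fleet, count=3, min_available=1))
```

With min_available=1, the sketch removes gs-2 (already flagged) and one idle server, keeps the other idle server as a buffer for incoming players, and never touches gs-0, which hosts a live match. In the real system, lifecycle hooks would then give each selected server time to shut down cleanly.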

4. Results

| Metric | Traditional ASG | Cloud‑Native OKG | Improvement |
|---|---|---|---|
| Server startup time | ~3 minutes | <10 seconds | Second‑level startup |
| Scaling response | Minute‑level | Second‑level | Real‑time elasticity |
| Version release mode | Manual, per‑region | Declarative, global sync | Automation, high efficiency |
| Scale‑down strategy | Blind guess | Player‑state aware | Lossless scaling, cost optimization |
| Ops labor cost | High | Low (IaC) | Significant reduction |
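As a rough check on the startup row, taking ~3 minutes as 180 s and the cloud‑native startup as just under 10 s gives a speedup of more than 18x:

$$
\frac{t_{\mathrm{ASG}}}{t_{\mathrm{OKG}}} > \frac{180\ \mathrm{s}}{10\ \mathrm{s}} = 18
$$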

In summary, OKG provided a complete methodology and toolset that moved STRIDEN's operations from a reactive, manual model to an automated, cloud‑native one, delivering cost savings and a better player experience.

Conclusion and Outlook

The success of the STRIDEN migration demonstrates a repeatable path for modern online games to adopt cloud‑native technologies, paving the way for deeper AIOps integration and more flexible scheduling strategies.
