
How OpenKruiseGame Solves the Last Mile of Cloud‑Native Game Connection Governance

This article explains how cloud‑native game services can achieve precise, state‑aware connection management and graceful shutdown by combining OpenKruiseGame with a cloud‑native API gateway. It covers the challenges of Layer 7 networking, the custom lifecycle hook, the deployment steps, and the benefits of configuration changes that do not interrupt player connections.

Alibaba Cloud Native

Problem

In the Layer 7 (application‑layer) networking used by mini‑program and H5 games, player state and connection management are tightly coupled to the dynamic scaling of service instances. Automated operations such as removing an instance or rolling out a gray (canary) release must preserve session continuity, which requires real‑time awareness of active connections and governance that spans the stack from infrastructure to business logic. This is the "last mile" of cloud‑native game connection management.

Solution

OpenKruiseGame (OKG) combined with a cloud‑native API gateway provides a game gateway that can gracefully shut down instances without modifying existing business code. When a GameServer managed by a GameServerSet enters the PreDelete state, the gateway removes the instance from the service registry so that new requests are routed elsewhere, while existing player connections are kept alive until the OKG lifecycle hook signals completion.

Deployment steps

Provision an ACK managed Kubernetes cluster and install the ack-kruise-game and ack-kruise components.

Enable the cloud‑native API gateway (e.g., Higress).

Deploy a demo game service (the open‑source Posio) using the following GameServerSet definition, which creates three replicas, each with a unique host name for fine‑grained traffic routing:

apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: postio
  namespace: default
spec:
  lifecycle:
    preDelete:
      labelsHandler:
        gs-sync/delete-block: "true"
  replicas: 3
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible
  network:
    networkType: Kubernetes-Ingress
    networkConf:
    - name: IngressClassName
      value: "higress"
    - name: Port
      value: "5000"
    - name: Path
      value: "/"
    - name: PathType
      value: Prefix
    - name: Host
      value: game{ID}.postio.example.com
  gameServerTemplate:
    metadata:
      labels:
        gs-sync/delete-block: "true"
    spec:
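      containers:
      - name: postio
        image: registry.cn-beijing.aliyuncs.com/chrisliu95/posio:8-24
        volumeMounts:
        # Exposes the gs-state label at /etc/gsinfo/state, the path the probe script reads.
        - name: gsinfo
          mountPath: /etc/gsinfo
      volumes:
      - name: gsinfo
        downwardAPI:
          items:
          - path: "state"
            fieldRef:
              fieldPath: metadata.labels['game.kruise.io/gs-state']
  # serviceQualities belongs to the GameServerSet spec, not to the pod spec.
  serviceQualities:
  - name: healthy
    # Must match a container name in gameServerTemplate.
    containerName: postio
    permanent: false
    exec:
      command: ["bash", "./probe.sh"]
    serviceQualityAction:
    - state: true
      result: done
      labels:
        gs-sync/delete-block: "false"
    - state: false
      opsState: None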

The three instances receive host names game0.postio.example.com, game1.postio.example.com and game2.postio.example.com, enabling precise client routing and load balancing.
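The mapping from replica index to host name can be sketched in a few lines of bash. This is only an illustration of how the {ID} placeholder expands; the function name expand_hosts is hypothetical and is not part of OKG, which performs the substitution itself when it creates the Ingress rules.

```shell
#!/bin/bash
# Hypothetical helper: expand a host template containing the literal
# placeholder {ID} for replica indices 0..N-1.
expand_hosts() {
  local template="$1" replicas="$2" placeholder='{ID}' id
  for ((id = 0; id < replicas; id++)); do
    # Quoting the pattern makes bash treat {ID} as a literal string.
    printf '%s\n' "${template//"$placeholder"/$id}"
  done
}

# Template and replica count taken from the GameServerSet above.
expand_hosts 'game{ID}.postio.example.com' 3
```

A client that must reach a specific instance would then send its request to the gateway with the matching Host header, for example game1.postio.example.com.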

Graceful shutdown logic

OKG supplements Kubernetes’ native PreStop hook with a custom lifecycle hook driven by the active connection count that the gateway reports to Prometheus. The probe script below runs repeatedly; once the GameServer is in the PreDelete state, it queries the metric envoy_cluster_upstream_cx_active for the instance’s upstream cluster on port 5000. When the count reaches zero it prints done and exits with code 0, which releases the hook, so the instance is deleted only after its player connections have closed.

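#!/bin/bash
# Service-quality probe: exit 0 (and print "done") only when the GameServer is
# in PreDelete and the gateway reports no active connections to this instance.
file_path="/etc/gsinfo/state"
if [[ ! -f "$file_path" ]]; then
  exit 1
fi
state_content=$(cat "$file_path")
if [[ "$state_content" == "PreDelete" ]]; then
  query="sum(envoy_cluster_upstream_cx_active{cluster_name=~\"outbound_5000__${HOSTNAME}.default.svc.cluster.local\"})"
  # A failed query must not be mistaken for "no connections".
  json=$(curl -sf -G --data-urlencode "query=$query" http://prometheus.com/api/v1/query) || exit 1
  # Extract the sample value, e.g. "value":[1700000000,"3"] -> 3.
  # The trailing .* drops the closing bracket that would otherwise remain.
  value=$(echo "$json" | grep -o '"value":\[[^]]*\]' | sed 's/.*"\([^"]*\)".*/\1/')
  # An empty value means the query matched no series, i.e. no upstream connections.
  if [[ -z "$value" || "$value" == "0" ]]; then
    echo "done"
    exit 0
  fi
  exit 1
fi
exit 1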
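The parsing and decision steps of the probe can be exercised offline, without a cluster or a Prometheus server. The sketch below is a hedged illustration: it feeds canned responses in the shape of a Prometheus instant-query result to two hypothetical helpers, extract_value and probe_decision, which are not part of OKG. The sample timestamps and counts are made up. Note that the sed expression ends in `.*` so that the closing bracket of the value array is discarded along with everything else.

```shell
#!/bin/bash
# Pull the sample value out of a Prometheus instant-query response on stdin.
extract_value() {
  grep -o '"value":\[[^]]*\]' | sed 's/.*"\([^"]*\)".*/\1/'
}

# Hypothetical helper: decide whether the instance may be released.
# $1 = GameServer state, $2 = raw JSON response from Prometheus.
probe_decision() {
  local state="$1" json="$2" value
  [[ "$state" == "PreDelete" ]] || return 1
  value=$(printf '%s' "$json" | extract_value)
  # Empty (no matching series) or "0" both mean no active connections.
  if [[ -z "$value" || "$value" == "0" ]]; then
    echo "done"
    return 0
  fi
  return 1
}

# Canned responses shaped like /api/v1/query results (values are made up).
busy='{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000.123,"2"]}]}}'
idle='{"status":"success","data":{"resultType":"vector","result":[{"metric":{},"value":[1700000000.123,"0"]}]}}'

probe_decision "PreDelete" "$busy" || echo "still draining"
probe_decision "PreDelete" "$idle"
probe_decision "Ready" "$idle" || echo "not in PreDelete"
```

Keeping the decision in a function that takes the state and the response as arguments makes it possible to test the hook's behaviour before wiring it to a live metrics endpoint.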

Zero‑impact configuration changes

Built on the open‑source Higress gateway, the solution supports adding or removing custom plugins, updating log formats, and changing global parameters without breaking existing long‑lived connections, so configuration can be updated in a game‑service environment while players stay connected.

Key technical benefits

Capability changes (e.g., plugins, logging) are applied without disrupting active player connections.

Optimized for long‑connection scenarios, preserving session integrity during configuration updates.

Rich Prometheus metrics and pre‑configured dashboards provide full observability of game server health.

API governance UI simplifies traffic control and load‑balancing configuration.
