How OpenKruiseGame Solves the Last Mile of Cloud‑Native Game Connection Governance
This article explains how cloud‑native game services can achieve precise, state‑aware connection management and graceful shutdown by combining OpenKruiseGame with a cloud‑native API gateway. It covers the challenges of Layer‑7 networking, the custom lifecycle hooks, the deployment steps, and the benefits of zero‑downtime configuration changes.
Problem
In the Layer‑7 (application‑layer) network architecture used by mini‑program and H5 games, player state and connection management are tightly coupled to the dynamic scaling of service instances. Automated operations such as instance removal or gray‑release deployments must preserve session continuity. That requires real‑time awareness of active connections and governance across the full stack, from infrastructure to business logic: the "last mile" of cloud‑native game connection management.
Solution
OpenKruiseGame (OKG) combined with a cloud‑native API gateway provides a game gateway that can gracefully shut down instances without modifying existing business code. When a GameServer managed by a GameServerSet enters the PreDelete state, the gateway removes that instance from the service registry so that new requests are routed to other instances. Existing player connections stay open until the OKG lifecycle hook signals that the instance can be deleted.
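To make the sequence concrete, here is a toy Python model of the drain behaviour described above. It is purely illustrative: `Instance`, `Gateway`, and `try_release` are hypothetical names and are not OKG or Higress APIs. The least‑loaded routing choice is also an assumption. The model shows two things. A draining instance stops receiving new connections but keeps the ones it already has. Its deletion is unblocked only once its connection count reaches zero.

```python
from dataclasses import dataclass, field


@dataclass
class Instance:
    name: str
    state: str = "Ready"            # set to "PreDelete" when scale-in begins
    active_connections: int = 0
    delete_block: bool = True       # mirrors a gs-sync/delete-block style label


@dataclass
class Gateway:
    instances: list = field(default_factory=list)

    def route_new_connection(self) -> Instance:
        """Send a new player to the least-loaded instance that is not draining."""
        candidates = [i for i in self.instances if i.state != "PreDelete"]
        if not candidates:
            raise RuntimeError("no routable instance")
        target = min(candidates, key=lambda i: i.active_connections)
        target.active_connections += 1
        return target


def try_release(inst: Instance) -> bool:
    """Unblock deletion once a draining instance has no connections left."""
    if inst.state == "PreDelete" and inst.active_connections == 0:
        inst.delete_block = False
    return not inst.delete_block
```

Existing connections are never touched by the gateway in this model. They end on their own, and only then does `try_release` return `True`.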
Deployment steps
Provision an ACK managed Kubernetes cluster and install the ack-kruise-game and ack-kruise components.
Enable the cloud‑native API gateway (e.g., Higress).
Deploy a demo game service (the open‑source Posio) using the following GameServerSet definition. It creates three replicas, each with its own host name for fine‑grained traffic routing:
```yaml
apiVersion: game.kruise.io/v1alpha1
kind: GameServerSet
metadata:
  name: postio
  namespace: default
spec:
  # Deletion is held in PreDelete while the pod carries this label value
  lifecycle:
    preDelete:
      labelsHandler:
        gs-sync/delete-block: "true"
  replicas: 3
  updateStrategy:
    rollingUpdate:
      podUpdatePolicy: InPlaceIfPossible
  network:
    networkType: Kubernetes-Ingress
    networkConf:
      - name: IngressClassName
        value: "higress"
      - name: Port
        value: "5000"
      - name: Path
        value: "/"
      - name: PathType
        value: Prefix
      - name: Host
        # <id> is replaced with each GameServer's ordinal (game0, game1, ...)
        value: game<id>.postio.example.com
  gameServerTemplate:
    metadata:
      labels:
        gs-sync/delete-block: "true"
    spec:
      containers:
        - name: postio
          image: registry.cn-beijing.aliyuncs.com/chrisliu95/posio:8-24
          volumeMounts:
            # probe.sh reads the GameServer state from /etc/gsinfo/state
            - name: gsinfo
              mountPath: /etc/gsinfo
      volumes:
        - name: gsinfo
          downwardAPI:
            items:
              - path: "state"
                fieldRef:
                  fieldPath: metadata.labels['game.kruise.io/gs-state']
  serviceQualities:
    - name: healthy
      containerName: postio   # must match the container defined above
      permanent: false
      exec:
        command: ["bash", "./probe.sh"]
      serviceQualityAction:
        # Probe succeeded and printed "done": release the preDelete hook
        - state: true
          result: done
          labels:
            gs-sync/delete-block: "false"
        - state: false
          opsState: None
```
The three instances receive the host names game0.postio.example.com, game1.postio.example.com and game2.postio.example.com, enabling precise client routing and load balancing.
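The mapping from the Host template to per‑instance host names, and from an instance to the Envoy cluster label later used in the shutdown query, can be sketched in Python. These helpers are hypothetical and illustrative only. The placeholder token is a parameter, with `<id>` assumed as the default. The code assumes that OKG names pods `<GameServerSet name>-<ordinal>` and that each instance's Service shares its pod's name, which is consistent with the script's use of `$HOSTNAME`. The `outbound_<port>__...` pattern is copied from that script's Prometheus query.

```python
from __future__ import annotations


def expand_hosts(template: str, replicas: int, placeholder: str = "<id>") -> list[str]:
    """Substitute each GameServer ordinal into the Host template."""
    return [template.replace(placeholder, str(i)) for i in range(replicas)]


def route_by_host(host_header: str, hosts: list[str]) -> int:
    """Return the ordinal whose host matches the request's Host header.

    The port suffix is dropped and the name lower-cased; an unknown host
    raises ValueError, the analogue of an Ingress with no matching rule.
    """
    host = host_header.split(":", 1)[0].lower()
    return hosts.index(host)


def upstream_cluster(gss_name: str, ordinal: int, port: int = 5000,
                     namespace: str = "default") -> str:
    """Envoy cluster label for one instance, in the form used by the probe query."""
    return f"outbound_{port}__{gss_name}-{ordinal}.{namespace}.svc.cluster.local"
```

Because every instance gets its own host name, a client that must return to the same game room can address that instance directly. The gateway's load balancing then operates only at the level of which host a new player is assigned to.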
Graceful shutdown logic
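Rather than relying only on Kubernetes' native PreStop hook, OKG uses a custom lifecycle hook. The pod carries the label gs-sync/delete-block: "true", and the GameServerSet declares the same label under lifecycle.preDelete.labelsHandler, so deletion is held in the PreDelete state. Meanwhile, the healthy service‑quality probe periodically runs probe.sh, the script below. It queries Prometheus for the envoy_cluster_upstream_cx_active metric of the instance's upstream cluster on port 5000. When the active connection count reaches zero, the script prints done and exits with code 0. The matching serviceQualityAction then sets gs-sync/delete-block to "false", which releases the hook so that the pod is deleted only after all player sessions have ended.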
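The decision rule the probe implements can be restated in Python so that it is easy to unit‑test. `build_query_url` and `connections_drained` are hypothetical helper names. The metric name and cluster‑name pattern come from the shutdown query, and the parsing follows the Prometheus HTTP API instant‑query response format. One deliberate difference from a bare "no value means done" check: a failed query is treated as *not* drained, so a Prometheus outage cannot release the hook early.

```python
import json
from urllib.parse import urlencode


def build_query_url(prom_base: str, pod_name: str, port: int = 5000,
                    namespace: str = "default") -> str:
    """Build the instant-query URL used by the probe (cluster naming assumed)."""
    cluster = f"outbound_{port}__{pod_name}.{namespace}.svc.cluster.local"
    query = f'sum(envoy_cluster_upstream_cx_active{{cluster_name=~"{cluster}"}})'
    return f"{prom_base}/api/v1/query?" + urlencode({"query": query})


def connections_drained(state: str, prom_body: str) -> bool:
    """True only when the GameServer is in PreDelete and no connections remain."""
    if state.strip() != "PreDelete":
        return False
    data = json.loads(prom_body)
    if data.get("status") != "success":
        return False  # query failed: keep blocking deletion
    result = data["data"]["result"]
    if not result:
        return True  # sum() over no matching series returns an empty vector
    # Each instant-vector sample is [unix_timestamp, "value as string"].
    return float(result[0]["value"][1]) == 0.0
```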
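```bash
#!/bin/bash
# probe.sh: run periodically by the OKG serviceQualities probe.
# Prints "done" and exits 0 only when the GameServer is in PreDelete
# and the gateway reports no active upstream connections to this pod.

file_path="/etc/gsinfo/state"                  # downward-API file with the gs-state label
prom_url="http://prometheus.com/api/v1/query"  # replace with your Prometheus endpoint

# Without the state file we cannot decide; keep blocking deletion.
if [[ ! -f "$file_path" ]]; then
  exit 1
fi

state_content=$(cat "$file_path")
if [[ "$state_content" == "PreDelete" ]]; then
  # Envoy cluster of this pod's Service ("|" appears as "_" in the metric label)
  query="sum(envoy_cluster_upstream_cx_active{cluster_name=~\"outbound_5000__${HOSTNAME}.default.svc.cluster.local\"})"
  # -f fails on HTTP errors; if Prometheus is unreachable, keep blocking deletion.
  if ! json=$(curl -sf -G --data-urlencode "query=$query" "$prom_url"); then
    exit 1
  fi
  # Extract the count from "value":[<timestamp>,"<count>"]
  value=$(echo "$json" | grep -o '"value":\[[^]]*\]' | sed 's/.*"\([^"]*\)".*/\1/')
  # An empty result means no matching series, i.e. no active connections.
  if [[ -z "$value" || "$value" == "0" ]]; then
    echo "done"
    exit 0
  fi
fi
exit 1
```
Zero‑impact configuration changes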
Because the gateway is built on the open‑source Higress project, custom plugins can be added or removed, log formats updated, and global parameters modified without breaking existing long‑lived connections. Configuration changes therefore take effect without forcing players to reconnect.
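The source does not describe Higress's internal mechanism. As a generic illustration only, one common way to apply configuration without dropping connections is this: each message reads an immutable configuration snapshot that can be swapped atomically, so nothing needs to tear down and rebuild the listener. In this Python sketch, `ConfigStore` and `Connection` are hypothetical names, not Higress APIs, and plugins are modelled as plain string‑to‑string functions.

```python
import threading
from typing import Callable, Iterable, Tuple

Plugin = Callable[[str], str]


class ConfigStore:
    """Holds an immutable (plugins, log_format) snapshot that can be swapped at runtime."""

    def __init__(self, plugins: Iterable[Plugin], log_format: str):
        self._lock = threading.Lock()
        self._snapshot: Tuple[Tuple[Plugin, ...], str] = (tuple(plugins), log_format)

    def snapshot(self) -> Tuple[Tuple[Plugin, ...], str]:
        with self._lock:
            return self._snapshot

    def update(self, plugins: Iterable[Plugin], log_format: str) -> None:
        # Replace the reference only; open connections are never closed or rebuilt.
        new = (tuple(plugins), log_format)
        with self._lock:
            self._snapshot = new


class Connection:
    """A long-lived player connection that reads the latest config per message."""

    def __init__(self, store: ConfigStore):
        self.store = store
        self.open = True
        self.log: list = []

    def handle(self, msg: str) -> str:
        plugins, log_format = self.store.snapshot()
        for plugin in plugins:
            msg = plugin(msg)
        self.log.append(log_format.format(msg=msg))
        return msg
```

The connection object survives the update unchanged. Only the snapshot it reads on the next message differs, which is the property that matters for long‑lived game sessions.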
Key technical benefits
Capability changes (e.g., plugins, logging) are applied without disrupting active player connections.
Optimized for long‑connection scenarios, preserving session integrity during configuration updates.
Rich Prometheus metrics and pre‑configured dashboards provide full observability of game server health.
API governance UI simplifies traffic control and load‑balancing configuration.
Alibaba Cloud Native