Cloud‑Native Migration of Tencent Happy Game Studio Backend Using Istio Service Mesh
The article details how Tencent's Happy Game Studio transformed its large‑scale, low‑utilization backend from a legacy distributed architecture to a cloud‑native, Istio‑enabled service‑mesh platform, achieving significant resource savings, smoother deployments, and improved observability across game, CGI, and storage services.
Chen Zhiwei, a senior backend expert at Tencent, leads the public backend R&D and team management for the Happy Game Studio, which operates a distributed micro‑service platform serving tens of millions of daily active users.
The legacy on‑premise architecture, inherited from QQGame, consists of dozens of self‑developed frameworks and hundreds of micro‑services, but suffers from low CPU utilization (average < 20%), fragmented service governance, cumbersome deployment, and high operational overhead.
To address these challenges, the team embraced a cloud‑native strategy, tightly integrating Kubernetes (K8s) and Istio. They introduced gRPC support, built a MeshGate bridge to connect cloud‑side mesh services with on‑premise services, and gradually migrated workloads without downtime.
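To give a hedged sense of what calling a mesh-hosted gRPC service looks like once workloads sit behind Istio, the Go sketch below dials a hypothetical in-cluster address and issues a standard gRPC health check. The service name, namespace, and port are assumptions for illustration; the studio's actual interfaces are not published in the article.

```go
package main

import (
	"context"
	"log"
	"time"

	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
	healthpb "google.golang.org/grpc/health/grpc_health_v1"
)

func main() {
	// Hypothetical in-cluster service address; with Istio, the Envoy sidecar
	// intercepts this plaintext call and handles mTLS, load balancing, and telemetry.
	conn, err := grpc.Dial(
		"gamesvr.games.svc.cluster.local:8000",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if err != nil {
		log.Fatalf("dial: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 2*time.Second)
	defer cancel()

	// Standard gRPC health-checking protocol; a real client would call the
	// service's own generated stubs instead.
	resp, err := healthpb.NewHealthClient(conn).Check(ctx, &healthpb.HealthCheckRequest{})
	if err != nil {
		log.Fatalf("health check: %v", err)
	}
	log.Printf("serving status: %s", resp.GetStatus())
}
```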
Key outcomes of the migration include:
CPU utilization improved by 60‑70% due to pod‑level resource granularity and auto‑scaling.
Helm‑based declarative deployment and one‑click roll‑outs reduced operational effort.
Istio provided powerful service‑governance, observability, and traffic‑management capabilities.
For private‑protocol services, the team developed MeshGate, a bidirectional proxy that converts between gRPC and the original private protocol. Deployed alongside Envoy, MeshGate lets these services leverage Istio's control plane while preserving the existing authentication, encryption, and connection management.
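The article does not show MeshGate's internals. As a rough sketch of only the relay half of such a bridge (protocol conversion, authentication, encryption, and session management omitted), the Go snippet below accepts private-protocol TCP connections and pipes them toward a mesh service address, letting the Istio sidecar intercept the upstream connection. The listen port and upstream address are assumptions.

```go
package main

import (
	"io"
	"log"
	"net"
)

// relay pipes bytes in both directions between the inbound connection and the
// upstream, closing both ends when either side finishes. The real MeshGate
// logic (conversion to gRPC, auth, encryption, connection management) is
// intentionally left out of this sketch.
func relay(client net.Conn, upstreamAddr string) {
	defer client.Close()
	upstream, err := net.Dial("tcp", upstreamAddr)
	if err != nil {
		log.Printf("dial upstream: %v", err)
		return
	}
	defer upstream.Close()

	done := make(chan struct{}, 2)
	go func() { io.Copy(upstream, client); done <- struct{}{} }()
	go func() { io.Copy(client, upstream); done <- struct{}{} }()
	<-done // stop as soon as one direction closes
}

func main() {
	// Assumed ports/addresses for illustration only. Dialing the normal mesh
	// service address lets the Envoy sidecar transparently take over routing.
	ln, err := net.Listen("tcp", ":9000")
	if err != nil {
		log.Fatalf("listen: %v", err)
	}
	for {
		conn, err := ln.Accept()
		if err != nil {
			log.Printf("accept: %v", err)
			continue
		}
		go relay(conn, "gamesvr.games.svc.cluster.local:8000")
	}
}
```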
Performance tests showed that after integrating Envoy, private‑protocol forwarding latency remained comparable to on‑premise direct connections (average 0.62 ms vs. 0.38 ms), while pure gRPC over Istio incurred higher latency (average 6.23 ms), confirming the suitability of the hybrid approach for latency‑sensitive game traffic.
| Scenario | Average Latency | P95 Latency |
| --- | --- | --- |
| On‑premise direct | 0.38 ms | 0.67 ms |
| K8s pod‑to‑pod | 0.52 ms | 0.90 ms |
| Istio + TCP (private protocol) | 0.62 ms | 1.26 ms |
| Istio + gRPC | 6.23 ms | 14.62 ms |
The GameSvr service, previously a monolithic game‑room server, was re‑architected to run on K8s with Istio mesh, achieving near‑zero downtime migration, a two‑thirds reduction in CPU and memory usage, and automated scaling based on load.
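The article does not publish the GameSvr manifests, but load-based scaling of this kind is typically expressed as a Kubernetes HorizontalPodAutoscaler. Below is a hedged client-go sketch that scales a hypothetical gamesvr Deployment on average CPU utilization; all names, replica bounds, and thresholds are assumptions, not the studio's actual configuration.

```go
package main

import (
	"context"
	"log"

	autoscalingv2 "k8s.io/api/autoscaling/v2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func int32Ptr(i int32) *int32 { return &i }

func main() {
	// Assumes the program runs inside the cluster with RBAC to manage HPAs.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("config: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("client: %v", err)
	}

	hpa := &autoscalingv2.HorizontalPodAutoscaler{
		ObjectMeta: metav1.ObjectMeta{Name: "gamesvr", Namespace: "games"},
		Spec: autoscalingv2.HorizontalPodAutoscalerSpec{
			ScaleTargetRef: autoscalingv2.CrossVersionObjectReference{
				APIVersion: "apps/v1", Kind: "Deployment", Name: "gamesvr",
			},
			MinReplicas: int32Ptr(2),
			MaxReplicas: 50,
			Metrics: []autoscalingv2.MetricSpec{{
				Type: autoscalingv2.ResourceMetricSourceType,
				Resource: &autoscalingv2.ResourceMetricSource{
					Name: "cpu",
					Target: autoscalingv2.MetricTarget{
						Type:               autoscalingv2.UtilizationMetricType,
						AverageUtilization: int32Ptr(60), // illustrative target
					},
				},
			}},
		},
	}

	if _, err := client.AutoscalingV2().HorizontalPodAutoscalers("games").
		Create(context.Background(), hpa, metav1.CreateOptions{}); err != nil {
		log.Fatalf("create hpa: %v", err)
	}
	log.Println("created HorizontalPodAutoscaler games/gamesvr")
}
```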
For the massive CGI services (≈350 instances), the team applied two strategies: high‑traffic CGIs were refactored to use coroutine‑based asynchronous handling with http‑parser and libco, while low‑traffic CGIs were containerized together with Apache and migrated in bulk, achieving up to 85% CPU and 70% memory savings.
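The high-traffic CGIs were rebuilt in C++ on libco coroutines with http-parser. As a loose Go analogue only, not the studio's implementation, the sketch below shows the same pattern: each request runs in its own lightweight coroutine and makes a bounded, cancellable backend call before responding, so a slow upstream does not tie up a worker process the way it would under the old Apache CGI model.

```go
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"time"
)

// queryBackend stands in for the game-platform RPC a real CGI would make;
// it is a placeholder, not an actual Happy Game Studio interface.
func queryBackend(ctx context.Context, uid string) (string, error) {
	select {
	case <-time.After(10 * time.Millisecond): // simulated backend latency
		return "profile-for-" + uid, nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

func main() {
	// net/http runs each request in its own goroutine, playing the role libco
	// coroutines play in the C++ version: the handler can block on the backend
	// call without consuming an OS thread or worker process per request.
	http.HandleFunc("/profile", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 200*time.Millisecond)
		defer cancel()

		profile, err := queryBackend(ctx, r.URL.Query().Get("uid"))
		if err != nil {
			http.Error(w, "backend timeout", http.StatusGatewayTimeout)
			return
		}
		fmt.Fprintln(w, profile)
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```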
The in‑house CubeDB storage, holding tens of terabytes across hundreds of MySQL tables, was migrated to Tencent's TcaplusDB via a Cube2TcaplusProxy that adapts the private protocol, enabling seamless data sync and lossless cut‑over.
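The article does not describe Cube2TcaplusProxy's internals, so the Go sketch below is only a hedged guess at what a lossless cut-over layer can look like: behind hypothetical Store interfaces, writes go to both backends and reads prefer TcaplusDB with a CubeDB fallback. The real proxy adapts the CubeDB private protocol to TcaplusDB calls directly; every name here is illustrative.

```go
package cutover

import "context"

// Store is a hypothetical abstraction over either storage backend.
type Store interface {
	Get(ctx context.Context, key string) ([]byte, error)
	Put(ctx context.Context, key string, value []byte) error
}

// MigratingStore illustrates one possible cut-over strategy (an assumption,
// not the documented design): dual-write during migration, read from the new
// backend first, and fall back to the old one on a miss.
type MigratingStore struct {
	Old Store // CubeDB
	New Store // TcaplusDB
}

func (m *MigratingStore) Put(ctx context.Context, key string, value []byte) error {
	if err := m.New.Put(ctx, key, value); err != nil {
		return err
	}
	// Keep the legacy store consistent until the cut-over completes.
	return m.Old.Put(ctx, key, value)
}

func (m *MigratingStore) Get(ctx context.Context, key string) ([]byte, error) {
	if v, err := m.New.Get(ctx, key); err == nil {
		return v, nil
	}
	return m.Old.Get(ctx, key)
}
```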
Multi‑cluster deployment was realized by assigning different business teams to separate K8s clusters while sharing common services in a public cluster, with Istio control‑plane federation provided by Tencent Cloud Mesh (TCM) to enable low‑cost cross‑cluster communication.
In summary, through systematic analysis and cloud‑native refactoring, the Happy Game Studio achieved a smooth, high‑quality migration to a Kubernetes‑Istio mesh, gaining automated deployment, service discovery, elastic scaling, robust governance, and comprehensive observability, while dramatically improving reliability, maintainability, and operational efficiency.
Source: the High Availability Architecture official account.