
How Shanda Games Built a Scalable Automated Operations System

This article details Shanda Games' journey in designing and implementing a comprehensive automated operations platform—including installation, deployment, security, client and server updates, data analysis, backup, and monitoring—to efficiently manage hundreds of games across diverse hardware and operating systems.

Efficient Ops

Introduction

Xu Feng, a senior researcher at Shanda Games, introduces his background and the purpose of the talk: to share how Shanda's automated operations system was designed and put into practice, covering both why automation is needed and what it should include.

Why Build an Automated Operations System?

Shanda operates hundreds of games with complex architectures, multiple operating systems (Windows, Linux), and a wide variety of server hardware purchased over many years. Personnel skill levels vary, making a standardized, automated approach essential for efficiency, consistency, and security.

Automation Goals

Completeness – cover all operational needs.

Simplicity – easy to use and understand.

Efficiency – provide timely feedback for batch tasks.

Security – protect the system from takeover.

Subsystem Overview

1. Automated Installation System

Servers are installed over the network via PXE. The installer automatically detects which OS type to deploy, installs the required drivers, and applies basic security settings such as firewall rules and disabling Windows file sharing.
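The article does not say how the installer picks an OS or which boot loader it uses. As a minimal sketch, assuming PXELINUX and a hypothetical inventory keyed by MAC address, the code below renders the per-host config file that PXELINUX looks up under `pxelinux.cfg/` (named `01-` plus the lowercase, dash-separated MAC). All paths, URLs, and profile names are illustrative, `inst.ks=` assumes a RHEL-family Anaconda installer, and Windows hosts (usually booted into WinPE) are left out.

```python
# Hypothetical inventory: MAC address -> OS profile this host should receive.
INVENTORY = {
    "00:1A:2B:3C:4D:5E": "centos",
    "00:1A:2B:3C:4D:5F": "windows",
}

# Hypothetical Linux boot artifacts per profile (paths relative to the TFTP root).
LINUX_PROFILES = {
    "centos": {
        "kernel": "images/centos/vmlinuz",
        "initrd": "images/centos/initrd.img",
        "kickstart": "http://install.example.internal/ks/centos.cfg",
    },
}


def pxelinux_filename(mac: str) -> str:
    """Per-host PXELINUX config name: '01-' (Ethernet type) + lowercase, dash-separated MAC."""
    return "01-" + mac.lower().replace(":", "-")


def render_config(mac: str) -> str:
    """Render the PXELINUX config that boots this host into an unattended Linux install."""
    profile = INVENTORY.get(mac.upper())
    if profile not in LINUX_PROFILES:
        raise ValueError(f"no Linux PXE profile for {mac!r} (profile: {profile})")
    p = LINUX_PROFILES[profile]
    return "\n".join([
        "DEFAULT install",
        "LABEL install",
        f"  KERNEL {p['kernel']}",
        f"  APPEND initrd={p['initrd']} inst.ks={p['kickstart']}",
        "",
    ])
```

Each rendered string would be written to `pxelinux.cfg/<pxelinux_filename(mac)>` on the TFTP server before the host is powered on.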

2. Automated Operations Platform

The platform serves as the operators' console, handling heterogeneous OS environments and large server fleets. It is browser‑based, uses SSH for both Linux and Windows management, and avoids custom agents to reduce maintenance overhead.
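To make the agentless, SSH-only design concrete, here is a sketch of fanning one command out to many servers in parallel and reporting each result as soon as it arrives, in line with the "timely feedback" goal. It assumes every target, Linux or Windows, runs an SSH server that accepts key-based logins; `run_batch`, `Result`, and the worker count are hypothetical names and values, not the platform's actual interface.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor, as_completed
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class Result:
    host: str
    returncode: int
    output: str


def ssh_runner(host: str, command: str) -> Result:
    """Run one command with the stock OpenSSH client; nothing custom runs on the target."""
    try:
        proc = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, command],
            capture_output=True, text=True, timeout=60,
        )
        return Result(host, proc.returncode, proc.stdout + proc.stderr)
    except subprocess.TimeoutExpired:
        return Result(host, -1, "timed out after 60s")


def run_batch(hosts: List[str], command: str,
              runner: Callable[[str, str], Result] = ssh_runner,
              max_workers: int = 50,
              on_result: Optional[Callable[[Result], None]] = None) -> Dict[str, Result]:
    """Run `command` on all hosts in parallel; report each result as soon as it finishes."""
    results: Dict[str, Result] = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [pool.submit(runner, host, command) for host in hosts]
        for fut in as_completed(futures):
            res = fut.result()
            results[res.host] = res
            if on_result is not None:
                on_result(res)  # stream progress to the operator's console
    return results
```

Passing `on_result=print` streams per-host results to the operator instead of waiting for the slowest server; the injectable `runner` also lets the fan-out logic be tested without real hosts.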

3. Automated Security Inspection System

Before client files reach players, they are scanned for viruses; server‑side assets undergo continuous security scans so that vulnerable ports or IP addresses are not exposed.
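The scanning tooling is not specified. Assuming some external port scanner reports which ports answer on each public IP, the policy check itself can be as simple as comparing those results against a per-role allowlist; the roles, port numbers, and function name below are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical policy: ports each server role may expose on its public IP.
ALLOWED_PUBLIC_PORTS: Dict[str, Set[int]] = {
    "game": {7000, 7001},
    "web": {80, 443},
}


def find_exposures(scan: Iterable[Tuple[str, str, int]]) -> List[Tuple[str, int]]:
    """Return sorted (ip, port) pairs that are open but not allowed for the host's role.

    `scan` yields (ip, role, open_port) tuples, e.g. parsed from a port scanner's report.
    Roles missing from the policy are allowed no public ports at all.
    """
    findings = {
        (ip, port)
        for ip, role, port in scan
        if port not in ALLOWED_PUBLIC_PORTS.get(role, set())
    }
    return sorted(findings)
```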

4. Automated Client Update System

Handles peak‑time bandwidth spikes (hundreds of Gbps) and mitigates problems such as unauthorized caching by ISPs. It uses a multi‑CDN strategy with HTTP 302 redirects to balance traffic, and delivers small files over HTTPS (code‑named "Dorado") to bypass ISP caches.
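A 302-based multi-CDN split can be sketched as a weighted random choice per request. The CDN hostnames and weights are hypothetical, and marking the redirect `no-store` is one assumed way to keep intermediaries from caching it, so weight changes take effect immediately.

```python
import random
from typing import Dict, Optional, Tuple

# Hypothetical CDN providers and their share of download traffic.
CDN_WEIGHTS: Dict[str, float] = {
    "https://cdn-a.example.com": 0.5,
    "https://cdn-b.example.com": 0.3,
    "https://cdn-c.example.com": 0.2,
}


def pick_cdn(weights: Dict[str, float], rng: random.Random) -> str:
    """Weighted random choice of a CDN base URL."""
    bases = list(weights)
    return rng.choices(bases, weights=[weights[b] for b in bases], k=1)[0]


def redirect(path: str, rng: Optional[random.Random] = None) -> Tuple[int, Dict[str, str]]:
    """Answer a client download request with a 302 to the chosen CDN."""
    base = pick_cdn(CDN_WEIGHTS, rng or random.Random())
    # no-store keeps the redirect itself out of caches, so weight changes apply at once.
    return 302, {"Location": base + path, "Cache-Control": "no-store"}
```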

5. Automated Server‑Side Update System

Adopts a CDN‑like model where target servers download updates from central nodes via cache servers, avoiding P2P due to security and traffic‑control concerns.
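The CDN-like pull model means each update package crosses from the central node to a given cache tier only once, however many game servers behind it request it. The following is a minimal in-memory sketch assuming a hypothetical regional cache in front of a per-IDC cache (the article does not give the tier layout); it ignores concurrency, eviction, and integrity checks.

```python
from typing import Callable, Dict


class CacheNode:
    """One cache tier: serve from the local store, else fetch once from upstream.

    Not thread-safe and never evicts; a real cache server needs both.
    """

    def __init__(self, name: str, upstream: Callable[[str], bytes]):
        self.name = name
        self.upstream = upstream
        self.store: Dict[str, bytes] = {}
        self.upstream_fetches = 0

    def get(self, path: str) -> bytes:
        if path not in self.store:
            self.upstream_fetches += 1
            self.store[path] = self.upstream(path)
        return self.store[path]


# Hypothetical central node holding published server-side update packages.
CENTRAL: Dict[str, bytes] = {"/server/patch-1024.tar.gz": b"...package bytes..."}

# Hypothetical two-tier layout: central -> regional cache -> per-IDC cache -> game servers.
regional = CacheNode("regional-cache", CENTRAL.__getitem__)
idc = CacheNode("idc-cache", regional.get)
```

Three game servers pulling the same package through `idc` cause one upstream fetch at each tier. A P2P swarm would instead spread that traffic across the game servers themselves, which is where the security and traffic-control concerns cited above come in.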

6. Automated Data Analysis System

Collects client download logs, aggregates them in a Tomcat cluster, stores results in MongoDB, and visualizes funnel‑style conversion from download to game login, helping identify failures and improve user experience.
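The funnel computation can be illustrated with an in-memory sketch that stands in for the Tomcat aggregation and MongoDB storage. The stage names between download and login are hypothetical; a user counts toward a stage only if they also reached every earlier stage, and each rate is relative to the previous stage.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical funnel stages, in order, reconstructed from client download logs.
STAGES = ["download_start", "download_done", "install_done", "client_launch", "game_login"]


def funnel(events: Iterable[Tuple[str, str]]) -> List[Tuple[str, int, float]]:
    """Return (stage, users, conversion_from_previous_stage) for each stage.

    A user counts toward a stage only if they also reached every earlier stage.
    """
    users_by_stage: Dict[str, Set[str]] = {s: set() for s in STAGES}
    for user_id, stage in events:
        if stage in users_by_stage:
            users_by_stage[stage].add(user_id)

    rows: List[Tuple[str, int, float]] = []
    reached: Set[str] = set()
    for i, stage in enumerate(STAGES):
        reached = set(users_by_stage[stage]) if i == 0 else reached & users_by_stage[stage]
        prev = rows[-1][1] if rows else len(reached)
        rate = len(reached) / prev if prev else 0.0
        rows.append((stage, len(reached), rate))
    return rows
```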

7. Automated Data Backup System

Moves from scattered FTP‑to‑tape backups to a centralized solution: load‑balanced upload endpoints, MD5 verification, and storage in a Hadoop HDFS cluster (tens of PB) with UDP‑based transfer to tolerate high latency and packet loss.
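MD5 verification on upload typically means the client sends the file's digest and the server recomputes it over the received bytes before accepting the backup. Here is a sketch of that check, hashing in chunks so multi-gigabyte files are never loaded into memory at once; the function names and the 1 MiB chunk size are assumptions.

```python
import hashlib
from typing import BinaryIO


def md5_of_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Hex MD5 digest of a stream, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.md5()
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()


def verify_upload(stream: BinaryIO, client_md5: str) -> bool:
    """Accept the upload only if the server-side digest matches the client's."""
    return md5_of_stream(stream) == client_md5.strip().lower()
```

MD5 is adequate for catching corruption in transit, but it does not protect against deliberate tampering.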

8. Automated Monitoring and Alert System

Monitors IDC link quality, server health, network traffic, system logs, application metrics, and client SDK data. Business‑level indicators such as online player count trigger alerts when thresholds are breached.
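Business-level alerting on online player count can be sketched as a fixed floor plus a check for sudden drops against a short moving average. The thresholds, window size, and class name are hypothetical tuning choices, not values from the talk.

```python
from collections import deque
from typing import Deque, Optional


class OnlinePlayerMonitor:
    """Alert when online players fall below a floor or drop sharply vs. the recent average."""

    def __init__(self, floor: int, max_drop: float = 0.3, window: int = 5):
        self.floor = floor          # hypothetical absolute minimum
        self.max_drop = max_drop    # hypothetical max tolerated fractional drop
        self.history: Deque[int] = deque(maxlen=window)

    def observe(self, online: int) -> Optional[str]:
        """Record one sample; return an alert message, or None if all is well."""
        alert = None
        if online < self.floor:
            alert = f"online players {online} below floor {self.floor}"
        elif self.history:
            baseline = sum(self.history) / len(self.history)
            drop = (baseline - online) / baseline if baseline > 0 else 0.0
            if drop > self.max_drop:
                alert = f"online players down {drop:.0%} vs recent average {baseline:.0f}"
        self.history.append(online)
        return alert
```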

Summary

The automation effort, which began in 2000 and continues today, emphasizes incremental development, scalability, and building on mature protocols rather than reinventing the wheel. Small‑to‑medium companies are advised to start with targeted solutions, design for extensibility, and favor practical, proven tools.

Q & A

Q: What software is used for the UDP‑based file transfer?

A: A custom‑built tool; commercial options exist but are costly. The tool adapts UDP for file transfer by splitting files into segments; the receiving server collects the fragments and requests any missing pieces, much like HTTP range requests.
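The segment-and-request-missing scheme described in the answer can be sketched without the networking layer: number each chunk, let the receiver record what arrived, and resend only the gaps. The 4-byte header, the 1024-byte payload, and the assumption that the receiver learns the chunk count up front (for example from a control message) are all hypothetical; sockets, timers, and congestion control are omitted, and this is not the custom tool's actual protocol.

```python
import struct
from typing import Dict, List

HEADER = struct.Struct("!I")   # hypothetical header: 4-byte chunk index, network byte order
CHUNK_SIZE = 1024              # hypothetical payload size, well under a 1500-byte MTU


def make_packets(data: bytes, chunk_size: int = CHUNK_SIZE) -> List[bytes]:
    """Split a file into numbered datagrams (header + payload)."""
    total = max(1, -(-len(data) // chunk_size))  # ceiling division; empty file -> 1 packet
    return [HEADER.pack(i) + data[i * chunk_size:(i + 1) * chunk_size] for i in range(total)]


class Receiver:
    """Accept datagrams in any order, list the gaps, and reassemble when complete."""

    def __init__(self, total_chunks: int):
        self.total = total_chunks            # assumed known from a prior control message
        self.chunks: Dict[int, bytes] = {}

    def accept(self, packet: bytes) -> None:
        (index,) = HEADER.unpack_from(packet)
        if index < self.total:               # ignore out-of-range indices
            self.chunks[index] = packet[HEADER.size:]  # duplicates simply overwrite

    def missing(self) -> List[int]:
        """Chunk indices to request again from the sender."""
        return [i for i in range(self.total) if i not in self.chunks]

    def assemble(self) -> bytes:
        if self.missing():
            raise ValueError("transfer incomplete")
        return b"".join(self.chunks[i] for i in range(self.total))
```

In a real transfer the receiver would periodically send its `missing()` list back to the sender, which corresponds to the "request missing pieces" step in the answer.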

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Efficient Ops

This public account is maintained by Xiaotianguo and friends and regularly publishes widely read original technical articles. It focuses on operations transformation and aims to support readers throughout their operations careers.
