How a Gaming Company Built a Scalable Automated Operations System
This case study details why and how a game‑focused company designed, implemented, and refined a comprehensive automated operations platform—covering installation, management, security, client updates, data analysis, backup, and monitoring—to boost efficiency, reliability, and security across hundreds of servers.
Introduction
Many startups and small‑to‑medium enterprises still manage servers manually using tools like SecureCRT or Windows Remote Desktop, which makes the work slow and error‑prone and leaves configurations inconsistent.
Manual operations become a bottleneck as server counts grow, and differences in configuration are hard to detect, especially in load‑balanced groups.
Scripts and batch tools improve efficiency but introduce problems such as non‑standardized scripts, knowledge transfer gaps, and tool fragmentation.
Why Build an Automated Operations System?
The need for automation arises from three main factors:
Game‑related demands: hundreds of games, diverse architectures, and multiple operating systems.
Hardware diversity: many servers from various OEMs purchased over a decade.
Human factors: varying skill levels and habits among operations staff.
Goals of the Automated System
Completeness: cover all operational needs.
Simplicity: easy to use with low learning cost.
Efficiency: fast feedback for batch tasks.
Security: protect against attacks.
Architecture Overview
The system consists of several sub‑systems that work together, as shown in the diagram below.
1. Automated Installation System
Uses PXE boot to select OS (Windows or Linux), auto‑detects drivers, and applies basic security settings before handover.
2. Automated Operations Platform
Browser‑based UI for remote management.
Unified handling of heterogeneous servers, including Windows via SSH.
Leverages existing protocols (SSH) instead of custom agents.
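The agentless approach above can be sketched as a thin wrapper around the stock OpenSSH client. This is a minimal illustration, not the platform's actual code: the `ops` user, the worker count, and the function names are assumptions, while `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def build_ssh_command(host, command, user="ops", timeout_s=10):
    """Build an argv list for the OpenSSH client (no local shell involved)."""
    return [
        "ssh",
        "-o", "BatchMode=yes",                # fail instead of prompting for a password
        "-o", f"ConnectTimeout={timeout_s}",  # give up quickly on unreachable hosts
        f"{user}@{host}",                     # "ops" is a hypothetical account name
        command,
    ]


def run_via_ssh(host, command):
    """Run one command on one host and return (exit_code, stdout)."""
    proc = subprocess.run(build_ssh_command(host, command),
                          capture_output=True, text=True)
    return proc.returncode, proc.stdout


def run_batch(hosts, command, runner=run_via_ssh, max_workers=20):
    """Run the same command on many hosts in parallel.

    Returns {host: (exit_code, stdout)}. The runner is injectable so the
    fan-out logic can be exercised without real servers.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {host: pool.submit(runner, host, command) for host in hosts}
        return {host: fut.result() for host, fut in futures.items()}
```

Because the same SSH transport is used for every host, the batch layer does not need to know whether a target runs Linux or Windows; only the command string differs.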
3. Automated Security Inspection System
Security‑scan platform checks client files for viruses before distribution.
Continuous server‑side scanning to detect misconfigurations.
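One simple way to surface misconfigurations inside a load‑balanced group is to compare each server's configuration against the version most of its peers share. The sketch below assumes the scanner has already fetched the config text per host; the function names and the majority rule are illustrative assumptions, not the company's documented method.

```python
import hashlib
from collections import Counter


def fingerprint(config_text):
    """Hash a config's content, ignoring blank lines and surrounding whitespace."""
    lines = [ln.strip() for ln in config_text.splitlines() if ln.strip()]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def find_drift(configs_by_host):
    """Return the hosts whose config differs from the group's most common version."""
    prints = {host: fingerprint(text) for host, text in configs_by_host.items()}
    if not prints:
        return []
    majority, _count = Counter(prints.values()).most_common(1)[0]
    return sorted(host for host, fp in prints.items() if fp != majority)
```

A majority vote only flags outliers; it cannot tell whether the majority itself is misconfigured, so a real scanner would also check against a known-good baseline.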
4. Automated Client Update System
Two solutions address large‑file distribution and illegal caching:
Autopatch: uploads files, runs security checks, then distributes them via multiple CDNs, using HTTP 302 redirects to spread download load across them.
Dorado: serves critical small files over HTTPS from private nodes, so the encrypted traffic cannot be cached by ISPs.
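The 302 redirect step used by Autopatch can be illustrated with a weighted choice among CDNs. This is a sketch under stated assumptions: the CDN base URLs and weights are hypothetical placeholders, and the source does not say how the real system chooses a CDN.

```python
import random

# Hypothetical CDN base URLs and traffic weights (not from the source).
CDN_WEIGHTS = {
    "https://cdn-a.example.com": 3,
    "https://cdn-b.example.com": 1,
}


def pick_cdn(weights, rng=random):
    """Pick one CDN base URL with probability proportional to its weight."""
    bases = list(weights)
    return rng.choices(bases, weights=[weights[b] for b in bases], k=1)[0]


def redirect_response(path, weights=CDN_WEIGHTS, rng=random):
    """Return (status, headers) for an HTTP 302 pointing at the chosen CDN."""
    base = pick_cdn(weights, rng)
    return 302, {"Location": f"{base}/{path.lstrip('/')}"}
```

The client always requests the same update URL; the redirect decides per request which CDN actually serves the file, so weights can be changed without shipping a new client.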
5. Automated Server‑Side Update System
Adopts a CDN‑like architecture with cache nodes; P2P was considered but rejected due to security and traffic‑control concerns.
6. Automated Data Analysis System
Collects SDK‑reported events covering every step from client download to game login, receives them through a Tomcat cluster, and writes them to MongoDB to build a funnel‑style conversion view.
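A funnel view of this kind boils down to counting how many devices survive each successive stage. The sketch below works on in-memory tuples rather than MongoDB, and the stage names and the `device_id` key are assumptions chosen for illustration.

```python
from collections import defaultdict

# Hypothetical stage names, ordered from first to last step.
FUNNEL_STAGES = ["download", "install", "launch", "login"]


def build_funnel(events, stages=FUNNEL_STAGES):
    """Build a funnel from (device_id, stage) events.

    Returns a list of (stage, devices, rate_from_previous_stage). A device
    counts for a stage only if it also reached every earlier stage.
    """
    seen = defaultdict(set)
    for device_id, stage in events:
        seen[stage].add(device_id)

    funnel = []
    reached = None  # devices that completed all stages so far
    for stage in stages:
        if reached is None:
            current, rate = seen[stage], 1.0
        else:
            current = seen[stage] & reached
            rate = len(current) / len(reached) if reached else 0.0
        funnel.append((stage, len(current), rate))
        reached = current
    return funnel
```

For example, four downloads, two installs, and one launch followed by a login give rates of 1.0, 0.5, 0.5, and 1.0, which shows at a glance where players drop out.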
7. Automated Data Backup System
The initial FTP‑to‑tape approach evolved into a centralized backup service behind a load balancer, with MD5 verification of uploaded files, Hadoop HDFS for petabyte‑scale storage, and UDP‑based uploads to cope with high‑latency networks.
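The MD5 verification step can be sketched as follows: the sender records a checksum before upload, and the backup side recomputes it after the transfer. The function names are illustrative; the file is read in chunks so that large backups do not have to fit in memory.

```python
import hashlib


def md5_of_file(path, chunk_size=1024 * 1024):
    """Compute the MD5 hex digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(path, expected_md5):
    """Return True if the file at `path` matches the checksum sent with it."""
    return md5_of_file(path) == expected_md5.strip().lower()
```

MD5 is adequate here for detecting transfer corruption; it should not be relied on to detect deliberate tampering.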
8. Automated Monitoring & Alert System
Monitors IDC links, server health, network traffic, system logs, application metrics, and client SDK data; alerts are triggered based on configurable thresholds.
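Threshold-based alerting of this kind can be sketched as a lookup of each reported metric against configurable warning and critical limits. The metric names, limit values, and alert fields below are hypothetical; the source does not describe the real configuration format.

```python
# Hypothetical metric names and limits; in practice these would be configurable.
THRESHOLDS = {
    "cpu_percent": {"warn": 80, "crit": 95},
    "disk_percent": {"warn": 85, "crit": 95},
}


def evaluate(host, metrics, thresholds=THRESHOLDS):
    """Return a list of alert dicts for metrics at or above their limits.

    Metrics without a configured threshold are ignored.
    """
    alerts = []
    for name, value in metrics.items():
        limits = thresholds.get(name)
        if limits is None:
            continue
        if value >= limits["crit"]:
            level = "crit"
        elif value >= limits["warn"]:
            level = "warn"
        else:
            continue
        alerts.append({"host": host, "metric": name, "value": value, "level": level})
    return alerts
```

Keeping the thresholds in data rather than in code lets operators tune alert sensitivity per metric without redeploying the monitor.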
Conclusion
Key takeaways for building an automated operations platform:
Start small and iterate—focus on immediate pain points before expanding.
Design for scalability; ensure the system can handle growth from tens to thousands of servers.
Prefer mature protocols and open‑source tools (e.g., SSH as implemented by OpenSSH) over reinventing the wheel to improve stability and security.
Efficient Ops
This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.