
Comprehensive Operations Engineering Guide: Concepts, Tools, and Interview Questions

This article provides a detailed overview of operations engineering, covering the definition of ops and game ops, interactions with product teams, server management strategies, RAID levels, load‑balancing technologies (LVS, Nginx, HAProxy), caching solutions, middleware, MySQL troubleshooting, backup methods, Keepalived, Linux security, networking layers, and practical shell scripts for monitoring and automation.

Practical DevOps Architecture

1. What is Operations? What is Game Operations? Operations refers to the maintenance of an organization’s network hardware and software to ensure services run smoothly, encompassing networking, systems, databases, development, security, and monitoring. Game operations are divided into development ops (building tools), application ops (service deployment and troubleshooting), and system ops (providing infrastructure).

2. Role of Operations vs. Product Teams Game product teams coordinate releases, schedule server openings, manage user acquisition, and plan activities, requiring close collaboration with ops.

3. Managing 300 Servers Use a jump host with unified accounts, employ configuration management tools such as salt, ansible, or puppet, and maintain a simple CMDB for system, configuration, and application information.

4. RAID Levels RAID 0 offers high read/write speed without redundancy; RAID 1 mirrors data across two disks for 100% redundancy; RAID 5 distributes parity across at least three disks, balancing performance and fault tolerance. The article also compares redundancy, performance, and cost across RAID 0, 1, 5, and 10.
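
The capacity trade-off between these levels can be made concrete with a little arithmetic. The following is a minimal sketch, assuming equally sized disks; the function name `raid_usable_capacity` and its parameters are illustrative and do not come from the original article.

```python
def raid_usable_capacity(level: int, disks: int, disk_size_gb: float) -> float:
    """Return usable capacity in GB for an array of equally sized disks.

    Illustrative only: real controllers reserve some space for metadata.
    """
    if level == 0:
        if disks < 2:
            raise ValueError("RAID 0 needs at least 2 disks")
        return disks * disk_size_gb            # striping, no redundancy
    if level == 1:
        if disks < 2:
            raise ValueError("RAID 1 needs at least 2 disks")
        return disk_size_gb                    # every disk holds a full copy
    if level == 5:
        if disks < 3:
            raise ValueError("RAID 5 needs at least 3 disks")
        return (disks - 1) * disk_size_gb      # one disk's worth of parity
    if level == 10:
        if disks < 4 or disks % 2:
            raise ValueError("RAID 10 needs an even number of disks, at least 4")
        return disks // 2 * disk_size_gb       # striped mirrored pairs
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    for level, disks in [(0, 4), (1, 2), (5, 4), (10, 4)]:
        print(f"RAID {level} with {disks} x 1000 GB:",
              raid_usable_capacity(level, disks, 1000), "GB usable")
```

With four 1000 GB disks, RAID 0 yields the full 4000 GB, RAID 5 yields 3000 GB, and RAID 10 yields 2000 GB, which is the cost side of the redundancy comparison.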

5. LVS, Nginx, HAProxy Differences LVS operates at Layer 4 (port forwarding only); HAProxy works at Layers 4 & 7 and is a professional proxy; Nginx is a web server, cache, and reverse proxy capable of Layer 7 forwarding. Selection depends on traffic volume and feature needs.

6. Squid, Varnish, and Nginx All are proxy servers. Squid and Varnish specialize in caching (Varnish offers higher performance and regex‑based cache invalidation), while Nginx provides reverse‑proxy and web‑server functions with limited caching capabilities.

7. Tomcat vs. Resin Tomcat has a larger user base and better Java compatibility but lower performance; Resin offers higher performance but has a smaller community and less documentation. Large enterprises often choose Resin, while smaller companies prefer Tomcat for stability.

8. Middleware and JDK Middleware is independent software that enables distributed applications to share resources and communicate across platforms. The JDK (Java Development Kit) is the development environment for building Java applications.

9. Tomcat Ports 8005 – shutdown port; 8009 – AJP connector for Apache; 8080 – default HTTP port for applications.

10. CDN Definition A Content Delivery Network distributes website content to edge locations nearest to users, reducing latency and improving access speed.

11. Gray-Release (Canary Deployment) Gradually roll out a new version to a small subset of users, much as an A/B test splits traffic, before full deployment, so that issues are detected early while most users remain on the old version.
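
One common way to pick the subset of users is to hash a stable user identifier into a bucket from 0 to 99 and compare it with the rollout percentage. This is a minimal sketch under that assumption; the function `in_canary`, the `salt` value, and the user ID format are hypothetical and not taken from the article.

```python
import hashlib


def in_canary(user_id: str, percent: int, salt: str = "release-v2") -> bool:
    """Return True if this user should receive the new version.

    The same user always lands in the same bucket, so raising `percent`
    only adds users to the canary group and never removes them.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    bucket = int(digest[:8], 16) % 100   # stable bucket in the range 0..99
    return bucket < percent


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10000)]
    for percent in (1, 10, 50, 100):
        share = sum(in_canary(u, percent) for u in users) / len(users)
        print(f"target {percent:3d}% -> observed {share:.1%}")
```

Changing the salt per release reshuffles the buckets, so the same users are not always the first to receive every new version.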

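12. DNS Resolution Process The client first checks its local cache and hosts file, then asks its configured recursive DNS server. If that server has no cached answer, it queries a root server, then the TLD server, and finally the domain's authoritative server, and returns the resulting IP address to the client.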

13. RabbitMQ A message‑queue middleware that stores messages temporarily, routes them, and ensures delivery even if consumers are unavailable.

14. Keepalived Working Principle Based on VRRP, one master advertises a virtual IP; backups monitor the advertisements and take over if the master fails, providing high availability.
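
The takeover logic can be illustrated with a small simulation. This is a simplified sketch of VRRP-style election (highest priority wins, with the higher IP address breaking ties), not Keepalived's actual implementation; the `Node` class, host names, and addresses are made up for the example.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int      # higher priority wins the election
    ip: str
    alive: bool = True


def elect_master(nodes: List[Node]) -> Optional[Node]:
    """Pick the node that should hold the virtual IP, or None if all are down."""
    candidates = [n for n in nodes if n.alive]
    if not candidates:
        return None
    # Tie-break on the numerically higher IP address when priorities match.
    return max(candidates, key=lambda n: (n.priority, ipaddress.ip_address(n.ip)))


if __name__ == "__main__":
    cluster = [
        Node("lb-a", priority=150, ip="10.0.0.11"),
        Node("lb-b", priority=100, ip="10.0.0.12"),
    ]
    print("master:", elect_master(cluster).name)

    cluster[0].alive = False          # master stops sending advertisements
    print("after failure:", elect_master(cluster).name)
```

In a real deployment the backups infer failure from missing advertisements within a timeout rather than from an `alive` flag.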

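15. LVS Modes NAT (VS-NAT) rewrites the destination IP of each request to a real server and the source IP of each reply, so traffic passes through the director in both directions; TUN (IP tunneling) encapsulates requests and forwards them to real servers, which reply directly to clients; DR (direct routing) rewrites only the destination MAC address of each frame, with every real server holding the virtual IP on an interface that does not answer ARP and replying directly to clients.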

16. MySQL InnoDB Lock Diagnosis & Replication Lag Reduction Use SHOW ENGINE INNODB STATUS and the information_schema tables innodb_trx, innodb_locks, and innodb_lock_waits (in MySQL 8.0 the latter two are replaced by performance_schema.data_locks and performance_schema.data_lock_waits). Reduce lag by improving hardware, enabling multi-threaded replication, optimizing slow queries, and tuning parameters such as slave-net-timeout and master-connect-retry.

17. Resetting MySQL Root Password If the password is known, use mysqladmin -u root -p password "newpwd" or update the mysql.user table. If it is forgotten, start the server with mysqld_safe --skip-grant-tables, log in without a password, then set a new password via SQL.

18. LVS vs. Nginx vs. HAProxy Nginx offers Layer 7 routing, easy configuration, and rich modules; HAProxy provides professional proxy features, session persistence, and extensive load‑balancing algorithms; LVS delivers high‑performance Layer 4 distribution with low CPU/memory usage but lacks Layer 7 capabilities.
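
All three tools ultimately pick a backend for each request according to some scheduling algorithm. As an illustration of one such algorithm, here is a minimal sketch of smooth weighted round-robin; the class name and backend names are hypothetical, and the code is not taken from any of the three projects.

```python
from typing import Dict


class SmoothWeightedRoundRobin:
    """Spread picks in proportion to weight without long runs on one backend."""

    def __init__(self, weights: Dict[str, int]) -> None:
        self.weights = dict(weights)
        self.current = {name: 0 for name in weights}
        self.total = sum(weights.values())

    def pick(self) -> str:
        # Every backend gains its weight, the leader is chosen, and the
        # leader is then pushed back by the sum of all weights.
        for name, weight in self.weights.items():
            self.current[name] += weight
        chosen = max(self.current, key=self.current.get)
        self.current[chosen] -= self.total
        return chosen


if __name__ == "__main__":
    balancer = SmoothWeightedRoundRobin({"web1": 5, "web2": 1, "web3": 1})
    print([balancer.pick() for _ in range(7)])
```

Over any seven consecutive picks with these weights, web1 is chosen five times and the other two once each, interleaved rather than in one burst.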

19. MySQL Backup Tools mysqldump (logical backup), LVM snapshots (physical), tar archives, and Percona xtrabackup (hot physical backup with incremental support).

20. Keepalived Health Checks Supports HTTP_GET, SSL_GET, and custom scripts; configuration includes URL, expected status code, timeout, and retry settings.
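
To show what such a check does, or what a custom check script might implement, here is a minimal sketch of an HTTP probe with an expected status code, a timeout, and retries. The function `http_check` and its parameter names are my own and only loosely correspond to the settings listed above; the `opener` argument exists so the logic can be exercised without a network.

```python
import http.client
import time
import urllib.request


def http_check(url, expected_status=200, timeout=3.0, retries=3,
               delay_before_retry=1.0, opener=urllib.request.urlopen):
    """Return True if any attempt gets the expected HTTP status."""
    for attempt in range(retries):
        try:
            with opener(url, timeout=timeout) as response:
                if response.status == expected_status:
                    return True
        except (OSError, http.client.HTTPException):
            # Connection errors, timeouts, and HTTP error statuses
            # (urllib raises these as URLError/HTTPError) count as failures.
            pass
        if attempt < retries - 1:
            time.sleep(delay_before_retry)
    return False


if __name__ == "__main__":
    # Hypothetical real-server address; replace with a real health URL.
    healthy = http_check("http://127.0.0.1:8080/health", timeout=1.0)
    print("real server healthy:", healthy)
```

A check script used with a health checker would typically exit with status 0 when this returns True and non-zero otherwise.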

21-23. Useful Linux Commands Analyze Nginx logs, capture traffic with tcpdump, and forward ports using iptables.
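
As an example of the log analysis step, the sketch below counts requests per client IP. It assumes the access log uses a format where the client address is the first whitespace-separated field, as in Nginx's default combined format; the log path in the usage section is a common default, not something the article specifies.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def top_client_ips(lines: Iterable[str], n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent client IPs as (ip, request_count) pairs."""
    counts = Counter()
    for line in lines:
        fields = line.split(maxsplit=1)
        if fields:                      # skip blank lines
            counts[fields[0]] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    # Assumed default location; adjust to the access_log path in use.
    with open("/var/log/nginx/access.log", encoding="utf-8", errors="replace") as log:
        for ip, hits in top_client_ips(log):
            print(f"{hits:8d}  {ip}")
```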

24-26. Additional RAID, OSI Model, and Script Examples Summarize RAID 0/1/5 principles, outline the OSI 7-layer model, and provide Bash scripts for IP scanning and service monitoring (e.g., a seq 1 255 loop with ping).
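
The IP scan described above can also be sketched in Python. This version is an illustration rather than the article's Bash script: the function names are my own, the subnet is an example, and the `ping` flags `-c 1 -W 1` assume the Linux iputils version of `ping`.

```python
import ipaddress
import subprocess
from typing import Callable, List


def hosts_in(network: str) -> List[str]:
    """List usable host addresses, e.g. .1 through .254 for a /24."""
    return [str(host) for host in ipaddress.ip_network(network).hosts()]


def is_reachable(ip: str, timeout_s: int = 1) -> bool:
    """Send a single ICMP echo request; True if ping exits with status 0."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), ip],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def sweep(network: str, probe: Callable[[str], bool] = is_reachable) -> List[str]:
    """Return the hosts in the network for which the probe succeeds."""
    return [ip for ip in hosts_in(network) if probe(ip)]


if __name__ == "__main__":
    for ip in sweep("192.168.1.0/24"):   # example subnet
        print(ip, "is up")
```

The scan is sequential, so a mostly empty /24 takes several minutes at a one-second timeout; running the probes in a thread pool would shorten that.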

27-29. Server Failure Diagnosis and Virus Removal Step-by-step troubleshooting flowcharts, use of top, ps, lsof, and removal of malicious files.

30-31. OSI Seven-Layer Model and Common Nginx Modules Detailed description of each layer and frequently used Nginx modules such as rewrite, access, ssl, gzip, proxy, upstream, and cache_purge.

32‑34. Web Load‑Balancing Architectures and Monitoring Overview of Nginx, HAProxy, Keepalived, LVS, and commands to view concurrent connections, adjust file descriptor limits, and sniff top IPs on port 80.
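
File descriptor limits can be inspected and adjusted from within a process as well as from the shell. The following is a small sketch using Python's standard `resource` module (Unix only); the helper names are illustrative, and the target value in the usage section is an arbitrary example.

```python
import resource
from typing import Tuple


def nofile_limits() -> Tuple[int, int]:
    """Return the (soft, hard) limits on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def raise_soft_nofile(target: int) -> int:
    """Set the soft limit to `target`, capped at the hard limit.

    Only the soft limit changes; raising the hard limit needs privileges.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    new_soft = target if hard == resource.RLIM_INFINITY else min(target, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (new_soft, hard))
    return new_soft


if __name__ == "__main__":
    print("before:", nofile_limits())
    print("soft limit now:", raise_soft_nofile(65535))
    print("after:", nofile_limits())
```

The change applies to the current process and its children only; a persistent system-wide change still has to be made in the system's limits configuration.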

35-40. Automation Scripts Bash scripts for IP reachability checks, log rotation, daily backups of /var/www/html using tar, and cron scheduling.
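
The daily backup can be sketched in Python with the standard `tarfile` module, as an alternative to a `tar` shell script. The destination directory `/backup`, the archive naming scheme, and the function name are assumptions made for this example.

```python
import datetime
import os
import tarfile
from typing import Optional


def backup_directory(source_dir: str, dest_dir: str,
                     today: Optional[datetime.date] = None) -> str:
    """Write <name>-YYYYMMDD.tar.gz into dest_dir and return its path."""
    today = today or datetime.date.today()
    os.makedirs(dest_dir, exist_ok=True)
    base = os.path.basename(os.path.normpath(source_dir))
    archive = os.path.join(dest_dir, f"{base}-{today:%Y%m%d}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # Store paths relative to the directory name, not the full path.
        tar.add(source_dir, arcname=base)
    return archive


if __name__ == "__main__":
    # /backup is a hypothetical destination directory.
    print("wrote", backup_directory("/var/www/html", "/backup"))
```

Saved as a script, this could be scheduled with a crontab entry such as `0 2 * * * /usr/bin/python3 /opt/scripts/backup.py`, where the time and script path are placeholders.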
