Mastering Modern IT Operations: Roles, Practices, and Evolution
This article outlines the responsibilities and evolution of IT operations across system, application, database, security, and platform management. It covers tasks such as infrastructure construction, monitoring, optimization, and automation, and traces the shift from manual processes to self‑scheduling systems.
Internet operations work is service‑oriented, focusing on stability, security, and efficiency to ensure 24/7 high‑quality service for users.
Operations staff strengthen the stability of the underlying infrastructure, services, and online applications, conduct daily inspections to identify potential risks, and optimize the architecture to mitigate common failures. They also improve disaster‑recovery capability by deploying services across multiple data centers.
Through monitoring, log analysis, and other technical means, they promptly detect and respond to service faults, reducing downtime and meeting availability expectations.
Security responsibilities include protecting all layers of the service stack, from network boundaries and ACL management to OS and open‑source vulnerability patching, as well as defending against XSS, SQL injection, and DDoS attacks, and conducting code scanning, permission audits, and intrusion detection.
Operations must ensure that internet services run in a secure, controllable state, safeguarding business and user privacy data while resisting malicious attacks.
Beyond stability and security, operations teams also improve efficiency, for example by improving I/O performance, compressing images to reduce bandwidth, and enhancing internal release and delivery tools.
Work Classification of Operations
As businesses grow, mature internet companies subdivide operations roles into more specialized areas.
The typical classification (see image) includes:
System Operations
Responsible for IDC, network, CDN, and basic services (LVS, NTP, DNS); asset management; server selection, delivery, and maintenance.
1. IDC Data Center Construction
Collect business requirements, estimate future scale, and evaluate factors such as backbone network distribution, building design, Internet access, attack defense, expansion capacity, space reservation, dedicated lines, and on‑site support to select and build data centers.
2. Network Construction
Design and plan production network architecture, including data‑center, transport, and CDN networks, and perform daily network tuning.
3. LVS Load Balancing and SNAT
Deploy LVS clusters as traffic entry points, providing high‑performance, highly available load distribution and unified network‑level attack protection. SNAT (source NAT) gives internal servers centralized, high‑performance, highly available outbound access to the public network.
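LVS supports several scheduling algorithms for spreading connections across real servers, among them round-robin (rr) and weighted round-robin (wrr). As a minimal sketch of the weighted idea, the Python below models a wrr-style picker over a static pool. The server names and weights are hypothetical, and this illustrates the algorithm only; it is not the kernel's IPVS code.

```python
from functools import reduce
from math import gcd


class WeightedRoundRobin:
    """Weighted round-robin selection in the style of the LVS 'wrr' scheduler."""

    def __init__(self, servers):
        # servers: list of (name, weight) pairs; zero-weight servers are skipped.
        self.servers = [(name, w) for name, w in servers if w > 0]
        if not self.servers:
            raise ValueError("at least one server needs a positive weight")
        self.step = reduce(gcd, (w for _, w in self.servers))
        self.max_weight = max(w for _, w in self.servers)
        self.index = -1          # position of the last server returned
        self.current_weight = 0  # weight threshold for the current pass

    def pick(self):
        n = len(self.servers)
        while True:
            self.index = (self.index + 1) % n
            if self.index == 0:
                # Start of a new pass: lower the threshold, wrapping back to the max.
                self.current_weight -= self.step
                if self.current_weight <= 0:
                    self.current_weight = self.max_weight
            name, weight = self.servers[self.index]
            if weight >= self.current_weight:
                return name


wrr = WeightedRoundRobin([("A", 4), ("B", 2), ("C", 1)])
print([wrr.pick() for _ in range(7)])  # ['A', 'A', 'A', 'B', 'A', 'B', 'C']
```

Over each cycle every server is chosen in proportion to its weight, and heavier servers are interleaved with lighter ones rather than receiving all their connections in one burst.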
4. CDN Planning and Construction
Manage both third‑party and self‑built CDNs: select third‑party CDN providers and schedule traffic among them, plan new CDN nodes, and keep the CDN stable and efficient.
5. Server Selection, Delivery, and Maintenance
Test and select servers, reduce power consumption, increase rack density, and develop hardware fault diagnosis and monitoring tools.
6. OS and Kernel Selection & Maintenance
Select and customize OS and kernel, manage patches, maintain YUM repositories, and provide targeted optimization for different services.
7. Asset Management
Record and manage physical resources (data centers, networks, racks, servers, ACLs, IPs), ensure accurate information, and provide APIs for automation.
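For automation to rely on asset records, the records need a consistent schema and a query interface. The sketch below assumes a hypothetical in-memory `AssetRegistry` holding `ServerAsset` records (the field names are illustrative, not taken from any particular asset database). It rejects duplicate asset IDs and IP conflicts, which is one way to keep the information accurate.

```python
from dataclasses import dataclass, field
from ipaddress import ip_address
from typing import Dict, List, Optional


@dataclass
class ServerAsset:
    # Hypothetical fields; a real asset database tracks far more attributes.
    asset_id: str
    datacenter: str
    rack: str
    ips: List[str] = field(default_factory=list)


class AssetRegistry:
    """In-memory stand-in for an asset database queried by automation tools."""

    def __init__(self) -> None:
        self._assets: Dict[str, ServerAsset] = {}
        self._ip_owner: Dict[str, str] = {}

    def register(self, asset: ServerAsset) -> None:
        if asset.asset_id in self._assets:
            raise ValueError(f"duplicate asset id {asset.asset_id}")
        ips = [str(ip_address(ip)) for ip in asset.ips]  # rejects malformed IPs
        if len(set(ips)) != len(ips):
            raise ValueError(f"{asset.asset_id} lists the same IP twice")
        for ip in ips:
            if ip in self._ip_owner:
                raise ValueError(f"{ip} is already assigned to {self._ip_owner[ip]}")
        asset.ips = ips
        self._assets[asset.asset_id] = asset
        for ip in ips:
            self._ip_owner[ip] = asset.asset_id

    def by_rack(self, datacenter: str, rack: str) -> List[ServerAsset]:
        return [a for a in self._assets.values()
                if a.datacenter == datacenter and a.rack == rack]

    def owner_of(self, ip: str) -> Optional[str]:
        return self._ip_owner.get(str(ip_address(ip)))


registry = AssetRegistry()
registry.register(ServerAsset("srv-001", "dc1", "R12", ["10.0.0.11"]))
registry.register(ServerAsset("srv-002", "dc1", "R12", ["10.0.0.12"]))
print([a.asset_id for a in registry.by_rack("dc1", "R12")])  # ['srv-001', 'srv-002']
print(registry.owner_of("10.0.0.12"))  # srv-002
```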
8. Basic Service Construction
Design highly available architectures for DNS, NTP, SYSLOG and other essential services to avoid single points of failure.
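Running several DNS, NTP, or syslog instances removes the single point of failure only if clients actually move on when one instance stops answering. Below is a minimal sketch of client-side failover. It assumes a hypothetical `probe` callable that performs the real health check (for example, a DNS query with a short timeout); the resolver addresses are placeholders.

```python
from typing import Callable, List, Sequence


def first_healthy(endpoints: Sequence[str], probe: Callable[[str], bool]) -> str:
    """Return the first endpoint whose probe succeeds, trying them in order."""
    errors: List[str] = []
    for endpoint in endpoints:
        try:
            if probe(endpoint):
                return endpoint
            errors.append(f"{endpoint}: unhealthy")
        except Exception as exc:  # timeouts, refused connections, and similar
            errors.append(f"{endpoint}: {exc}")
    raise RuntimeError("no healthy endpoint: " + "; ".join(errors))


# Hypothetical resolvers; the first one is simulated as down.
down = {"10.0.0.53"}
print(first_healthy(["10.0.0.53", "10.0.1.53"], lambda ep: ep not in down))  # 10.0.1.53
```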
Application Operations
Handles online service changes, status monitoring, disaster recovery, data backup, routine inspections, and incident response.
1. Design Review
Participate in product design reviews to ensure high‑availability requirements are met from an operations perspective.
2. Service Management
Define upgrade and rollback plans, understand service dependencies, set stability metrics, improve monitoring, and respond to incidents promptly.
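A staged (canary) rollout is one common way to turn upgrade and rollback plans into code. It upgrades a small batch, checks a stability metric, and rolls back every upgraded host if the metric degrades. The sketch assumes hypothetical `deploy`, `rollback`, and `error_rate` callables supplied by the release tooling, plus an illustrative 1% error-rate limit.

```python
from typing import Callable, List, Sequence


def plan_batches(hosts: Sequence[str], batch_sizes: Sequence[int]) -> List[List[str]]:
    """Split hosts into batches of the given sizes; leftovers form a final batch."""
    batches, start = [], 0
    for size in batch_sizes:
        if start >= len(hosts):
            break
        batches.append(list(hosts[start:start + size]))
        start += size
    if start < len(hosts):
        batches.append(list(hosts[start:]))
    return batches


def staged_rollout(
    hosts: Sequence[str],
    batch_sizes: Sequence[int],
    deploy: Callable[[str], None],
    rollback: Callable[[str], None],
    error_rate: Callable[[Sequence[str]], float],
    max_error_rate: float = 0.01,
) -> bool:
    """Deploy batch by batch; on a bad batch, roll back every upgraded host."""
    upgraded: List[str] = []
    for batch in plan_batches(hosts, batch_sizes):
        for host in batch:
            deploy(host)
            upgraded.append(host)
        if error_rate(upgraded) > max_error_rate:
            # Undo in reverse order so the newest changes are reverted first.
            for host in reversed(upgraded):
                rollback(host)
            return False
    return True


hosts = [f"web{i:02d}" for i in range(1, 11)]
ok = staged_rollout(hosts, [1, 3], deploy=print, rollback=print,
                    error_rate=lambda done: 0.0)
print(ok)  # True, after deploying web01, then web02-web04, then web05-web10
```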
3. Resource Management
Manage server assets, assess data‑center distribution, network bandwidth, and allocate resources efficiently.
4. Routine Inspection
Establish and continuously improve inspection points, track and resolve hidden risks.
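Inspection points are easier to extend and track when each one is a small check that returns findings. The sketch below assumes a hypothetical `Finding` record and an illustrative 85% disk-usage warning threshold. Only the disk check touches the real system (through `shutil.disk_usage`), and further checks can be registered the same way.

```python
import shutil
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Finding:
    check: str
    severity: str
    detail: str


def check_disk(path: str = "/", warn_pct: float = 85.0) -> List[Finding]:
    """Warn when the filesystem holding `path` is at least `warn_pct` percent used."""
    usage = shutil.disk_usage(path)
    pct = usage.used / usage.total * 100
    if pct >= warn_pct:
        return [Finding("disk_usage", "warning", f"{path} is {pct:.1f}% used")]
    return []


def run_inspection(checks: Dict[str, Callable[[], List[Finding]]]) -> List[Finding]:
    """Run every registered check; a check that crashes is reported, not ignored."""
    findings: List[Finding] = []
    for name, check in checks.items():
        try:
            findings.extend(check())
        except Exception as exc:
            findings.append(Finding(name, "error", f"check failed: {exc}"))
    return findings


if __name__ == "__main__":
    for finding in run_inspection({"disk_root": check_disk}):
        print(finding)
```

Reporting a crashed check as a finding keeps a broken inspection point from silently hiding the risk it was meant to catch.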
5. Plan Management
Set thresholds for monitoring indicators and define response procedures, maintaining up‑to‑date runbooks and conducting regular drills.
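Keeping each monitoring threshold next to the response procedure it should trigger makes drills and runbook reviews easier. Here is a minimal sketch; the metric names, threshold values, and runbook paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence, Tuple


@dataclass(frozen=True)
class AlertRule:
    metric: str
    threshold: float
    runbook: str  # hypothetical pointer to the documented response procedure


# Illustrative rules; real thresholds come from each service's stability targets.
RULES = [
    AlertRule("http_5xx_rate", 0.05, "runbooks/web-5xx"),
    AlertRule("replication_lag_s", 30.0, "runbooks/db-replication-lag"),
]


def evaluate(metrics: Dict[str, float],
             rules: Sequence[AlertRule] = RULES) -> List[Tuple[str, float, str]]:
    """Return (metric, value, runbook) for every rule whose threshold is exceeded."""
    triggered = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            triggered.append((rule.metric, value, rule.runbook))
    return triggered


print(evaluate({"http_5xx_rate": 0.08, "replication_lag_s": 3.0}))
# [('http_5xx_rate', 0.08, 'runbooks/web-5xx')]
```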
6. Data Backup
Develop backup strategies, ensure data availability and integrity, and perform regular restore tests.
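A restore test is only meaningful if it checks that the restored data matches what was backed up. The sketch below is a simplification that assumes single-file backups on a local filesystem. It records a SHA-256 checksum at backup time and verifies it after restoring into a scratch directory; the function names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path
from typing import Tuple


def sha256_of(path: Path) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MiB chunks
            h.update(chunk)
    return h.hexdigest()


def backup(src: Path, backup_dir: Path) -> Tuple[Path, str]:
    """Copy `src` into `backup_dir` and store its checksum alongside the copy."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest = backup_dir / src.name
    shutil.copy2(src, dest)
    digest = sha256_of(src)
    (backup_dir / (src.name + ".sha256")).write_text(digest)
    return dest, digest


def restore_test(backup_file: Path, expected_digest: str) -> bool:
    """Restore into a scratch directory and verify the checksum still matches."""
    with tempfile.TemporaryDirectory() as scratch:
        restored = Path(scratch) / backup_file.name
        shutil.copy2(backup_file, restored)
        return sha256_of(restored) == expected_digest


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as work:
        src = Path(work) / "orders.db"
        src.write_bytes(b"example data")
        copy, digest = backup(src, Path(work) / "backups")
        print(restore_test(copy, digest))  # True
```

A checksum only proves integrity. A fuller drill would also start the application or database against the restored copy.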
Database Operations
Focuses on storage design, schema, index, SQL optimization, monitoring, backup, high availability, and automation.
1. Design Review
Provide DBA perspective on storage solutions, schema design, SQL standards, and indexing during product development.
2. Capacity Planning
Monitor database capacity limits, identify bottlenecks, and perform optimization or scaling before limits are reached.
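A simple way to act before limits are reached is to project the current growth trend forward. The sketch assumes usage grows roughly linearly. It fits a least-squares line to (day, used) samples and estimates the days remaining until a hypothetical capacity limit. Real growth is often seasonal or stepwise, so treat the result as an early-warning signal rather than a forecast.

```python
from typing import Optional, Sequence, Tuple


def days_until_full(samples: Sequence[Tuple[float, float]], capacity: float) -> Optional[float]:
    """Fit used = intercept + slope * day and return days left after the last sample.

    Returns None when usage is flat or shrinking (no projected exhaustion).
    """
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(day for day, _ in samples) / n
    mean_y = sum(used for _, used in samples) / n
    sxx = sum((day - mean_x) ** 2 for day, _ in samples)
    if sxx == 0:
        raise ValueError("samples must cover at least two distinct days")
    sxy = sum((day - mean_x) * (used - mean_y) for day, used in samples)
    slope = sxy / sxx
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    last_day = max(day for day, _ in samples)
    return (capacity - intercept) / slope - last_day


# Usage grows 10 GB/day from 100 GB; a 200 GB limit is reached on day 10.
print(days_until_full([(0, 100), (1, 110), (2, 120), (3, 130)], capacity=200))  # 7.0
```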
3. Backup & Disaster Recovery
Define backup and DR strategies and conduct regular recovery tests.
4. Database Monitoring
Implement health and performance monitoring to detect faults early.
5. Database Security
Establish account management, enforce least‑privilege, protect offline backups, and prevent data leakage.
6. High Availability & Performance Optimization
Design failover solutions and improve performance by adopting new storage, hardware, and filesystem technologies and by optimizing SQL, while controlling costs.
7. Automation System Development
Build automated deployment, scaling, sharding, permission management, backup/recovery, and SQL review tools.
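SQL review tools typically check statements against a rule set before they reach production. The sketch below is deliberately simple: it pattern-matches a few illustrative rules with regular expressions, whereas a production tool would parse the SQL properly. The rules and messages are hypothetical examples, not a standard.

```python
import re
from typing import List

# Hypothetical rule set: (pattern, message). Regexes are a rough stand-in for a parser.
RULES = [
    (re.compile(r"^\s*select\s+\*", re.IGNORECASE),
     "avoid SELECT *; list the columns you need"),
    (re.compile(r"^\s*(update|delete)\b(?!.*\bwhere\b)", re.IGNORECASE | re.DOTALL),
     "UPDATE/DELETE without WHERE touches every row"),
    (re.compile(r"\blike\s+'%", re.IGNORECASE),
     "leading-wildcard LIKE prevents an index range scan"),
]


def review(sql: str) -> List[str]:
    """Return the message of every rule the statement violates."""
    return [message for pattern, message in RULES if pattern.search(sql)]


print(review("DELETE FROM orders"))
# ['UPDATE/DELETE without WHERE touches every row']
```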
Operations Security
Handles network, system, and application security hardening, regular scanning, penetration testing, tool development, and incident response.
1. Security Policy Creation
Develop practical security policies aligned with internal processes.
2. Security Training
Provide targeted training and assessments, and establish clear security responsibilities.
3. Risk Assessment
Conduct black‑box and white‑box testing to evaluate risks across network, servers, applications, and data.
4. Security Implementation
Strengthen weak points, deploy security devices, patch promptly, perform code scanning, and apply encryption, anonymization, or data deletion techniques.
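Anonymization ranges from masking part of a value for display to replacing it with a stable pseudonym. The sketch masks email addresses and 11-digit mobile numbers in free text with regular expressions. It derives pseudonyms with a keyed HMAC-SHA256. The patterns, the masking format, and the key handling are illustrative assumptions; in practice the key would come from a secrets store.

```python
import hashlib
import hmac
import re

# Illustrative patterns: keep the first character of the email local part,
# and the first 3 and last 4 digits of an 11-digit mobile number.
EMAIL = re.compile(r"([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*(@[A-Za-z0-9.-]+)")
PHONE = re.compile(r"\b(\d{3})\d{4}(\d{4})\b")


def mask(text: str) -> str:
    text = EMAIL.sub(r"\1***\2", text)
    return PHONE.sub(r"\1****\2", text)


def pseudonymize(value: str, key: bytes) -> str:
    """Same input and key give the same token; without the key it is infeasible to recompute."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]


print(mask("contact alice@example.com or 13812345678"))
# contact a***@example.com or 138****5678
```

Pseudonyms are useful where records still need to be joined, for example in analytics, without exposing the original identifier.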
5. Compliance
Meet regulatory requirements such as payment licensing.
6. Emergency Response
Establish alert systems, collect third‑party findings, coordinate remediation, and perform post‑incident analysis.
Evolution of Operations Work
Early stages involved manual data‑center construction and server provisioning with minimal online service management.
As services matured, responsibilities expanded to include server monitoring, LVS/Nginx management, and basic scripting.
Later, operations split into system and application teams; application teams took over service monitoring, backup, and change management, developing tools for batch operations.
Increasing scale introduced multi‑data‑center disaster recovery, pre‑plan management, and the need for more sophisticated monitoring beyond simple server metrics.
Security incidents drove deeper investment in protection measures. Operations eventually settled into five major areas: system, application, database, operations development, and operations security.
System operations now focus on infrastructure stability and efficiency, while application operations concentrate on service performance and reliability.
Database operations specialize in automation, performance tuning, and security; operations development builds platforms and tools; operations security ensures comprehensive protection.
The evolution of operations can be divided into four stages (see image): manual management, batch‑tool operation, platform management, and self‑scheduling automation.
Ultimately, the goal is to automate all repetitive tasks, reduce knowledge transfer costs, and shift fault handling from reactive to proactive, making operations delivery more efficient, secure, and stable.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
MaGe Linux Operations
Founded in 2009, MaGe Education is a top Chinese high‑end IT training brand. Its graduates earn 12K+ RMB salaries, and the school has trained tens of thousands of students. It offers high‑pay courses in Linux cloud operations, Python full‑stack, automation, data analysis, AI, and Go high‑concurrency architecture. Thanks to quality courses and a solid reputation, it has talent partnerships with numerous internet firms.
