
How Automated Operations Transform Enterprise IT: Trends, Tools, and Best Practices

This article examines the current state and future trends of enterprise operations, outlines common challenges and requirements, explains the importance of standardizing processes and management policies, compares leading open‑source automation tools, and provides a practical SaltStack deployment guide for building an automated operations platform.

Efficient Ops

Oct 15, 2018

1. Current State and Development Trends of Enterprise Operations

As enterprise digitalization advances, operations staff face increasingly complex services and diverse user demands; expanding applications require flexible, secure, and stable operational models.

As infrastructure grows from a few servers to a large data center, manual methods can no longer meet technical, business, and management needs. Standardization, automation, architecture optimization, and process improvement become essential for reducing operational costs.

Automation is gradually replacing manual tasks, offering powerful advantages in enterprise operations.

Beyond merely substituting human work, automation now enables deep insight and global analysis, aiming to optimize performance and service while maximizing investment returns.

Through automated operations, organizations can achieve operational goals with less downtime and higher service quality.

Consequently, shifting from manual to automated management is a key development trend for increasingly complex operations.

2. Problems and Requirements in Enterprise Operations

Initially, the company relied on a few servers for file sharing and email, with operations performed entirely by hand. As new business systems were launched and a central data center was built, operations remained manual, though network management and environment monitoring systems introduced semi‑automation.

With growing business, operational workload continuously increases, leading to the following issues and needs:

2.1 Improving Efficiency and Proactiveness of Operations Staff

Operations teams often react only after a fault has already affected services. This “fire-fighting” mode hampers quality and lowers satisfaction in both IT and business departments.

Most daily effort goes to handling repetitive, simple issues, and because fault‑alert mechanisms are weak, problems are addressed only after they arise, keeping staff in a passive state. The goal is to detect and resolve faults before they affect services.

2.2 Establishing an Efficient Operations Mechanism

Lack of an automated operations model, unclear role definitions, and undefined responsibilities make root‑cause identification slow and inaccurate, preventing timely remediation.

Furthermore, the absence of a standardized fault‑handling workflow leads to ad‑hoc solutions and insufficient tracking. A robust operations management system is needed to provide direction and a solid basis for work.

2.3 Insufficient Operations Technical Tools

Complex business systems, diverse network devices, servers, storage, and applications overwhelm operations staff, causing frequent service interruptions despite overtime efforts.

These problems stem partly from a shortage of monitoring and diagnostic tools; without efficient technical support, fault events cannot be handled proactively or quickly.

3. Standardizing Business Processes and Strengthening Operations Management

3.1 Achieving Business Process Standardization as a Foundation for Automation

Standardization is the basis of automated operations. First, identify all operational objects; every operation should target a specific object.

If operations are detached from objects, they lose meaning. For example, scaling a server differs from scaling an application; mixing these leads to chaos and higher communication costs.

Physical infrastructure standardization includes identifying servers, switches, cabinets, their attributes (serial numbers, IPs, vendors) and relationships (which cabinet a server resides in, which switch port it connects to).
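The idea of objects, attributes, and relationships can be sketched as a small inventory model. This is a minimal illustration, not a CMDB design: the class names, the vendor string, the cabinet and port labels, and the second IP address are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Cabinet:
    name: str


@dataclass
class SwitchPort:
    switch_name: str
    port: str


@dataclass
class Server:
    serial_number: str
    ip: str
    vendor: str
    cabinet: Optional[Cabinet] = None    # relationship: where the server is racked
    uplink: Optional[SwitchPort] = None  # relationship: which switch port it uses


@dataclass
class Inventory:
    servers: Dict[str, Server] = field(default_factory=dict)

    def add(self, server: Server) -> None:
        # The serial number is the unique key, so every operation targets one object.
        if server.serial_number in self.servers:
            raise ValueError(f"duplicate serial number: {server.serial_number}")
        self.servers[server.serial_number] = server

    def in_cabinet(self, cabinet_name: str) -> List[Server]:
        # Answer a relationship query: which servers sit in a given cabinet?
        return [s for s in self.servers.values()
                if s.cabinet is not None and s.cabinet.name == cabinet_name]


inventory = Inventory()
inventory.add(Server("SN-0001", "192.168.151.202", "ExampleVendor",
                     Cabinet("A01"), SwitchPort("sw-core-1", "Gi0/1")))
inventory.add(Server("SN-0002", "192.168.151.203", "ExampleVendor",
                     Cabinet("A02"), SwitchPort("sw-core-1", "Gi0/2")))
print([s.ip for s in inventory.in_cabinet("A01")])
```

Keeping relationships as explicit fields is what lets later automation answer questions such as "which servers are affected if this cabinet loses power".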

Application standardization covers services, middleware, databases, tables, views, stored procedures, field names, indexes, and relationships.

Process standardization involves backup, software upgrades, antivirus, new‑service onboarding, etc.

Automated operations link events to predefined IT processes; when monitoring detects performance breaches or outages, the corresponding workflow is triggered automatically, enabling fault response and recovery.
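One way to picture the link between events and predefined processes is a lookup table from event type to workflow. The sketch below assumes hypothetical event types (`disk_usage_high`, `service_down`) and handlers that only return a description of the action; a real platform would call its automation tool at that point.

```python
from typing import Callable, Dict, List

Workflow = Callable[[dict], str]
WORKFLOWS: Dict[str, Workflow] = {}  # event type -> predefined workflow


def workflow(event_type: str) -> Callable[[Workflow], Workflow]:
    """Register a function as the predefined workflow for one event type."""
    def register(func: Workflow) -> Workflow:
        WORKFLOWS[event_type] = func
        return func
    return register


@workflow("disk_usage_high")
def clean_temp_files(event: dict) -> str:
    return f"cleanup scheduled on {event['host']}"


@workflow("service_down")
def restart_service(event: dict) -> str:
    return f"restart of {event['service']} scheduled on {event['host']}"


def dispatch(event: dict) -> str:
    """Route a monitoring event to its workflow, or escalate to a human."""
    handler = WORKFLOWS.get(event["type"])
    if handler is None:
        return f"no workflow for {event['type']}; escalate to on-call staff"
    return handler(event)


events: List[dict] = [
    {"type": "disk_usage_high", "host": "server.cccxht.com"},
    {"type": "service_down", "host": "server.cccxht.com", "service": "httpd"},
    {"type": "fan_failure", "host": "server.cccxht.com"},
]
for event in events:
    print(dispatch(event))
```

Events with no registered workflow fall through to a human, which keeps the automated path limited to cases that have been standardized in advance.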

The automation platform also handles repetitive daily tasks, boosting efficiency and moving toward “zero‑latency” operations.

3.2 Building a Complete Operations Management System

The system includes environment, asset, media, device, monitoring, network security, system security, malware prevention, password, change, backup & recovery, security incident handling, and emergency plan management.

Well‑defined policies improve efficiency; staff follow documented procedures for fast, accurate actions.

Comprehensive policies enable early detection of issues before loss occurs, ensuring business continuity.

Standardized processes help quickly locate root causes, minimizing business impact.

Policies must evolve with business growth, fostering continuous improvement.

4. Selecting an Automated Operations Technology Stack

4.1 Overview of Automated Operations

Automation covers installation, deployment, monitoring, release, upgrade, security control, optimization, and data backup.

Solutions include commercial, open‑source, and self‑built systems.

Commercial products offer comprehensive features, strong support, and guaranteed updates, and they place lower technical demands on staff, but they cost more.

Open‑source tools are more flexible, require more effort from staff, and have lower cost.

Self‑built systems demand the highest technical expertise and cost, yet suit large‑scale enterprises seeking tailored capabilities.

4.2 Open‑Source Operations Tools: Scenarios and Advantages

Puppet : Powerful configuration and deployment tool with a web UI for reports; straightforward for basic use, but its Ruby-based DSL has a steep learning curve.

SaltStack : Fast, scalable infrastructure management; simple configuration modules, strong web UI, but lacks deep reporting.

Ansible : Python-based and agentless; modules can be written in any language and installation is simple, but Windows support and execution speed are weaker.

Monitoring tools:

Nagios : Free, flexible IT infrastructure monitor for Windows, Linux, VMware, network devices; alerts via email/SMS; limited event console and historical data.

Zabbix : Enterprise‑grade, web‑based distributed monitoring; C backend, PHP frontend; strong visualization and API, but complex custom development and alert configuration.

4.3 Implementing Server Deployment Automation with SaltStack

SaltStack is a Python-based client/server (C/S) configuration management tool that uses ZeroMQ for transport, with public-key authentication and AES-encrypted payloads for secure communication.

Version 0.16.0 introduces multi‑master support; minions can connect to multiple masters, ensuring high availability.

Deployment steps:

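# Download and install the EPEL repository package, which provides the Salt packages
wget http://download.fedoraproject.org/pub/epel/6/i386/epel-release-6-8.noarch.rpm
rpm -ivh epel-release-6-8.noarch.rpm
# On the master node
yum install salt-master
# On each managed node
yum install salt-minion
# yum prints "Complete!" when each installation finishes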

Configure master backup nodes and copy keys. In each minion's /etc/salt/minion, list every master:

master:
  - saltmaster1.cccxht.com
  - saltmaster2.cccxht.com

The master key pair resides in /etc/salt/pki/master. Copy master.pem and master.pub to the backup master, start it, and accept the minion keys on it as well.

Restart minions so they recognize both masters.

Note: When master_type is set to failover, a minion detects a failed master and reconnects to the next one in its list; master_alive_interval sets how often, in seconds, the minion checks that its current master is alive.
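As a rough sketch, the master-related part of a minion configuration for failover could be generated like this. `master_type` and `master_alive_interval` are Salt minion options, and the interval is a number of seconds; the helper name and the 30-second value are assumptions chosen for illustration.

```python
from typing import List


def render_minion_failover_config(masters: List[str], alive_interval_s: int) -> str:
    """Return YAML text for the master-related part of /etc/salt/minion."""
    if not masters:
        raise ValueError("at least one master is required")
    if alive_interval_s <= 0:
        raise ValueError("alive_interval_s must be a positive number of seconds")
    lines = ["master:"]
    lines += [f"  - {host}" for host in masters]
    lines += [
        "master_type: failover",                       # try masters in order
        f"master_alive_interval: {alive_interval_s}",  # seconds between checks
    ]
    return "\n".join(lines) + "\n"


config_text = render_minion_failover_config(
    ["saltmaster1.cccxht.com", "saltmaster2.cccxht.com"],
    alive_interval_s=30,  # assumed value; tune to the environment
)
print(config_text)
```

Generating the fragment from a list keeps every minion's master order identical, which matters because failover walks the list in order.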

Write SaltStack state files for modular, reusable configurations; test extensively before large‑scale deployment.

Common commands:

[root@centos salt]# salt '*' test.ping
localhost:
    True
server.cccxht.com:
    True

[root@centos /]# salt 'localhost' network.interfaces
localhost:
    eth0:
        hwaddr: 08:00:27:59:a9:8d
        inet:
            - address: 192.168.151.202
            - broadcast: 192.168.151.255
            - label: eth0
            - netmask: 255.255.255.0

[root@centos tmp]# salt 'localhost' disk.usage
localhost:
    /:
        1K-blocks: 28423128
        available: 21572236
        capacity: 25%
        filesystem: /dev/mapper/vg_centos-lv_root
        used: 5406132
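The same data can be consumed from Python, for example to flag nearly full filesystems. The sketch below assumes the return structure matches the disk.usage output shown above (minion, then mount point, then fields); `mounts_over_threshold`, `fetch_disk_usage`, and the threshold values are hypothetical, while `salt.client.LocalClient` is Salt's Python client and only works on a master with Salt installed.

```python
from typing import Dict, List, Tuple

DiskUsage = Dict[str, Dict[str, Dict[str, str]]]


def mounts_over_threshold(usage: DiskUsage, threshold_pct: int) -> List[Tuple[str, str, int]]:
    """Return (minion, mount point, capacity %) for mounts at or above the threshold."""
    flagged = []
    for minion, mounts in sorted(usage.items()):
        if not isinstance(mounts, dict):
            continue  # skip minions that returned an error instead of data
        for mount, stats in sorted(mounts.items()):
            capacity = int(str(stats["capacity"]).rstrip("%"))
            if capacity >= threshold_pct:
                flagged.append((minion, mount, capacity))
    return flagged


def fetch_disk_usage(target: str) -> DiskUsage:
    """Query live minions; must run on the Salt master."""
    import salt.client  # imported lazily so the rest of the sketch runs without Salt
    return salt.client.LocalClient().cmd(target, "disk.usage")


# Sample data copied from the disk.usage output above.
sample: DiskUsage = {
    "localhost": {
        "/": {
            "1K-blocks": "28423128",
            "available": "21572236",
            "capacity": "25%",
            "filesystem": "/dev/mapper/vg_centos-lv_root",
            "used": "5406132",
        }
    }
}
print(mounts_over_threshold(sample, 80))
print(mounts_over_threshold(sample, 20))
```

With the sample data, nothing is flagged at an 80% threshold and the root filesystem is flagged at 20%. On a master, `mounts_over_threshold(fetch_disk_usage('*'), 80)` would apply the same check to live minions.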

SaltStack integrates with Zabbix for event‑driven automation, supports cloud platforms via salt-cloud, and can be combined with CMDB for a fully automated operations platform.

5. Designing an Automated Operations Solution

5.1 Planning Diagram

The solution follows ITIL principles, building a layered platform where low‑level service tools expose APIs to higher‑level business services.

5.2 Platform Module Design

Key modules include:

Incident Management : Record, classify, assign, and supervise incident resolution to meet SLA targets.

Problem & Log Management : Analyze root causes, document solutions, and prevent recurrence.

Change Management : Control and execute infrastructure or service changes with minimal business impact.

Availability Management : Align IT architecture with business needs while controlling costs.

Emergency Event Handling : Standardize response to sudden incidents through monitoring, spare parts, and contingency plans.

The platform orchestrates these modules, providing a business‑driven scheduling layer that coordinates underlying subsystems.
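To make the incident module concrete, here is a minimal sketch of a record with an SLA check. The priorities, SLA targets, and field names are hypothetical; actual targets would come from the organization's own service-level agreements.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Hypothetical resolution targets per priority.
SLA_TARGETS = {"high": timedelta(hours=4), "normal": timedelta(hours=24)}


@dataclass
class Incident:
    summary: str
    priority: str
    opened_at: datetime
    assignee: Optional[str] = None
    resolved_at: Optional[datetime] = None

    def deadline(self) -> datetime:
        return self.opened_at + SLA_TARGETS[self.priority]

    def breaches_sla(self, now: datetime) -> bool:
        """True if the incident was, or still is, open past its SLA deadline."""
        end = self.resolved_at or now
        return end > self.deadline()


opened = datetime(2018, 10, 15, 9, 0)
incident = Incident("mail service unreachable", "high", opened, assignee="ops-a")
print(incident.breaches_sla(now=opened + timedelta(hours=2)))  # False
print(incident.breaches_sla(now=opened + timedelta(hours=5)))  # True
```

A supervision job could run this check periodically and escalate any open incident that is about to miss its deadline.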

6. Summary of Enterprise Automated Operations

Enterprise operations have evolved from fully manual to partially automated, and now to comprehensive automation.

Before the platform, new services required extensive manual steps (DNS, LVS, OS init, testing, deployment, monitoring, configuration, etc.). After automation, a simple configuration triggers the platform to handle the rest.

Post‑implementation metrics show user satisfaction rising from 33% to 95% and IT cost‑to‑revenue ratio dropping from 4% to 2.4%.
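For readers who want the reported figures as changes, the arithmetic is worked below. It uses only the four numbers quoted above; the helper name is illustrative.

```python
def relative_change(before: float, after: float) -> float:
    """Fractional change from before to after (negative means a decrease)."""
    return (after - before) / before


satisfaction_gain_points = 95 - 33             # percentage points
cost_ratio_change = relative_change(4.0, 2.4)  # change in IT cost-to-revenue ratio

print(f"satisfaction: +{satisfaction_gain_points} percentage points")
print(f"cost-to-revenue ratio: {cost_ratio_change:.0%}")
```

In other words, satisfaction rose by 62 percentage points, and the IT cost-to-revenue ratio fell by 40% relative to its starting value.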

The platform provides a clear view of IT assets, performance, reliability, and availability, supporting strategic decision‑making.

Automation reduces manual fault handling, shortens response times, discovers potential issues early, and enables rapid recovery through configuration snapshots.

Source: Reprinted from talkwithtrend, author Nie Kuijia.
