Why IT Operations Must Embrace Automation: Benefits and Architecture

This article explains why IT operations must adopt automation, describing its definition, benefits such as zero‑delay response and fault prediction, essential operational components, the self‑built and open‑source infrastructure, and detailed automation frameworks for development, testing, release, monitoring, and service governance.

ITFLY8 Architecture Home

Why IT Operations Need Automation

IT operations automation means converting repetitive manual tasks into automated workflows. These tasks range from simple health checks, configuration changes, and software installations to full change‑process orchestration. Automating them reduces or removes the delay of waiting for a person to act, so responses can be close to immediate, and it makes proactive fault prediction and alerting practical.
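As an illustration of proactive fault prediction, here is a minimal sketch that estimates when a disk will fill up from a series of hourly usage samples. The function names, the hourly sampling interval, and the 24-hour alert horizon are assumptions made for this example; the article does not describe how its prediction works.

```python
from typing import Optional, Sequence


def hours_until_full(samples: Sequence[float], limit: float = 100.0) -> Optional[float]:
    """Estimate hours until usage reaches `limit`, given hourly samples in percent.

    Takes the slope of a least-squares line through the samples and
    extrapolates from the latest sample. Returns None when there are
    too few samples or usage is flat or falling (no predicted fault).
    """
    n = len(samples)
    if n < 2:
        return None
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    # Slope of the least-squares line: cov(x, y) / var(x).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    var = sum((x - mean_x) ** 2 for x in range(n))
    slope = cov / var
    if slope <= 0:
        return None
    return max(0.0, (limit - samples[-1]) / slope)


def should_alert(samples: Sequence[float], horizon_hours: float = 24.0) -> bool:
    """Alert ahead of time if the disk is predicted to fill within the horizon."""
    eta = hours_until_full(samples)
    return eta is not None and eta <= horizon_hours
```

A linear trend is the simplest possible model; real usage is often bursty, so a production system would likely need something more robust.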

Core Operational Functions

Environment definition: development, testing, pre‑production, production.

Deployment: reliably push packages to various environments.

Monitoring: track system and application health after deployment.

Alerting: define response and handling mechanisms for issues.

Performance optimization: tune Nginx, Java, PHP, databases, network, etc.

SLA assurance: coordinate with business units to define service levels.
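To make the monitoring and alerting functions above concrete, the following sketch raises an alert only after several consecutive failed health checks, which avoids paging on a single transient blip. The class name and the threshold of three are hypothetical choices for illustration, not something the article specifies.

```python
class ConsecutiveFailureAlert:
    """Fire once when a check has failed `threshold` times in a row."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 1:
            raise ValueError("threshold must be at least 1")
        self._threshold = threshold
        self._failures = 0

    def observe(self, ok: bool) -> bool:
        """Record one check result; return True only when the alert should fire."""
        if ok:
            # Any success resets the streak.
            self._failures = 0
            return False
        self._failures += 1
        # Fire exactly once, at the moment the streak reaches the threshold.
        return self._failures == self._threshold
```

Feeding it the result of each health check keeps the alerting decision separate from how the check itself is performed.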

Service governance, task scheduling, cluster coordination, call‑chain analysis, interface quality, SQL quality, real‑time logs, etc.

Packaging, automated testing, gray‑release, partitioned rollout, configuration standardization, command standardization.

Distributed frameworks, storage & cache middleware, automated testing, cloud search, open platforms, marketing platforms.

Self‑Built Technical Infrastructure (Open‑Source + In‑House)

Automated release system – gray and partitioned releases.

Configuration automation – auto‑discovery and standardization.

Atomic command system – supports hundreds of servers and scripts.

Search platform – hundreds of indexes, billions of records.

Recommendation engine – handles hundreds of millions of user data calculations.

API automation & mock testing – supports API, web, and mobile testing.

API and SQL protection systems – prevent abusive calls.

Real‑time log system – aggregates Nginx, Tomcat, BI logs and offline tracing.

Distributed development framework – unified RPC communication.

Configuration distribution – service discovery across clusters.

Message middleware – push (IDP) and pull (Kafka) handling tens of millions of messages daily.

KV cache middleware – Memcached, Redis, Tair with high hit rates.

Distributed file middleware – MongoDB for files and images.

Database sharding middleware – MySQL with horizontal scaling across shards.

Distributed task scheduler – supports 100+ services and hundreds of tasks per day.

Unified push platform – 1M+ daily pushes to Android, iOS, Email, SMS, WeChat, Comet.
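The database sharding middleware in the list above has to decide which MySQL shard holds a given record. A minimal sketch of hash-based routing follows; the shard count, the key format, and the function name are assumptions for illustration, and the article does not say which routing scheme its middleware uses.

```python
import zlib


def shard_for(key: str, shard_count: int) -> int:
    """Map a routing key (for example a user ID) to a shard index.

    Uses CRC32 rather than Python's built-in hash(), because hash() of a
    string is randomized per process and would route inconsistently.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be at least 1")
    return zlib.crc32(key.encode("utf-8")) % shard_count
```

Plain modulo routing is easy to reason about, but changing the shard count remaps most keys, which is why schemes such as consistent hashing or a lookup table are often preferred when shards are added regularly.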

Open‑Source Technology Stack

Languages: Java (Tomcat/Spring), Shell, Node.js, Android, iOS.

Distributed: ActiveMQ, Kafka, Zookeeper, Router service discovery, Cat.

Storage: MySQL, MongoDB, Tair, Memcached, Redis.

Compute: Solr, Elasticsearch, Hadoop, HBase, Storm, Spark.

Operations: Linux, Nginx, Puppet, Zabbix, OpenStack.

Project management: Eclipse, Git, Maven, Hudson CI, Confluence, DMS.

Development Stage – Code/Build

Web framework – Swift.

Node.js front‑end framework.

iOS and Android mobile frameworks.

Shell script automation.

Distributed middleware – RPC, real‑time comet push, IDP/L pull queues, Zookeeper, Scheduler.

Storage middleware – MySQL, MongoDB, Tair, Redis, Memcached.

Compute platforms – cloud search, recommendation, big‑data processing, web & text parsing, Word preview.

Testing Stage – Test/CI

API automated testing.

API mock testing.

Web automated testing with Selenium.

WeChat testing (WXTest).

Open testing (KATest).

Test environment release.

Release/Deploy Stage

Release system.

Operations system.

Code inspection Builder for operations.

Operations Monitoring System

Automation platform.

Monitoring with Zabbix.

Radar log system.

Puppet/MCollective (mco) configuration management.

Service Governance

APIWater system – API quality governance.

SQL water system – SQL quality governance.

Router service center.

Configuration distribution system.

Scheduler system.

Call‑chain analysis with Cat.

Open platforms – WeChat, Weibo, telephone, payment, API, SEO.

Channel platforms – push, SMS, email, WeChat, private messaging.

1. Distributed Service Architecture

Service discovery, communication, and control are handled by a distributed registration center (Router) providing synchronous RPC, HTTP/heartbeat protocols, unified configuration files, load balancing, asynchronous MQ calls, push and pull modes, and task scheduling.
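The combination of registration, heartbeats, and load balancing can be sketched with a small in-memory registry. This is only a stand-in for the idea: the class, its method names, the 10-second lease, and the round-robin policy are all assumptions, and the article does not document the Router's actual interface.

```python
import itertools
import time
from typing import Callable, Dict, Iterator, List


class Registry:
    """In-memory stand-in for a registration center (hypothetical API)."""

    def __init__(self, ttl_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # service name -> {instance address: time of last heartbeat}
        self._last_seen: Dict[str, Dict[str, float]] = {}
        self._counters: Dict[str, Iterator[int]] = {}

    def heartbeat(self, service: str, address: str) -> None:
        """Register an instance, or refresh its lease if already known."""
        self._last_seen.setdefault(service, {})[address] = self._clock()

    def alive(self, service: str) -> List[str]:
        """Addresses whose last heartbeat is within the TTL, in sorted order."""
        now = self._clock()
        instances = self._last_seen.get(service, {})
        return sorted(a for a, seen in instances.items() if now - seen <= self._ttl)

    def pick(self, service: str) -> str:
        """Round-robin over the instances that are currently alive."""
        candidates = self.alive(service)
        if not candidates:
            raise LookupError(f"no alive instance for {service!r}")
        counter = self._counters.setdefault(service, itertools.count())
        return candidates[next(counter) % len(candidates)]
```

An instance that stops sending heartbeats simply ages out of `alive()`, so callers stop being routed to it without any explicit deregistration step.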

2. Operations R&D Automation System

Standardization occurs in three layers:

Hardware standardization: uniform data center, rack, switch, machine specs; standardized IP/DNS; automated hardware configuration collection.

Software standardization: consistent installation of Tomcat, JDK, Memcached, Redis, etc.; Nginx domain and configuration standardization.

Project standardization: standardized project zones (S, A, B, C) supporting multiple tech stacks (Tomcat, Java, Node.js, Python, iOS/Android).
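Software standardization becomes enforceable once each host can be compared against a declared baseline. The sketch below reports where a host drifts from that baseline; the package names and version strings are placeholders invented for the example, since the article does not list the versions it standardizes on.

```python
from typing import Dict, List

# Hypothetical baseline; the real standard versions are not given in the article.
STANDARD: Dict[str, str] = {"jdk": "1.8.0", "tomcat": "8.5", "redis": "3.2"}


def find_drift(installed: Dict[str, str], standard: Dict[str, str]) -> List[str]:
    """List every package that is missing or differs from the standard version."""
    drift = []
    for package, wanted in sorted(standard.items()):
        actual = installed.get(package)
        if actual is None:
            drift.append(f"{package}: missing (want {wanted})")
        elif actual != wanted:
            drift.append(f"{package}: {actual} (want {wanted})")
    return drift
```

In practice the `installed` mapping would come from an inventory step on each host, and a tool such as Puppet would then converge the host back to the baseline.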

3. Project Release Automation System

Code release – gray and partitioned (lane) releases.

Configuration release – publishing configuration data and coordinating clusters (Solr, Kafka).

Atomic commands – system‑level operations with logging.
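A gray, partitioned release can be sketched as a rollout plan whose first batch is a small gray group, followed by fixed-size partitions, with the rollout halting as soon as a batch turns out unhealthy. The function names and the `deploy` and `healthy` callbacks are hypothetical; the article does not describe the release system's real interface.

```python
from typing import Callable, List, Sequence


def plan_batches(hosts: Sequence[str], gray_count: int, batch_size: int) -> List[List[str]]:
    """Split hosts into a gray (canary) batch followed by fixed-size partitions."""
    if gray_count < 1 or batch_size < 1:
        raise ValueError("gray_count and batch_size must be at least 1")
    ordered = list(hosts)
    batches = [ordered[:gray_count]]
    rest = ordered[gray_count:]
    batches += [rest[i:i + batch_size] for i in range(0, len(rest), batch_size)]
    return [batch for batch in batches if batch]


def rollout(batches: Sequence[Sequence[str]],
            deploy: Callable[[str], None],
            healthy: Callable[[str], bool]) -> List[str]:
    """Deploy batch by batch; stop after the first batch with an unhealthy host.

    Returns the hosts that received the new version, so that a caller
    can decide what to roll back.
    """
    deployed: List[str] = []
    for batch in batches:
        for host in batch:
            deploy(host)
            deployed.append(host)
        if not all(healthy(host) for host in batch):
            break
    return deployed
```

Keeping the gray batch small limits how many users see a bad release before the health check stops the rollout.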

4. Service Governance System

Health status detection.

Distributed task scheduling (Scheduler).

Call‑chain analysis (Cat).

Real‑time log monitoring (Radar system).

API quality governance (APIWater).

SQL quality governance (Monyog).
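One common building block for API governance, and for the protection against abusive calls mentioned earlier, is a per-caller rate limit. The token bucket below is a generic sketch of that idea; the article does not explain how APIWater or the API protection system work internally, so the class name, the rate, and the capacity are assumptions.

```python
import time
from typing import Callable


class TokenBucket:
    """Allow bursts up to `capacity` calls, refilled at `rate` tokens per second."""

    def __init__(self, rate: float, capacity: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._rate = rate
        self._capacity = capacity
        self._clock = clock
        self._tokens = capacity
        self._updated = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False when the caller is over limit."""
        now = self._clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        elapsed = now - self._updated
        self._tokens = min(self._capacity, self._tokens + elapsed * self._rate)
        self._updated = now
        if self._tokens >= 1.0:
            self._tokens -= 1.0
            return True
        return False
```

A gateway would typically keep one bucket per API key or client address and reject or queue a call whenever `allow()` returns False.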

5. Automated Test Environment Construction

Automated building of test environments to support continuous integration.

6. Automated Testing

API automated testing.

Web automated testing with Selenium.

Mock simulation testing.
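Mock simulation testing replaces a real dependency with a stand-in that returns canned responses, so an API client can be tested without the backing service. The sketch below uses Python's standard `unittest.mock`; the `/users/{id}` endpoint and the `get_user_name` function are invented for the example and are not part of the article's platform.

```python
import json
from typing import Callable
from unittest import mock


def get_user_name(http_get: Callable[[str], str], user_id: int) -> str:
    """Call a (hypothetical) user API and extract the name from its JSON body."""
    body = http_get(f"/users/{user_id}")
    return json.loads(body)["name"]


def test_get_user_name_with_mock() -> None:
    # The mock stands in for the real HTTP call and returns a canned response.
    fake_get = mock.Mock(return_value='{"id": 7, "name": "alice"}')
    assert get_user_name(fake_get, 7) == "alice"
    # Verify the client requested the expected path exactly once.
    fake_get.assert_called_once_with("/users/7")
```

Passing the HTTP function in as a parameter keeps the client easy to mock without patching module globals.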

Source: http://www.cnblogs.com/wintersun/p/5059097.html
