
How to Build a Scalable Python‑Based Operations Automation Platform

This article explores the design and implementation of a Python‑driven, extensible operations automation platform, covering its motivations, architecture, module customization, security auditing, client‑server structure, and future enhancements for robust DevOps workflows.

MaGe Linux Operations

Introduction

This article discusses how to build a scalable operations automation platform in Python, with the aim of sharing practical experience.

Why Choose Python?

Pre‑installed on most Linux distributions and cross‑platform.

Readable syntax and high development efficiency.

Rich third‑party libraries (frameworks, APIs, scientific computing, GUI, etc.).

Active community and many developers.

Python ranks third among high‑level languages at Tencent, widely used in system operations, business logic, operational platforms, testing tools, and data mining. The well‑known "Blue Whale" PaaS platform is built on Python.

1. Platform Overview

OMServer is a centralized Linux cluster management platform offering business cluster management, real‑time security auditing, modular customization, encrypted data transmission, support for mainstream Python components, and a user‑friendly experience.

Key Third‑Party Libraries

Django – Python web framework that follows an MVC‑style pattern (Django itself calls it model‑template‑view).

rpyc – RPC and distributed computing tool supporting sync/async operations.

Saltstack, Ansible, Func – Python‑based components for configuration automation and remote execution.

MySQL – Popular relational database management system.

2. Platform Architecture Design

Three‑Tier Architecture

The platform consists of a Web interaction layer, a distributed computing layer, and a cluster management service layer.

Web Interaction Layer: Browser/server (B/S) interface for administrators, built with Django.

Distributed Computing Layer: Provides connection channels from the Web layer to the master node using the rpyc protocol.

Cluster Management Service Layer: Integrates Python remote‑operation components (Saltstack, Ansible, Func) to manage business server clusters, supporting multi‑site redundancy and high execution efficiency.

Operation flow: Front‑end parameters → encrypted transmission → task execution → result set → decryption output.
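The round trip above can be sketched with the standard library alone. This is a minimal illustration, not OMServer's actual code: the names `seal`, `unseal`, `run_task`, `master_handle`, and `web_submit` are hypothetical, the key is a placeholder, and the XOR keystream is a toy stand‑in for whatever cipher the platform really uses. A real deployment should rely on a vetted cryptography library or TLS.

```python
import hashlib
import json
from itertools import count

SHARED_KEY = b"demo-key"  # hypothetical pre-shared key, for illustration only


def _keystream(key, length):
    """Derive `length` bytes from the key (toy construction, NOT secure)."""
    out = bytearray()
    for block in count():
        if len(out) >= length:
            break
        out += hashlib.sha256(key + block.to_bytes(8, "big")).digest()
    return bytes(out[:length])


def seal(payload, key=SHARED_KEY):
    """Serialize a dict and obscure it before it goes on the wire."""
    raw = json.dumps(payload, sort_keys=True).encode("utf-8")
    return bytes(a ^ b for a, b in zip(raw, _keystream(key, len(raw))))


def unseal(blob, key=SHARED_KEY):
    """Reverse seal(): recover the original dict."""
    raw = bytes(a ^ b for a, b in zip(blob, _keystream(key, len(blob))))
    return json.loads(raw.decode("utf-8"))


def run_task(module_id, hosts, params):
    """Stand-in for task execution on the master node."""
    return {host: f"module {module_id} finished" for host in hosts}


def master_handle(request_blob):
    """Master side: decrypt the request, execute, encrypt the result set."""
    request = unseal(request_blob)
    result = run_task(request["module_id"], request["hosts"], request["params"])
    return seal(result)


def web_submit(module_id, hosts, params):
    """Web side: encrypt front-end parameters, send, decrypt the output."""
    request_blob = seal({"module_id": module_id, "hosts": hosts, "params": params})
    response_blob = master_handle(request_blob)  # would travel over rpyc in practice
    return unseal(response_blob)
```

Calling `web_submit("1001", ["web01"], {"service": "nginx"})` walks through all five stages of the flow in one process; in the platform described here, the call to `master_handle` would cross the network.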

Architecture Advantages

Multi‑machine management across different IDC zones.

High security with encrypted transmission and private TCP protocol.

Supports various client access methods (Web, desktop, mobile).

Leverages advanced features of Python components (Playbook, State).

Strong extensibility and modular customization.

Operation Process Diagram

(Image illustrating the three‑tier interaction flow.)

Remote‑Operation Component Integration

Configure a trust relationship (certificate or SSH) between the master and the managed nodes.

Use OMServer’s packaged task modules and API to dispatch and execute customized tasks.
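As a rough sketch of what dispatching a task through one of these components can look like, the helper below builds the command line for either backend. It is an assumption for illustration: the article does not show OMServer's API, and `build_command` and `render` are hypothetical names. The sketch uses the `salt` and `ansible` command‑line front ends rather than their Python client APIs, and it only constructs the command without running it.

```python
import shlex
from typing import List


def build_command(backend: str, hosts: List[str], shell_cmd: str) -> List[str]:
    """Return the argv list that would run `shell_cmd` on `hosts`."""
    target = ",".join(hosts)
    if backend == "salt":
        # `salt -L` targets an explicit comma-separated list of minion IDs.
        return ["salt", "-L", target, "cmd.run", shell_cmd]
    if backend == "ansible":
        # `-m shell -a` passes the argument string to the remote shell.
        # The hosts are assumed to exist in the Ansible inventory.
        return ["ansible", target, "-m", "shell", "-a", shell_cmd]
    raise ValueError(f"unsupported backend: {backend}")


def render(argv: List[str]) -> str:
    """Quote the argv list so it can be logged or pasted into a shell."""
    return shlex.join(argv)
```

Keeping the command as a list until the last moment avoids quoting mistakes; it can be handed to `subprocess.run(argv)` on the master node once the trust relationship is in place.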

3. Platform Module Customization

Task Module Design

Task modules represent atomic operations such as reloading configurations, deploying cache services, or stopping Nginx.

Steps to add a module:

Define input parameters using HTML form elements (text, dropdown, checkboxes, etc.).

Write backend code, typically invoking Saltstack or Ansible client APIs.

The core code that executes a shell script on the target can be as short as five lines.

Running a module involves selecting the task, specifying parameters, executing, and receiving results.
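One possible way to organize task modules is a small registry that ties a module ID to its form parameters and its backend handler. The sketch below is hypothetical: `TaskModule`, `task_module`, `run_module`, and the module ID `"1001"` are invented for illustration, and the handler only returns the command it would run instead of calling Saltstack or Ansible.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class TaskModule:
    module_id: str
    name: str
    params: List[str]  # names of the HTML form fields the module expects
    handler: Callable[..., Dict[str, str]]


REGISTRY: Dict[str, TaskModule] = {}


def task_module(module_id, name, params):
    """Decorator that registers a function as a task module."""
    def decorator(func):
        REGISTRY[module_id] = TaskModule(module_id, name, list(params), func)
        return func
    return decorator


@task_module("1001", "Send signal to Nginx", params=["signal"])
def nginx_signal(hosts, signal):
    # `nginx -s` accepts stop, quit, reopen, and reload.
    if signal not in {"stop", "quit", "reopen", "reload"}:
        raise ValueError(f"unknown signal: {signal}")
    # A real module would pass this command to a Salt or Ansible client.
    return {host: f"nginx -s {signal}" for host in hosts}


def run_module(module_id, hosts, form_values):
    """Select the task, check the submitted parameters, execute, return results."""
    module = REGISTRY[module_id]
    missing = [p for p in module.params if p not in form_values]
    if missing:
        raise ValueError(f"missing parameters: {missing}")
    kwargs = {p: form_values[p] for p in module.params}
    return module.handler(hosts, **kwargs)
```

With this layout, adding a module means writing one decorated function, and the Web layer can generate its form from `REGISTRY[module_id].params`.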

Key Operational Focus Areas

Improving and upgrading platform features, which requires DevOps skills.

Developing task modules based on business needs.

Standardizing and streamlining daily workflows.

System and business performance tuning.

4. Security Auditing Implementation

Audit Architecture

The audit system consists of a front end that displays operation events and a server‑side agent that reports data through a CGI endpoint to a database, which enables keyword monitoring and alert triggering.
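Keyword monitoring can be as simple as checking each reported command against a watch list. The sketch below assumes an event is a dict with a `command` field; the keyword list and the function name `find_alerts` are hypothetical, since the article does not describe the actual matching rules.

```python
ALERT_KEYWORDS = ["rm -rf", "shutdown", "mkfs"]  # hypothetical watch list


def find_alerts(event, keywords=ALERT_KEYWORDS):
    """Return the watched keywords that appear in the event's command."""
    command = event["command"].lower()
    return [kw for kw in keywords if kw in command]
```

An empty result means no alert; a non‑empty one can be passed to whatever notification channel the team uses.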

Agent Reporting Mechanism

/etc/profile is modified to capture shell history events, so every user command is logged and passed to OMAudit_agent.py, which reports it via HTTP GET to the server for storage in the database. The front end refreshes periodically to show the latest events.
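The reporting step might look like the sketch below. It is not the content of OMAudit_agent.py, which the article does not show: the endpoint URL, the query field names, and the functions `build_report_url` and `send_report` are all assumptions, and how the captured command reaches the script is left out.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

REPORT_URL = "http://omaudit.example.com/cgi-bin/report"  # hypothetical endpoint


def build_report_url(user, host, command, timestamp, base_url=REPORT_URL):
    """Encode one history event as the query string of an HTTP GET request."""
    query = urlencode({"user": user, "host": host, "cmd": command, "ts": timestamp})
    return f"{base_url}?{query}"


def send_report(url, timeout=3):
    """Send the event; a short timeout keeps a slow server from stalling the shell."""
    with urlopen(url, timeout=timeout) as response:
        return response.status
```

`urlencode` takes care of escaping spaces, pipes, and quotes in the command. Because GET parameters often end up in server access logs, commands that may contain secrets deserve extra care.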

5. C/S Structure Implementation

OManager Desktop

(Images of the desktop client and its architecture.)

Future Optimizations

Integrate advanced Ansible or Saltstack features such as playbooks.

Package multiple task modules into templates for combined operation and change‑management workflows.

Introduce Celery for higher concurrency.

Add pause, abort, and retry capabilities to task queues.

Connect with CMDB for broader applicability.
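To make the pause, abort, and retry item more concrete, one option is to model each queued task as a small state machine. The sketch below is only an illustration of that idea: the state names, the `Task` class, and the retry limit are hypothetical and not part of the platform described above.

```python
# Allowed state transitions; the state names are hypothetical.
TRANSITIONS = {
    "queued": {"running", "aborted"},
    "running": {"paused", "succeeded", "failed", "aborted"},
    "paused": {"running", "aborted"},
    "failed": {"queued"},  # a retry puts a failed task back in the queue
    "succeeded": set(),
    "aborted": set(),
}


class Task:
    def __init__(self, task_id, max_retries=3):
        self.task_id = task_id
        self.state = "queued"
        self.retries = 0
        self.max_retries = max_retries

    def _move(self, new_state):
        """Change state only if the transition table allows it."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state

    def start(self):
        if self.state != "queued":
            raise ValueError(f"cannot start a task that is {self.state}")
        self._move("running")

    def pause(self):
        self._move("paused")

    def resume(self):
        if self.state != "paused":
            raise ValueError(f"cannot resume a task that is {self.state}")
        self._move("running")

    def abort(self):
        self._move("aborted")

    def fail(self):
        self._move("failed")

    def succeed(self):
        self._move("succeeded")

    def retry(self):
        """Re-queue a failed task until the retry limit is reached."""
        if self.retries >= self.max_retries:
            raise RuntimeError("retry limit reached")
        self._move("queued")
        self.retries += 1
```

Keeping the rules in one table makes it easy to reject invalid requests, such as resuming a task that was already aborted.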
