Mastering Enterprise CI/CD with Ansible: A Complete Hands‑On Guide

This comprehensive guide explores how Ansible can be used to build enterprise‑grade CI/CD automation platforms, covering the evolution of automation, core Ansible concepts, infrastructure setup, modular playbook design, CI/CD pipeline integration, advanced features like Vault and custom modules, real‑world case studies, best practices, and future trends.

MaGe Linux Operations

Ansible Automation Revolution: Complete Practical Guide to Building Enterprise‑Level CI/CD Platforms

Introduction

In today’s fast‑moving digital era, automation is a key strategy for enterprises to improve efficiency, cut costs, and ensure service quality. Ansible, a leading automation tool, offers simple syntax, powerful features, and a rich ecosystem, redefining modern operations. This article delves into building an enterprise‑grade automation platform with Ansible, covering everything from foundational infrastructure to advanced features.

According to Red Hat’s 2024 Enterprise Automation State Report, organizations using Ansible reduce manual operations by 92%, improve deployment efficiency by 73%, and shorten incident recovery time by 68%.

Technical Background

Evolution of Automation

Automation has progressed through several stages:

1. Scripting Phase (2000‑2008)

Shell scripts, Python scripts, etc.

Lack of unified management and configuration standardization

2. Configuration Management Phase (2009‑2013)

Rise of tools like Puppet, Chef

Introduction of Infrastructure as Code

3. Cloud‑Native Automation Phase (2014‑2020)

Maturation of declarative tools such as Ansible and Terraform

Container orchestration and micro‑service automation

4. Intelligent Operations Phase (2021‑present)

Integration of AIOps with traditional automation

Self‑healing systems and predictive operations

Ansible Core Principles

Ansible achieves automation through three core technologies:

1. Agent‑less Architecture

# Ansible connects to target hosts over SSH
ansible all -m ping -i inventory.ini
# No agent is installed on targets; most modules only need Python on the managed host

2. Idempotency

# Example: idempotent configuration
- name: Ensure nginx is started and enabled
  systemd:
    name: nginx
    state: started
    enabled: yes
# Re-running the task reports "ok" instead of "changed" once the state matches
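
The idempotency property can also be illustrated outside Ansible. The following is a minimal Python sketch, not Ansible's implementation: `ensure_line` is a hypothetical helper that only writes when the desired line is missing and reports whether it changed anything, which mirrors the "changed" versus "ok" status of a task.

```python
import tempfile
from pathlib import Path


def ensure_line(path: Path, line: str) -> bool:
    """Append `line` to `path` only if it is missing. Returns True if the file changed."""
    existing = path.read_text().splitlines() if path.exists() else []
    if line in existing:
        return False  # desired state already holds: report "ok", not "changed"
    existing.append(line)
    path.write_text("\n".join(existing) + "\n")
    return True


def demo() -> list:
    """Apply the same change three times and collect the changed flags."""
    with tempfile.TemporaryDirectory() as tmp:
        target = Path(tmp) / "sshd_config"
        return [ensure_line(target, "PermitRootLogin no") for _ in range(3)]


print(demo())
```

Only the first call modifies the file; later calls detect that the state already matches and do nothing.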

3. Declarative Syntax

# YAML Playbook example
- hosts: webservers
  tasks:
    - name: Install nginx
      package:
        name: nginx
        state: present

Core Content

1. Building Ansible Infrastructure

1.1 Environment Preparation & Installation

Control node setup:

# CentOS/RHEL
sudo yum install epel-release
sudo yum install ansible

# Ubuntu/Debian
sudo apt update
sudo apt install ansible

# Install latest via pip
pip3 install ansible ansible-core

# Verify installation
ansible --version

Advanced configuration tuning:

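# /etc/ansible/ansible.cfg
[defaults]
forks = 50
# Disabling host key checking is convenient in disposable lab environments,
# but it removes protection against man-in-the-middle attacks; keep it enabled in production.
host_key_checking = False
gathering = smart
# The memory cache only lasts for the current run; use jsonfile or redis
# (with fact_caching_connection) if facts should persist for fact_caching_timeout seconds.
fact_caching = memory
fact_caching_timeout = 86400
log_path = /var/log/ansible.log
ansible_managed = Ansible managed: {file} modified on %Y-%m-%d %H:%M:%S by {uid} on {host}

[inventory]
enable_plugins = host_list, script, auto, yaml, ini, toml

[ssh_connection]
# ssh_args is read from this section, not from [defaults]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
pipelining = True
control_path = /tmp/ansible-ssh-%h-%p-%r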

1.2 Dynamic Inventory Management

Multi‑environment inventory configuration:

# inventory/group_vars/all.yml
---
ansible_user: ansible
ansible_ssh_private_key_file: ~/.ssh/ansible_key
timezone: Asia/Shanghai

environments:
  dev:
    domain: dev.company.com
  staging:
    domain: staging.company.com
  production:
    domain: company.com

Dynamic inventory script example:

#!/usr/bin/env python3
# inventory/dynamic_inventory.py
import json
import sys
from argparse import ArgumentParser

import requests

class DynamicInventory:
    def __init__(self):
        self.inventory = {}
        self.read_cli_args()
        if self.args.list:
            self.inventory = self.get_inventory()
        elif self.args.host:
            self.inventory = self.get_host_info(self.args.host)
        print(json.dumps(self.inventory))

    def get_inventory(self):
        """Build the --list structure from the CMDB host list."""
        try:
            # A timeout keeps an unresponsive CMDB from blocking every playbook run
            response = requests.get('http://cmdb.company.com/api/hosts', timeout=10)
            response.raise_for_status()
            hosts_data = response.json()
        except (requests.RequestException, ValueError) as e:
            # Report the failure on stderr; stdout must remain valid JSON
            print(f"CMDB query failed: {e}", file=sys.stderr)
            return {'_meta': {'hostvars': {}}}
        inventory = {'_meta': {'hostvars': {}}, 'webservers': {'hosts': []}, 'databases': {'hosts': []}, 'loadbalancers': {'hosts': []}}
        for host in hosts_data:
            group = host['role']
            if group in inventory:
                inventory[group]['hosts'].append(host['hostname'])
                inventory['_meta']['hostvars'][host['hostname']] = {
                    'ansible_host': host['ip_address'],
                    # 'environment' is a reserved keyword in Ansible, so use another variable name
                    'deploy_environment': host['environment'],
                    'datacenter': host['datacenter']
                }
        return inventory

    def get_host_info(self, hostname):
        return {}

    def read_cli_args(self):
        parser = ArgumentParser()
        parser.add_argument('--list', action='store_true')
        parser.add_argument('--host', action='store')
        self.args = parser.parse_args()

if __name__ == '__main__':
    DynamicInventory()
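
The JSON shape that an inventory script must print for `--list` is easier to test when the grouping logic is separated from the HTTP call. The sketch below is an illustration under assumptions: `build_inventory` and the sample records are hypothetical, and the field names (`hostname`, `role`, `ip_address`) follow the script above rather than any real CMDB API.

```python
import json


def build_inventory(hosts, known_groups=("webservers", "databases", "loadbalancers")):
    """Turn CMDB-style host records into the structure Ansible expects from --list."""
    inventory = {"_meta": {"hostvars": {}}}
    for group in known_groups:
        inventory[group] = {"hosts": []}
    for host in hosts:
        group = host.get("role")
        if group not in known_groups:
            continue  # skip roles the playbooks do not target
        name = host["hostname"]
        inventory[group]["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = {"ansible_host": host["ip_address"]}
    return inventory


# Hypothetical sample data standing in for the CMDB response
sample = [
    {"hostname": "web01", "role": "webservers", "ip_address": "10.0.0.11"},
    {"hostname": "db01", "role": "databases", "ip_address": "10.0.0.21"},
    {"hostname": "tmp01", "role": "scratch", "ip_address": "10.0.0.99"},
]
print(json.dumps(build_inventory(sample), indent=2))
```

Returning host variables under `_meta.hostvars` lets Ansible skip a separate `--host` call for every host.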

2. Enterprise‑Level Playbook Design

2.1 Modular Playbook Architecture

Directory layout:

ansible-infrastructure/
├── inventories/
│   ├── production/
│   │   ├── hosts.yml
│   │   └── group_vars/
│   ├── staging/
│   └── development/
├── roles/
│   ├── common/
│   ├── nginx/
│   ├── mysql/
│   └── monitoring/
├── playbooks/
│   ├── site.yml
│   ├── webservers.yml
│   └── databases.yml
├── group_vars/
├── host_vars/
└── ansible.cfg

Main site playbook example:

# playbooks/site.yml
---
- name: General system configuration
  hosts: all
  become: yes
  roles:
    - common
    - security
    - monitoring-agent

- name: Web server configuration
  hosts: webservers
  become: yes
  roles:
    - nginx
    - php-fpm
    - ssl-certificates

- name: Database server configuration
  hosts: databases
  become: yes
  roles:
    - mysql
    - backup
    - performance-tuning

- name: Load balancer configuration
  hosts: loadbalancers
  become: yes
  roles:
    - haproxy
    - keepalived

2.2 Advanced Role Development

Nginx role tasks:

# roles/nginx/tasks/main.yml
---
- name: Install nginx
  package:
    name: nginx
    state: present
  notify: restart nginx

- name: Create nginx directories
  file:
    path: "{{ item }}"
    state: directory
    owner: root
    group: root
    mode: '0755'
  loop:
    - /etc/nginx/sites-available
    - /etc/nginx/sites-enabled
    - /var/log/nginx

- name: Deploy main nginx config
  template:
    src: nginx.conf.j2
    dest: /etc/nginx/nginx.conf
    backup: yes
  notify: reload nginx
  tags: config

- name: Deploy virtual hosts
  template:
    src: vhost.conf.j2
    dest: "/etc/nginx/sites-available/{{ item.name }}"
  loop: "{{ nginx_vhosts }}"
  notify: reload nginx
  tags: vhosts

- name: Enable virtual hosts
  file:
    src: "/etc/nginx/sites-available/{{ item.name }}"
    dest: "/etc/nginx/sites-enabled/{{ item.name }}"
    state: link
  loop: "{{ nginx_vhosts }}"
  when: item.enabled | default(true)
  notify: reload nginx

- name: Ensure nginx service is running
  systemd:
    name: nginx
    state: started
    enabled: yes

Variable defaults:

# roles/nginx/defaults/main.yml
---
nginx_user: www-data
nginx_worker_processes: auto
nginx_worker_connections: 1024
nginx_keepalive_timeout: 65
nginx_client_max_body_size: 64m

nginx_vhosts:
  - name: default
    listen: 80
    server_name: _
    root: /var/www/html
    index: index.html index.htm
    enabled: true

nginx_performance:
  sendfile: "on"
  tcp_nopush: "on"
  tcp_nodelay: "on"
  gzip: "on"
  gzip_vary: "on"
  gzip_comp_level: 6

3. CI/CD Integration & Automation Pipelines

3.1 GitLab CI Integration

GitLab CI configuration:

# .gitlab-ci.yml
stages:
  - validate
  - test
  - deploy-staging
  - deploy-production

variables:
  ANSIBLE_HOST_KEY_CHECKING: "False"
  ANSIBLE_FORCE_COLOR: "True"

validate-playbooks:
  stage: validate
  image: ansible/ansible-runner:latest
  script:
    - ansible-playbook --syntax-check playbooks/site.yml
    - ansible-lint playbooks/site.yml
  only:
    - merge_requests
    - master

test-roles:
  stage: test
  image: ansible/ansible-runner:latest
  script:
    - molecule test
  only:
    - merge_requests

deploy-staging:
  stage: deploy-staging
  image: ansible/ansible-runner:latest
  script:
    - ansible-playbook -i inventories/staging playbooks/site.yml --check --diff
    - ansible-playbook -i inventories/staging playbooks/site.yml
  environment:
    name: staging
  only:
    - master

deploy-production:
  stage: deploy-production
  image: ansible/ansible-runner:latest
  script:
    - ansible-playbook -i inventories/production playbooks/site.yml --check --diff
    - ansible-playbook -i inventories/production playbooks/site.yml
  environment:
    name: production
  when: manual
  only:
    - master
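
A pipeline can add its own gate on top of the exit code by inspecting the PLAY RECAP that `ansible-playbook` prints. The sketch below is one possible approach, not part of the pipeline above: `parse_recap` and `is_safe_to_promote` are hypothetical helpers, and the parser assumes output without ANSI color codes (for example with `ANSIBLE_NOCOLOR=1` instead of `ANSIBLE_FORCE_COLOR`).

```python
import re

# One recap line looks like: "web01 : ok=12 changed=3 unreachable=0 failed=0 ..."
RECAP_LINE = re.compile(r"^(?P<host>\S+)\s*:\s*(?P<counters>(?:\w+=\d+\s*)+)$")


def parse_recap(output: str) -> dict:
    """Extract per-host counters from the PLAY RECAP section of playbook output."""
    results = {}
    in_recap = False
    for line in output.splitlines():
        if line.startswith("PLAY RECAP"):
            in_recap = True
            continue
        if not in_recap:
            continue
        match = RECAP_LINE.match(line.strip())
        if match:
            pairs = (pair.split("=") for pair in match.group("counters").split())
            results[match.group("host")] = {key: int(value) for key, value in pairs}
    return results


def is_safe_to_promote(recap: dict) -> bool:
    """Hypothetical gate: no host may report failed or unreachable tasks."""
    return bool(recap) and all(
        c.get("failed", 0) == 0 and c.get("unreachable", 0) == 0 for c in recap.values()
    )


SAMPLE = """\
PLAY RECAP *********************************************************************
web01                      : ok=12   changed=3    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
web02                      : ok=4    changed=0    unreachable=1    failed=0    skipped=0    rescued=0    ignored=0
"""
print(is_safe_to_promote(parse_recap(SAMPLE)))
```

In the sample, `web02` is unreachable, so the gate refuses promotion.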

3.2 Blue‑Green Deployment Playbook

# playbooks/blue-green-deploy.yml
---
- name: Blue‑Green Deployment
  hosts: webservers
  serial: "{{ batch_size | default(1) }}"
  vars:
    # INI fact files are exposed as ansible_local.<file>.<section>.<key>
    current_color: "{{ ansible_local.deployment.deployment.color | default('blue') }}"
    new_color: "{{ 'green' if current_color == 'blue' else 'blue' }}"
  tasks:
    - name: Determine deployment path
      set_fact:
        deploy_path: "/opt/app/{{ new_color }}"
    - name: Create new version directory
      file:
        path: "{{ deploy_path }}"
        state: directory
    - name: Deploy new package
      unarchive:
        src: "{{ app_package_url }}"
        dest: "{{ deploy_path }}"
        remote_src: yes
    - name: Update configuration
      template:
        src: app.conf.j2
        dest: "{{ deploy_path }}/config/app.conf"
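    - name: Health check new version
      uri:
        url: "http://{{ ansible_host }}:{{ app_port }}/health"
        method: GET
        timeout: 30
      register: health_check
      # An explicit until condition makes the retry behaviour consistent across Ansible versions
      until: health_check.status == 200
      retries: 5
      delay: 10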
    - name: Update load balancer upstream
      template:
        src: nginx-upstream.j2
        dest: /etc/nginx/conf.d/upstream.conf
      # delegate_to accepts a single host, so loop over the load balancer group
      delegate_to: "{{ item }}"
      loop: "{{ groups['loadbalancers'] }}"
      notify: reload nginx
    - name: Record deployment state
      copy:
        content: |
          [deployment]
          color={{ new_color }}
          version={{ app_version }}
          timestamp={{ ansible_date_time.epoch }}
        dest: /etc/ansible/facts.d/deployment.fact
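
The color switch depends on the INI fact file written by the last task. Ansible exposes such a file as `ansible_local.<file>.<section>.<key>`, so the stored color lives under the `deployment` section of `deployment.fact`. The Python sketch below reproduces the toggle logic with the standard `configparser` module; `next_color` is a hypothetical helper for illustration only.

```python
import configparser


def next_color(fact_text: str, default: str = "blue") -> tuple:
    """Return (current_color, new_color) from the contents of deployment.fact."""
    parser = configparser.ConfigParser()
    parser.read_string(fact_text)
    # Fall back to the default when the file is empty or has no [deployment] section
    current = parser.get("deployment", "color", fallback=default)
    new = "green" if current == "blue" else "blue"
    return current, new


fact_file = """\
[deployment]
color=green
version=1.4.2
timestamp=1700000000
"""
print(next_color(fact_file))
print(next_color(""))
```

On a host that has never been deployed to, the fact file is missing, the current color defaults to blue, and the first release lands in the green directory.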

4. Advanced Feature Applications

4.1 Vault Secure Management

Encrypting sensitive data:

# Create encrypted file
ansible-vault create group_vars/production/vault.yml

# Edit encrypted file
ansible-vault edit group_vars/production/vault.yml

# Encrypt existing file
ansible-vault encrypt inventories/production/secrets.yml

# Use encrypted variables in playbook
ansible-playbook -i inventories/production playbooks/site.yml --ask-vault-pass

Sample decrypted content:

# Vault variable definitions
vault_mysql_root_password: "SuperSecretPassword123!"
vault_api_key: "sk-1234567890abcdef"
vault_ssl_private_key: |
  -----BEGIN PRIVATE KEY-----
  MIIEvgIBADANBgkqhkiG9w0BAQEFAASCBKgwggSkAgEAAoIBAQC7...
  -----END PRIVATE KEY-----
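
Because the decrypted form contains real secrets, it helps to verify before committing that vault files are actually encrypted. Files encrypted by `ansible-vault` start with a `$ANSIBLE_VAULT;` header line, which the sketch below checks for. The helper names and the `vault*.yml` naming pattern are assumptions made for illustration.

```python
import tempfile
from pathlib import Path

VAULT_HEADER = "$ANSIBLE_VAULT;"


def is_vault_encrypted(path: Path) -> bool:
    """True if the first line of the file carries the ansible-vault header."""
    with path.open("r", errors="replace") as handle:
        return handle.readline().startswith(VAULT_HEADER)


def find_plaintext_secrets(root: Path, pattern: str = "vault*.yml") -> list:
    """List files matching the naming pattern that are NOT encrypted."""
    return sorted(p for p in root.rglob(pattern) if not is_vault_encrypted(p))


def demo() -> list:
    """Create one encrypted-looking and one plaintext file, then report offenders."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "vault.yml").write_text("$ANSIBLE_VAULT;1.1;AES256\n3132333435\n")
        (root / "vault_plain.yml").write_text("vault_api_key: not-encrypted\n")
        return [p.name for p in find_plaintext_secrets(root)]


print(demo())
```

A check like this could run as a pre-commit hook or in the validate stage of the pipeline.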

4.2 Custom Module Development

#!/usr/bin/python3
# library/service_check.py
# Requires the `requests` package on the host where the module runs
import time

import requests
from ansible.module_utils.basic import AnsibleModule

def check_service_health(url, timeout=30, retries=3, expected_status=200):
    """Poll url until it returns expected_status or the retries are exhausted."""
    message = "Service health check failed after all retries"
    for attempt in range(retries):
        try:
            response = requests.get(url, timeout=timeout)
            if response.status_code == expected_status:
                return True, f"Service is healthy (status: {response.status_code})"
            message = f"Unexpected status: {response.status_code}"
        except requests.exceptions.RequestException as e:
            message = f"Service check failed: {e}"
        if attempt < retries - 1:
            time.sleep(5)  # wait before the next attempt
    return False, message

def main():
    module = AnsibleModule(
        argument_spec=dict(
            url=dict(type='str', required=True),
            timeout=dict(type='int', default=30),
            retries=dict(type='int', default=3),
            expected_status=dict(type='int', default=200)
        ),
        supports_check_mode=True
    )
    url = module.params['url']
    timeout = module.params['timeout']
    retries = module.params['retries']
    expected_status = module.params['expected_status']
    is_healthy, message = check_service_health(url, timeout, retries, expected_status)
    if is_healthy:
        module.exit_json(changed=False, msg=message, status="healthy")
    else:
        module.fail_json(msg=message, status="unhealthy")

if __name__ == '__main__':
    main()
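
Retry logic like this is easier to unit test when the network call and the sleep are injected. The sketch below is a standalone illustration, not part of the module: `wait_until_healthy` is a hypothetical function that takes any callable returning a status code, so a test can feed it canned responses.

```python
import time


def wait_until_healthy(probe, retries=3, delay=5.0, expected_status=200, sleep=time.sleep):
    """Call probe() up to `retries` times. Returns (healthy, attempts_used)."""
    for attempt in range(1, retries + 1):
        try:
            if probe() == expected_status:
                return True, attempt
        except OSError:
            pass  # treat connection errors like an unhealthy response
        if attempt < retries:
            sleep(delay)  # no pause after the final attempt
    return False, retries


# Simulated service that recovers on the third probe; sleeping is disabled for the demo
responses = iter([503, 503, 200])
print(wait_until_healthy(lambda: next(responses), retries=5, sleep=lambda seconds: None))
```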

Practical Cases

Case 1: Large‑Scale Internet Company Infrastructure Automation

Background: A company with over 3,000 servers across web, database, cache, and messaging services needed unified automation.

Solution Architecture:

Layered management architecture

# Environment layer configuration
environments:
  - name: production
    regions: [us-west-1, us-east-1, eu-west-1]
    security_level: high
  - name: staging
    regions: [us-west-1]
    security_level: medium
  - name: development
    regions: [us-west-1]
    security_level: low

Service discovery integration

# plugins/inventory/consul_inventory.py
import consul, json
class ConsulInventory:
    def __init__(self):
        self.consul = consul.Consul()
        self.inventory = {'_meta': {'hostvars': {}}}
    def get_inventory(self):
        services = self.consul.catalog.services()[1]
        for service_name in services:
            nodes = self.consul.catalog.service(service_name)[1]
            if service_name not in self.inventory:
                self.inventory[service_name] = {'hosts': []}
            for node in nodes:
                hostname = node['Node']
                self.inventory[service_name]['hosts'].append(hostname)
                self.inventory['_meta']['hostvars'][hostname] = {
                    'ansible_host': node['Address'],
                    'service_port': node['ServicePort'],
                    'datacenter': node['Datacenter']
                }
        return self.inventory

Automated deployment workflow

# playbooks/microservice-deploy.yml
---
- name: Microservice deployment
  hosts: "{{ service_name }}"
  serial: "{{ rolling_update_batch_size | default('25%') }}"
  max_fail_percentage: 10

  pre_tasks:
    - name: Remove node from load balancer
      uri:
        url: "http://{{ lb_host }}/api/v1/upstream/{{ service_name }}/remove"
        method: POST
        body_format: json
        body:
          server: "{{ ansible_host }}:{{ service_port }}"
      # delegate_to is a task keyword, not a uri parameter
      delegate_to: localhost

  tasks:
    - name: Stop old service
      systemd:
        name: "{{ service_name }}"
        state: stopped

    - name: Backup current version
      archive:
        path: "/opt/{{ service_name }}"
        dest: "/backup/{{ service_name }}-{{ ansible_date_time.epoch }}.tar.gz"

    - name: Deploy new version
      unarchive:
        src: "{{ artifact_url }}"
        dest: "/opt/{{ service_name }}"
        remote_src: yes
        owner: "{{ service_user }}"
        group: "{{ service_group }}"

    - name: Update configuration
      template:
        src: "{{ service_name }}.conf.j2"
        dest: "/opt/{{ service_name }}/config/app.conf"
      notify: restart service

    - name: Start service
      systemd:
        name: "{{ service_name }}"
        state: started
        enabled: yes

    - name: Health check
      uri:
        url: "http://{{ ansible_host }}:{{ service_port }}/health"
      register: health_result
      retries: 10
      delay: 30
      until: health_result.status == 200

  post_tasks:
    - name: Re-add node to load balancer
      uri:
        url: "http://{{ lb_host }}/api/v1/upstream/{{ service_name }}/add"
        method: POST
        body_format: json
        body:
          server: "{{ ansible_host }}:{{ service_port }}"
      # delegate_to is a task keyword, not a uri parameter
      delegate_to: localhost
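
The `serial: '25%'` setting splits the target hosts into batches that are updated one after another. The sketch below approximates that batching in Python under a stated assumption: percentages are rounded down, with a minimum of one host per batch. `rolling_batches` is a hypothetical helper and only handles percentages and positive integers.

```python
def rolling_batches(hosts, serial="25%"):
    """Split hosts into rolling-update batches for a percentage or a positive integer."""
    if isinstance(serial, str) and serial.endswith("%"):
        # Assumption: round down, but never below one host per batch
        size = int(len(hosts) * int(serial[:-1]) / 100) or 1
    else:
        size = int(serial)
    if size < 1:
        raise ValueError("serial must be a positive batch size")
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


hosts = [f"app{n:02d}" for n in range(1, 11)]
for batch in rolling_batches(hosts, "25%"):
    print(batch)
```

With ten hosts, 25% yields batches of two, so a bad release is caught after touching only a fraction of the fleet.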

Results:

Deployment time reduced from 2 hours to 15 minutes

Success rate increased from 85 % to 99.5 %

Ops labor cost cut by 60 %

System availability rose to 99.99 %

Case 2: Financial Industry Compliance Automation

Background: A bank needed to meet strict PCI‑DSS, SOX, and other compliance standards.

Compliance automation solution:

Security baseline checks

# roles/security-compliance/tasks/main.yml
---
- name: Enforce SSH configuration baseline
  lineinfile:
    path: /etc/ssh/sshd_config
    regexp: "{{ item.regexp }}"
    line: "{{ item.line }}"
    state: present
    # Reject the change if the resulting file is not a valid sshd configuration
    validate: '/usr/sbin/sshd -t -f %s'
  loop:
    - { regexp: '^Protocol', line: 'Protocol 2' }
    - { regexp: '^PermitRootLogin', line: 'PermitRootLogin no' }
    - { regexp: '^PasswordAuthentication', line: 'PasswordAuthentication no' }
    - { regexp: '^ClientAliveInterval', line: 'ClientAliveInterval 300' }
  notify: restart sshd
  tags: ssh-security

- name: Configure firewall rules
  firewalld:
    service: "{{ item }}"
    permanent: yes
    state: enabled
    immediate: yes
  loop:
    - ssh
    - https
  tags: firewall

- name: Disable unnecessary services
  systemd:
    name: "{{ item }}"
    state: stopped
    enabled: no
  loop:
    - telnet
    - rsh
    - rlogin
  ignore_errors: yes
  tags: disable-services

Compliance report generation

# playbooks/compliance-report.yml
---
- name: Generate compliance report
  hosts: all
  gather_facts: yes
  tasks:
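    - name: Collect system information
      setup:
        gather_subset:
          - hardware
          - network
    # Service states come from the service_facts module, not from a setup subset
    - name: Collect service states
      service_facts:
    - name: Check password policy
      shell: |
        grep -E '^PASS_MAX_DAYS|^PASS_MIN_DAYS|^PASS_WARN_AGE' /etc/login.defs
      register: password_policy
      changed_when: false  # read-only check
    - name: List regular user accounts
      shell: |
        awk -F: '($3 >= 1000) {print $1}' /etc/passwd
      register: user_accounts
      changed_when: false  # read-only check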
    - name: Render compliance report
      template:
        src: compliance-report.j2
        dest: "/tmp/compliance-report-{{ ansible_hostname }}.html"
      delegate_to: localhost
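
The registered `password_policy` output still has to be evaluated before it is useful in a report. As a rough sketch, the Python below parses the `KEY value` lines from `/etc/login.defs` and compares them with thresholds. The thresholds in `LIMITS` and both helper functions are hypothetical; substitute the values your compliance standard requires.

```python
def parse_login_defs(text: str) -> dict:
    """Parse `KEY value` lines such as the grep output from /etc/login.defs."""
    policy = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].lstrip("-").isdigit():
            policy[parts[0]] = int(parts[1])
    return policy


# Hypothetical thresholds: ("max", n) means the value may not exceed n
LIMITS = {
    "PASS_MAX_DAYS": ("max", 90),
    "PASS_MIN_DAYS": ("min", 1),
    "PASS_WARN_AGE": ("min", 7),
}


def policy_findings(policy: dict) -> list:
    """Return a human-readable finding for every setting outside its limit."""
    findings = []
    for key, (kind, limit) in LIMITS.items():
        value = policy.get(key)
        if value is None:
            findings.append(f"{key} is not set")
        elif kind == "max" and value > limit:
            findings.append(f"{key}={value} exceeds {limit}")
        elif kind == "min" and value < limit:
            findings.append(f"{key}={value} is below {limit}")
    return findings


grep_output = "PASS_MAX_DAYS\t99999\nPASS_MIN_DAYS\t0\nPASS_WARN_AGE\t7\n"
print(policy_findings(parse_login_defs(grep_output)))
```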

Results:

Compliance check time reduced from one week to 2 hours

Issue remediation time cut by 80 %

Audit pass rate reached 100 %

Reduced compliance risk and potential fines

Best Practices

1. Performance Optimization Strategies

Concurrent execution tuning:

# ansible.cfg
[defaults]
forks = 100
host_key_checking = False
gathering = smart
fact_caching = memory
fact_caching_timeout = 86400

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s
control_path_dir = /tmp/.ansible-cp
pipelining = True

Asynchronous task handling:

# Long‑running backup task
- name: Run backup script asynchronously
  shell: |
    /opt/backup/backup-database.sh
  async: 3600
  poll: 0
  register: backup_job

- name: Check backup job status
  async_status:
    jid: "{{ backup_job.ansible_job_id }}"
  register: backup_result
  until: backup_result.finished
  retries: 60
  delay: 60

2. Error Handling & Rollback

# Deployment with rollback
- block:
    - name: Create deployment snapshot
      shell: |
        cp -r /opt/app /opt/app.backup.{{ ansible_date_time.epoch }}
    - name: Deploy new version
      unarchive:
        src: "{{ app_package }}"
        dest: /opt/app
    - name: Verify deployment
      uri:
        url: "http://localhost:8080/health"
        status_code: 200
      register: verify_result
      until: verify_result is succeeded
      retries: 5
      delay: 10
  rescue:
    - name: Roll back to previous version
      shell: |
        rm -rf /opt/app
        mv /opt/app.backup.{{ ansible_date_time.epoch }} /opt/app
        systemctl restart app
    - name: Send alert notification
      mail:
        to: "{{ alert_email }}"  # recipient address; define alert_email in your inventory
        subject: "Deployment Failed on {{ inventory_hostname }}"
        body: "Deployment failed and rolled back automatically"
  always:
    - name: Clean up temporary files
      file:
        path: "/tmp/deployment-{{ ansible_date_time.epoch }}"
        state: absent
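
The `block` / `rescue` / `always` structure maps closely onto `try` / `except` / `finally`. The Python sketch below replays the same snapshot, deploy, verify, and rollback flow against a temporary directory. It is an analogy for the control flow, not a replacement for the playbook, and `deploy_with_rollback` is a hypothetical function.

```python
import shutil
import tempfile
from pathlib import Path


def deploy_with_rollback(app_dir: Path, new_files: dict, verify) -> str:
    """Snapshot app_dir, write new_files, and restore the snapshot if verify() fails."""
    backup = app_dir.with_name(app_dir.name + ".backup")
    shutil.copytree(app_dir, backup)  # block: create deployment snapshot
    try:
        for name, content in new_files.items():
            (app_dir / name).write_text(content)  # block: deploy new version
        if not verify(app_dir):
            raise RuntimeError("health check failed")
        return "deployed"
    except Exception:
        shutil.rmtree(app_dir)  # rescue: restore the previous version
        shutil.copytree(backup, app_dir)
        return "rolled back"
    finally:
        shutil.rmtree(backup, ignore_errors=True)  # always: clean up


def demo() -> tuple:
    """Run one failing and one successful deployment against a scratch directory."""
    with tempfile.TemporaryDirectory() as tmp:
        app = Path(tmp) / "app"
        app.mkdir()
        (app / "version.txt").write_text("1.0")
        bad = deploy_with_rollback(app, {"version.txt": "2.0"}, verify=lambda d: False)
        after_bad = (app / "version.txt").read_text()
        good = deploy_with_rollback(app, {"version.txt": "2.0"}, verify=lambda d: True)
        after_good = (app / "version.txt").read_text()
        return bad, after_bad, good, after_good


print(demo())
```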

3. Monitoring & Logging Integration

# roles/monitoring/tasks/main.yml
---
- name: Install monitoring agent
  package:
    name: node_exporter
    state: present

- name: Install node_exporter systemd unit
  template:
    src: node_exporter.service.j2
    dest: /etc/systemd/system/node_exporter.service
  notify: restart node_exporter

- name: Push deployment metrics to Prometheus Pushgateway
  uri:
    url: "{{ prometheus_pushgateway_url }}"
    method: POST
    body: |
      ansible_deployment_total{job="ansible",instance="{{ inventory_hostname }}"} 1
      ansible_deployment_timestamp{job="ansible",instance="{{ inventory_hostname }}"} {{ ansible_date_time.epoch }}
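
The body sent to the Pushgateway uses the Prometheus text exposition format: one `name{labels} value` sample per line, each terminated by a newline. The sketch below builds such a payload in Python. `pushgateway_payload` is a hypothetical helper, and it assumes label values that need no escaping (no quotes, backslashes, or newlines).

```python
def pushgateway_payload(metrics: dict, labels: dict) -> str:
    """Render metrics as Prometheus text exposition lines sharing one label set."""
    label_text = ",".join(f'{key}="{value}"' for key, value in sorted(labels.items()))
    lines = [f"{name}{{{label_text}}} {value}" for name, value in metrics.items()]
    return "\n".join(lines) + "\n"  # the final newline is part of the format


payload = pushgateway_payload(
    {"ansible_deployment_total": 1, "ansible_deployment_timestamp": 1700000000},
    {"instance": "web01"},
)
print(payload, end="")
```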

Summary & Outlook

Ansible has become a cornerstone of modern IT infrastructure management. The case studies in this article suggest that automation can improve efficiency five- to tenfold, cut human errors by over 90%, reduce operational costs by 50-70%, and raise system availability above 99.9%, although actual results depend on the environment and on how consistently these practices are applied.

Future trends:

AIOps integration – deeper AI/ML‑driven decision making

Cloud‑native optimization – better support for containers and micro‑services

Security automation – expanded scanning and compliance checks

Edge computing support – extending automation to edge devices and IoT

Implementation recommendations:

Establish standardized automation processes and guidelines

Invest in observability tools (monitoring, logging, tracing)

Prioritize security and compliance automation

Cultivate a DevOps culture and upskill teams

