
Master Multi‑Cloud Orchestration with Terraform + Ansible: IaC Best Practices

This article explains how to combine Terraform and Ansible to build a robust, enterprise‑grade multi‑cloud resource orchestration workflow, covering their strengths, limitations, layered architecture, real‑world e‑commerce deployment, CI/CD integration, advanced tips, cost optimization, and security best practices.

MaGe Linux Operations

Terraform+Ansible Dual Sword: Best Practices for Multi‑Cloud Resource Orchestration in the IaC Era

In the cloud‑native era, manual operations can no longer keep pace with digital transformation. Drawing on years of work as an operations engineer, I share how to combine Terraform and Ansible for enterprise‑grade multi‑cloud orchestration.

Pain Points: Why a Single Tool Isn’t Enough

Terraform Strengths and Limitations

Terraform excels at declarative IaC resource provisioning:

State Management: the tfstate file records every resource Terraform manages, so each plan computes only the changes that are needed.

Dependency Resolution: Terraform builds a dependency graph automatically to ensure resources are created in the correct order.

Multi‑Cloud Support: the provider ecosystem covers the major cloud vendors.
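The dependency resolution point can be illustrated with a small sketch. This is not Terraform's implementation, only a minimal Python example of ordering resources by their dependencies; the resource names in the graph are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the set of resources it depends on.
dependencies = {
    "aws_vpc.main": set(),
    "aws_subnet.web": {"aws_vpc.main"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.web", "aws_security_group.web"},
}


def creation_order(graph):
    """Return one valid creation order: dependencies come before dependents."""
    return list(TopologicalSorter(graph).static_order())


if __name__ == "__main__":
    for address in creation_order(dependencies):
        print(address)
```

The VPC is always created first and the instance last; the subnet and security group have no edge between them, so either may come first.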

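However, in real projects Terraform falls short once work moves inside the instance. Anything beyond provisioning tends to end up as shell scripts embedded in user_data, which are hard to test and maintain: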

# Terraform excels at creating infrastructure
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1d0"
  instance_type = "t3.medium"

  # Complex configuration management is weak
  user_data = <<-EOF
    #!/bin/bash
    yum update -y
    # Lots of scripts piled up, hard to maintain
  EOF
}

Ansible Configuration Management Advantages

Ansible stands out in configuration management and application deployment:

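Idempotent Operations: rerunning a playbook converges on the same state without making further changes, provided tasks use idempotent modules.

Rich Module Library: modules cover systems, networking, cloud services, and more.

Dynamic Inventory: host lists are generated at run time, so playbooks adapt to changing infrastructure.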

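Nevertheless, Ansible lacks robust state management for infrastructure provisioning: it keeps no record of the resources it has created, so it cannot plan changes or detect drift the way Terraform does.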
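To make the idempotency property concrete, here is a minimal Python sketch of an "ensure this line exists" operation that reports whether it changed anything. It only illustrates the idea and is not how any Ansible module is implemented; the function name and file path are hypothetical.

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to the file at `path` only if it is missing.

    Returns True when the file was changed and False otherwise,
    similar to the changed/ok distinction Ansible reports per task.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            content = fh.read()
    if line in content.splitlines():
        return False
    # Keep the new entry on its own line even if the file lacks a trailing newline.
    prefix = "" if not content or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(prefix + line + "\n")
    return True


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        conf = os.path.join(tmp, "sshd_config")
        print(ensure_line(conf, "PermitRootLogin no"))  # first run changes the file
        print(ensure_line(conf, "PermitRootLogin no"))  # second run is a no-op
```

Running the operation twice leaves the file identical to running it once, which is the behavior a well-written playbook relies on.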

Architecture Design: Building a Collaborative System

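Based on years of practice, I designed a “layered decoupling” architecture in which Terraform provisions the infrastructure, Ansible configures what runs on it, and a GitOps workflow drives both layers: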

┌──────────────────────────────────────────┐
│             GitOps Workflow              │
├──────────────────────────────────────────┤
│ Terraform Layer (Infrastructure Supply)  │
│  ├── Network (VPC/Subnet/Security Group) │
│  ├── Compute (EC2/ECS/Lambda)            │
│  └── Storage (S3/RDS/ElastiCache)        │
├──────────────────────────────────────────┤
│ Ansible Layer (Configuration Management) │
│  ├── System Config (users/permissions)   │
│  ├── App Deployment (containers)         │
│  └── Monitoring (logs/alerts/backup)     │
└──────────────────────────────────────────┘
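The layering above implies a fixed order: provision first, configure second. The following Python sketch shows one way a thin wrapper could enforce that order. The directory names, the vars/<env>.tfvars convention, and the playbook path follow the listings in this article, while the function names are hypothetical.

```python
import subprocess


def build_pipeline_commands(env):
    """Return the ordered (argv, working directory) pairs for one deployment run."""
    var_file = f"vars/{env}.tfvars"
    return [
        (["terraform", "init", "-input=false"], "infrastructure"),
        (["terraform", "apply", "-auto-approve", "-input=false",
          f"-var-file={var_file}"], "infrastructure"),
        (["ansible-playbook", "-i", "inventory/terraform.ini",
          "playbooks/site.yml", "--extra-vars", f"env={env}"], "ansible"),
    ]


def run_pipeline(env, runner=subprocess.run):
    """Run each layer in order; check=True stops at the first failing command."""
    for argv, cwd in build_pipeline_commands(env):
        runner(argv, cwd=cwd, check=True)
```

Injecting `runner` keeps the ordering logic testable without invoking either tool; in normal use `run_pipeline("production")` would shell out to Terraform and then Ansible.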

Hands‑On Demo: Multi‑Cloud E‑Commerce Deployment

Example: deploy an e‑commerce platform that spans AWS and Alibaba Cloud.

Step 1 – Define Infrastructure with Terraform

# main.tf – Multi‑Cloud infrastructure definition
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    alicloud = {
      source  = "aliyun/alicloud"
      version = "~> 1.200"
    }
  }

  backend "s3" {
    bucket = "terraform-state-prod"
    key    = "ecommerce/infrastructure.tfstate"
    region = "us-west-2"
  }
}

module "aws_infrastructure" {
  source = "./modules/aws"
  vpc_cidr = "10.0.0.0/16"
  availability_zones = ["us-west-2a","us-west-2b","us-west-2c"]
  enable_ansible_inventory = true
}

module "alicloud_infrastructure" {
  source = "./modules/alicloud"
  vpc_cidr = "172.16.0.0/16"
  zones = ["cn-hangzhou-g","cn-hangzhou-h"]
  enable_ansible_inventory = true
}

resource "local_file" "ansible_inventory" {
  content  = templatefile("${path.module}/templates/inventory.tpl", {
    aws_instances = module.aws_infrastructure.instance_ips
    ali_instances = module.alicloud_infrastructure.instance_ips
    rds_endpoints = module.aws_infrastructure.rds_endpoints
  })
  filename = "../ansible/inventory/terraform.ini"
}
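The article does not show templates/inventory.tpl, so the Python sketch below only illustrates the kind of INI inventory such a template might render from the two instance lists. The group names aws_web and alicloud_web are assumptions; web_servers is the group the playbooks reference.

```python
def render_inventory(aws_instances, ali_instances):
    """Render a static INI inventory with one group per cloud and a shared parent."""
    lines = ["[aws_web]"]
    lines += list(aws_instances)
    lines += ["", "[alicloud_web]"]
    lines += list(ali_instances)
    lines += ["", "[web_servers:children]", "aws_web", "alicloud_web"]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    print(render_inventory(["10.0.1.10", "10.0.2.10"], ["172.16.1.10"]), end="")
```

Grouping hosts per cloud under a common parent lets a play target web_servers as a whole while group variables still differ per provider.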

Step 2 – Fine‑Grained Configuration with Ansible

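# playbooks/site.yml – Main orchestration
---
# The reachability check runs from the control node
- name: Verify infrastructure
  hosts: localhost
  gather_facts: false

  tasks:
    - name: Prepare base environment
      include_tasks: tasks/infrastructure_check.yml

# The application is deployed on the provisioned hosts, not on localhost
- name: E‑Commerce Platform Deployment
  hosts: web_servers
  become: true
  vars:
    deployment_env: "{{ env | default('production') }}"

  tasks:
    - name: Deploy application services
      include_tasks: tasks/application_deploy.yml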

# tasks/infrastructure_check.yml
---
- name: Verify Terraform output
  block:
    - name: Check instance reachability
      wait_for:
        host: "{{ item }}"
        port: 22
        timeout: 300
      loop: "{{ groups['web_servers'] }}"

# tasks/application_deploy.yml
---
- name: Deploy containerized application
  block:
    - name: Configure Docker
      include_role:
        name: docker
      vars:
        docker_compose_version: "2.20.0"

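    - name: Deploy microservice stack
      # An inline definition cannot be combined with project_src and requires project_name
      # ("ecommerce" is an illustrative name). Recent community.docker releases replace
      # this module with docker_compose_v2.
      docker_compose:
        project_name: ecommerce
        definition: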
          version: '3.8'
          services:
            frontend:
              image: "{{ ecr_registry }}/ecommerce-frontend:{{ app_version }}"
              ports: ["80:3000"]
              environment:
                API_ENDPOINT: "{{ api_gateway_url }}"
            backend:
              image: "{{ ecr_registry }}/ecommerce-backend:{{ app_version }}"
              environment:
                DATABASE_URL: "{{ database_connection_string }}"
                REDIS_URL: "{{ redis_cluster_endpoint }}"
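The reachability check in tasks/infrastructure_check.yml waits for port 22 on each host. As a rough illustration of what such a wait involves, here is a minimal Python sketch that polls a TCP port until it accepts connections or a timeout elapses. It is not the wait_for module's implementation, and the function name is hypothetical.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=1.0):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True if the port became reachable and False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A successful TCP connect only shows that something is listening; it does not prove that SSH authentication will work, which is why the deployment play is still the real test.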

Step 3 – CI/CD Pipeline Integration

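# .github/workflows/deploy.yml
name: Multi‑Cloud Deployment Pipeline

on:
  push:
    branches: [ main ]
    paths: ['infrastructure/**', 'ansible/**']

env:
  # Assumption: one target environment per workflow; adjust to your branching model
  ENVIRONMENT: production

jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        # Each run step starts in the workspace root, so set the directory once
        working-directory: infrastructure
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0
      - name: Terraform Plan
        run: |
          terraform init
          terraform plan -var-file="vars/${ENVIRONMENT}.tfvars"
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: |
          terraform apply -auto-approve -var-file="vars/${ENVIRONMENT}.tfvars"
      # Jobs run on separate runners, so hand the generated inventory to the next job
      - name: Publish generated inventory
        uses: actions/upload-artifact@v4
        with:
          name: ansible-inventory
          path: ansible/inventory/terraform.ini

  ansible:
    needs: terraform
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Fetch generated inventory
        uses: actions/download-artifact@v4
        with:
          name: ansible-inventory
          path: ansible/inventory
      - name: Execute Ansible Playbook
        # .vault_pass is not stored in the repository; provision it from a secret beforehand
        run: |
          cd ansible
          ansible-playbook -i inventory/terraform.ini playbooks/site.yml \
            --extra-vars "env=${ENVIRONMENT}" \
            --vault-password-file .vault_pass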
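Because the pipeline applies with -auto-approve, a guard between plan and apply can be useful. The Python sketch below inspects the JSON form of a saved plan (as produced by `terraform show -json <planfile>`) and flags resources that would be deleted or replaced. The threshold policy and function names are assumptions for illustration, not part of the workflow above.

```python
def summarize_plan(plan):
    """Count create, update, and delete actions across all planned resource changes."""
    counts = {"create": 0, "update": 0, "delete": 0}
    for rc in plan.get("resource_changes", []):
        for action in rc["change"]["actions"]:
            if action in counts:
                counts[action] += 1
    return counts


def destructive_addresses(plan):
    """List resource addresses whose planned actions include a delete."""
    return [
        rc["address"]
        for rc in plan.get("resource_changes", [])
        if "delete" in rc["change"]["actions"]
    ]


def is_safe_to_auto_apply(plan, max_deletes=0):
    """Hypothetical policy: allow unattended apply only below a delete threshold."""
    return summarize_plan(plan)["delete"] <= max_deletes


# Hand-written sample in the shape of `terraform show -json` output.
sample_plan = {
    "resource_changes": [
        {"address": "aws_instance.web", "change": {"actions": ["create"]}},
        {"address": "aws_security_group.web", "change": {"actions": ["update"]}},
        {"address": "aws_db_instance.main", "change": {"actions": ["delete", "create"]}},
        {"address": "aws_vpc.main", "change": {"actions": ["no-op"]}},
    ]
}

if __name__ == "__main__":
    print(summarize_plan(sample_plan))
    print(destructive_addresses(sample_plan))
```

A replacement appears as a delete plus a create, so the database in the sample counts as destructive even though it ends up existing again.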

Advanced Tips: Smoother Collaboration

1. State Sharing Mechanism

Pass Terraform output variables to Ansible:

# outputs.tf
output "ansible_vars" {
  value = {
    database_endpoint = aws_rds_cluster.main.endpoint
    redis_cluster_config = aws_elasticache_replication_group.main.configuration_endpoint_address
    load_balancer_dns = aws_lb.main.dns_name
    security_groups = {
      web = aws_security_group.web.id
      db  = aws_security_group.db.id
    }
  }
  sensitive = false
}
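One way to hand the ansible_vars output to Ansible is to write it to a JSON file and load that file with --extra-vars. The Python sketch below assumes the usual shape of `terraform output -json` (each output carries `sensitive` and `value` keys); the function names and file name are hypothetical.

```python
import json


def extract_output(tf_output, name="ansible_vars"):
    """Return the value of one output from parsed `terraform output -json` data."""
    entry = tf_output.get(name)
    if entry is None:
        raise KeyError(f"Terraform output {name!r} not found")
    if entry.get("sensitive"):
        # Refuse to write sensitive values to a plain file.
        raise ValueError(f"Terraform output {name!r} is marked sensitive")
    return entry["value"]


def write_extra_vars(tf_output, path, name="ansible_vars"):
    """Write the selected output to `path` as JSON and return the written values."""
    values = extract_output(tf_output, name)
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(values, fh, indent=2, sort_keys=True)
    return values
```

The resulting file can then be passed as `ansible-playbook ... --extra-vars @terraform_vars.json`, which keeps endpoint values out of the command line itself.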

2. Dynamic Inventory Management

# inventory/terraform_inventory.py
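import json, subprocess, sys
from pathlib import Path

# Locate the Terraform directory relative to this script rather than the caller's cwd
# (assumes the repository layout ansible/inventory/ next to infrastructure/)
TERRAFORM_DIR = Path(__file__).resolve().parents[2] / 'infrastructure'

def get_terraform_output():
    try:
        # check=True turns a failed `terraform output` into an exception
        result = subprocess.run(['terraform','output','-json'],
                               capture_output=True, text=True,
                               cwd=TERRAFORM_DIR, check=True)
        return json.loads(result.stdout)
    except (OSError, subprocess.CalledProcessError, json.JSONDecodeError) as e:
        print(f"Error getting terraform output: {e}", file=sys.stderr)
        return {}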

def generate_inventory():
    tf_output = get_terraform_output()
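    inventory = {
        '_meta': {'hostvars': {}},
        'all': {'children': ['aws','alicloud']},
        'aws': {'children': ['web_servers','db_servers'],
                # StrictHostKeyChecking=no skips host key verification; limit it to short-lived hosts
                'vars': {'ansible_ssh_common_args':'-o StrictHostKeyChecking=no',
                         'cloud_provider':'aws'}},
        # Declare the alicloud group referenced under 'all'
        'alicloud': {'hosts': [], 'vars': {'cloud_provider':'alicloud'}},
        'web_servers': {'hosts': []},
        'db_servers': {'hosts': []}
    }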
    if 'instance_ips' in tf_output:
        for ip in tf_output['instance_ips']['value']:
            inventory['web_servers']['hosts'].append(ip)
            inventory['_meta']['hostvars'][ip] = {
                'ansible_host': ip,
                'ansible_user': 'ec2-user',
                'instance_type': 't3.medium'
            }
    return inventory

if __name__ == '__main__':
    print(json.dumps(generate_inventory(), indent=2))

3. Error Handling and Rollback Strategy

# playbooks/rollback.yml – Smart rollback
---
- name: Application Deployment Rollback
  hosts: web_servers
  serial: "{{ rollback_batch_size | default(1) }}"
  max_fail_percentage: 10

  vars:
    health_check_retries: 5
    health_check_delay: 30

  pre_tasks:
    - name: Create rollback snapshot
      block:
        - name: Backup current config
          archive:
            path: "{{ app_path }}"
            dest: "/backup/app-{{ ansible_date_time.epoch }}.tar.gz"
        - name: Record current version
          copy:
            content: "{{ current_version }}"
            dest: "/backup/current_version"

  tasks:
    - name: Execute version rollback
      block:
        - name: Stop current service
          systemd:
            name: "{{ app_service_name }}"
            state: stopped

        - name: Deploy historic version
          unarchive:
            src: "{{ rollback_package_url }}"
            dest: "{{ app_path }}"
            remote_src: yes

        - name: Start service
          systemd:
            name: "{{ app_service_name }}"
            state: started
            enabled: yes

      # rescue belongs to the block above; it is not a play-level keyword
      rescue:
        - name: Rollback failure handling
          fail:
            msg: "Rollback failed, manual intervention required"

  post_tasks:
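    - name: Health check
      uri:
        url: "http://{{ ansible_host }}:{{ app_port }}/health"
        method: GET
        status_code: 200
      # retries and delay are task keywords and only take effect together with until
      register: health_result
      until: health_result.status == 200
      retries: "{{ health_check_retries }}"
      delay: "{{ health_check_delay }}"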
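The health check at the end of the rollback is a retry loop: probe, wait, probe again, give up after a fixed number of attempts. A minimal Python sketch of that loop follows. The function names are hypothetical and the attempt-counting semantics are those of this sketch, not necessarily those of Ansible's retries keyword.

```python
import time
import urllib.error
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True if a GET to `url` answers with HTTP 200, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False


def wait_until_healthy(probe, attempts=5, delay=30.0, sleep=time.sleep):
    """Call `probe` up to `attempts` times, sleeping `delay` seconds between tries.

    Returns the 1-based attempt number that succeeded, or None if all failed.
    """
    for attempt in range(1, attempts + 1):
        if probe():
            return attempt
        if attempt < attempts:
            sleep(delay)
    return None
```

A caller would pass something like `lambda: http_ok("http://10.0.1.10:8080/health")` as the probe (the address and port are placeholders); taking the probe and sleep function as parameters keeps the retry logic testable without a network.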

Monitoring and Observability Integration

# roles/monitoring/tasks/main.yml
---
- name: Deploy monitoring stack
  block:
    - name: Prometheus configuration
      template:
        src: prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
      # vars is a task keyword, not a template module argument
      vars:
        terraform_targets: "{{ terraform_monitoring_targets }}"
      notify: restart prometheus

    - name: Grafana dashboards
      grafana_dashboard:
        grafana_url: "{{ grafana_endpoint }}"
        grafana_api_key: "{{ grafana_api_key }}"
        # The module imports a dashboard JSON file given by path; this location is an assumed layout
        path: "{{ role_path }}/files/dashboards/{{ item }}.json"
        state: present
        overwrite: true
      loop:
        - infrastructure-overview
        - application-metrics
        - multi-cloud-cost-analysis

    - name: Alert rule configuration
      template:
        src: alert-rules.yml.j2
        dest: /etc/prometheus/rules/infrastructure.yml
      vars:
        notification_webhook: "{{ slack_webhook_url }}"
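The role passes Terraform-derived monitoring targets into the Prometheus template, but the template itself is not shown. As one possible approach, the Python sketch below turns per-cloud IP lists into the JSON structure that Prometheus file-based service discovery reads. The function name, the `cloud` label, and the use of port 9100 (the node_exporter default) are assumptions.

```python
import json


def build_file_sd(targets_by_cloud, port=9100):
    """Build a Prometheus file_sd target list with one labelled group per cloud."""
    groups = []
    for cloud, ips in sorted(targets_by_cloud.items()):
        if not ips:
            continue  # skip clouds without instances
        groups.append({
            "targets": [f"{ip}:{port}" for ip in ips],
            "labels": {"cloud": cloud},
        })
    return groups


if __name__ == "__main__":
    # Placeholder addresses for illustration only.
    example = {"aws": ["10.0.1.10", "10.0.2.10"], "alicloud": ["172.16.1.10"]}
    print(json.dumps(build_file_sd(example), indent=2))
```

Labelling each group with its cloud makes it straightforward to split dashboards and alerts by provider later.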

Cost Optimization Strategies

Automate cost control with scheduled scaling and spot‑instance policies:

# modules/cost-optimization/main.tf
resource "aws_autoscaling_schedule" "scale_down" {
  scheduled_action_name  = "scale-down-evening"
  min_size               = 1
  max_size               = 2
  desired_capacity       = 1
  recurrence             = "0 18 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}

resource "aws_autoscaling_schedule" "scale_up" {
  scheduled_action_name  = "scale-up-morning"
  min_size               = 2
  max_size               = 10
  desired_capacity       = 3
  recurrence             = "0 8 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}

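resource "aws_autoscaling_group" "web" {
  # min_size and max_size are required; the scheduled actions adjust them over the day
  min_size            = 1
  max_size            = 10
  vpc_zone_identifier = var.private_subnet_ids # assumed variable: subnets for the group

  mixed_instances_policy {
    instances_distribution {
      on_demand_percentage_above_base_capacity = 20
      # "diversified" is a Spot Fleet strategy; Auto Scaling groups accept e.g. this one
      spot_allocation_strategy                 = "price-capacity-optimized"
    }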

    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web.id
        version = "$Latest"
      }

      override {
        instance_type = "t3.medium"
        weighted_capacity = "1"
      }

      override {
        instance_type = "t3.large"
        weighted_capacity = "2"
      }
    }
  }
}
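A quick back-of-the-envelope calculation shows what the two scheduled actions can save. The Python sketch below compares weekly capacity-hours under the schedule (desired capacity 3 on weekdays from 08:00 to 18:00, otherwise 1) with running at 3 all week. It ignores dynamic scaling, Spot pricing, and time zones, so treat the result as a rough estimate only.

```python
HOURS_PER_WEEK = 7 * 24


def weekly_capacity_hours(day_capacity, off_capacity, day_start=8, day_end=18, weekdays=5):
    """Capacity-hours per week when scaling up on weekdays between day_start and day_end."""
    day_hours = (day_end - day_start) * weekdays
    off_hours = HOURS_PER_WEEK - day_hours
    return day_capacity * day_hours + off_capacity * off_hours


scheduled = weekly_capacity_hours(day_capacity=3, off_capacity=1)  # 3*50 + 1*118 = 268
always_on = 3 * HOURS_PER_WEEK                                      # 3*168 = 504
saving = 1 - scheduled / always_on

if __name__ == "__main__":
    print(f"scheduled={scheduled}h always_on={always_on}h saving={saving:.1%}")
```

Under these assumptions the schedule uses 268 capacity-hours instead of 504, a reduction of roughly 47%. With weighted capacity, one t3.large counts as 2 units, so the number of instances can be lower than the capacity figure.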

Security Best Practices

1. Secrets Management

# playbooks/security-hardening.yml
---
- name: Security hardening configuration
  hosts: all
  become: yes

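  vars:
    vault_secrets: "{{ vault_aws_secrets }}"
    # "environment" is a reserved keyword in Ansible, so use a different variable name
    deployment_env: "{{ env | default('production') }}"

  tasks:
    - name: Retrieve DB password from SSM
      # aws_ssm_parameter_store writes parameters; reading uses the SSM lookup plugin
      # (named amazon.aws.ssm_parameter from amazon.aws 6.0.0 onward)
      set_fact:
        db_password: "{{ lookup('amazon.aws.aws_ssm', '/' ~ deployment_env ~ '/database/password', region=aws_region) }}"
      no_log: true

    - name: Write secrets to Vault
      hashivault_write:
        mount_point: secret
        secret: "{{ app_name }}/{{ deployment_env }}"
        data:
          database_url: "{{ vault_secrets.database_url }}"
          api_keys: "{{ vault_secrets.api_keys }}"
      no_log: true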

2. Network Security

# Zero‑Trust security group
resource "aws_security_group" "web_tier" {
  name_prefix = "web-tier-"
  vpc_id      = aws_vpc.main.id

  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
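Rules like the one above are easy to weaken over time, so a small audit step can help. The Python sketch below scans a security group described as a plain dictionary and reports ingress rules open to the whole internet. The dictionary shape mirrors the Terraform arguments but is a simplification chosen for this example, not a format emitted by any tool.

```python
def open_ingress_rules(security_group):
    """Return ingress rules that allow traffic from any IPv4 or IPv6 address."""
    return [
        rule
        for rule in security_group.get("ingress", [])
        if "0.0.0.0/0" in rule.get("cidr_blocks", [])
        or "::/0" in rule.get("ipv6_cidr_blocks", [])
    ]


# Simplified representation of the web tier group plus a deliberately bad rule.
web_tier = {
    "name_prefix": "web-tier-",
    "ingress": [
        {"from_port": 80, "to_port": 80, "protocol": "tcp", "security_groups": ["sg-alb"]},
        {"from_port": 22, "to_port": 22, "protocol": "tcp", "cidr_blocks": ["0.0.0.0/0"]},
    ],
}

if __name__ == "__main__":
    for rule in open_ingress_rules(web_tier):
        print(f"open ingress on port {rule['from_port']}")
```

Only the SSH rule is reported here, because the port 80 rule admits traffic solely from the load balancer's security group.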

Enterprise‑Level Best‑Practice Summary

Key takeaways:

Terraform for infrastructure: manage the lifecycle of network, compute, and storage resources.

Ansible for configuration: handle system setup, application deployment, and operations automation.

Clear division of responsibilities: avoid overlap between the two tools and keep the architecture clean.

Organize code into separate infrastructure/ and ansible/ directories, use environment‑specific modules, version modules semantically, and keep a dedicated state file for each environment.
