
How to Combine Terraform and Ansible for Seamless Multi‑Cloud Orchestration

This guide explains why single-tool approaches fall short in modern IaC, compares Terraform's state management and multi-cloud support with Ansible's configuration capabilities, and then walks through a layered architecture step by step, with code samples covering CI/CD integration, monitoring, cost optimization, and security practices for enterprise-grade deployments.

Raymond Ops

In the era of cloud‑native transformation, manual operations can no longer keep pace with business demands. The author, an experienced site reliability engineer, shares a comprehensive method that merges Terraform’s declarative infrastructure provisioning with Ansible’s imperative configuration management to achieve a robust, multi‑cloud orchestration workflow.

Pain Point Insight

Terraform excels at resource creation, state tracking, dependency resolution, and supports major cloud providers, but it struggles with complex configuration scripts and lacks fine‑grained configuration management. Ansible offers idempotent operations, a rich module ecosystem, and dynamic inventories, yet it does not manage infrastructure state.

Architecture Design: Layered Decoupling

┌──────────────────────────────────────────────┐
│               GitOps Workflow                │
├──────────────────────────────────────────────┤
│ Terraform Layer (Infrastructure)             │
│  ├── Network (VPC / Subnet / Security Group) │
│  ├── Compute (EC2 / ECS / Lambda)            │
│  └── Storage (S3 / RDS / ElastiCache)        │
├──────────────────────────────────────────────┤
│ Ansible Layer (Configuration)                │
│  ├── System (users / permissions / services) │
│  ├── Application (containers / microservices)│
│  └── Operations (logging / alerts / backups) │
└──────────────────────────────────────────────┘

Real‑World Multi‑Cloud Deployment Example

Step 1 – Terraform defines the base infrastructure

# main.tf – multi‑cloud definition
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
    alicloud = { source = "aliyun/alicloud", version = "~> 1.200" }
  }
  backend "s3" {
    bucket = "terraform-state-prod"
    key    = "ecommerce/infrastructure.tfstate"
    region = "us-west-2"
  }
}

module "aws_infrastructure" {
  source = "./modules/aws"
  vpc_cidr = "10.0.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
  enable_ansible_inventory = true
}

module "alicloud_infrastructure" {
  source = "./modules/alicloud"
  vpc_cidr = "172.16.0.0/16"
  zones = ["cn-hangzhou-g", "cn-hangzhou-h"]
  enable_ansible_inventory = true
}

resource "local_file" "ansible_inventory" {
  content  = templatefile("${path.module}/templates/inventory.tpl", {
    aws_instances = module.aws_infrastructure.instance_ips
    ali_instances = module.alicloud_infrastructure.instance_ips
    rds_endpoints = module.aws_infrastructure.rds_endpoints
  })
  filename = "../ansible/inventory/terraform.ini"
}
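
The `templates/inventory.tpl` file referenced above is not shown in the source. A minimal sketch of what it might contain, using Terraform's template directives and the variable names passed in the `templatefile()` call (host groups and the `rds_endpoints[0]` indexing are assumptions for illustration):

```ini
# templates/inventory.tpl – hypothetical sketch
[aws_web]
%{ for ip in aws_instances ~}
${ip} ansible_user=ec2-user
%{ endfor ~}

[alicloud_web]
%{ for ip in ali_instances ~}
${ip} ansible_user=root
%{ endfor ~}

[all:vars]
rds_endpoint=${rds_endpoints[0]}
```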

Step 2 – Ansible performs fine‑grained configuration

# site.yml – main playbook
---
- name: E‑commerce platform deployment
  hosts: localhost
  gather_facts: false
  vars:
    deployment_env: "{{ env | default('production') }}"
  tasks:
    - name: Prepare base environment
      include_tasks: tasks/infrastructure_check.yml
    - name: Deploy application services
      include_tasks: tasks/application_deploy.yml

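The included task files are not shown in the source. A minimal sketch of what `tasks/infrastructure_check.yml` could look like (the inventory path and `web_servers` group name are assumptions matching the earlier examples):

```yaml
# tasks/infrastructure_check.yml – hypothetical sketch
- name: Verify Terraform-generated inventory exists
  stat:
    path: "{{ playbook_dir }}/inventory/terraform.ini"
  register: tf_inventory

- name: Fail early if Terraform has not produced an inventory
  fail:
    msg: "Run `terraform apply` first; inventory file is missing"
  when: not tf_inventory.stat.exists

- name: Wait for provisioned hosts to accept SSH
  wait_for:
    host: "{{ hostvars[item].ansible_host }}"
    port: 22
    timeout: 300
  loop: "{{ groups['web_servers'] | default([]) }}"
```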
Step 3 – CI/CD pipeline integration

# .github/workflows/deploy.yml
name: Multi‑Cloud Deployment Pipeline
on:
  push:
    branches: [main]
    paths: ['infrastructure/**', 'ansible/**']
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0
      - name: Terraform Plan
        run: |
          cd infrastructure
          terraform init
          terraform plan -var-file="vars/${ENVIRONMENT}.tfvars"
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: |
          cd infrastructure
          terraform apply -auto-approve -var-file="vars/${ENVIRONMENT}.tfvars"
  ansible:
    needs: terraform
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      # Note: the Terraform-generated inventory lives on the previous job's
      # runner. Share it between jobs (e.g. actions/upload-artifact and
      # actions/download-artifact) or regenerate it here from terraform output.
      - name: Execute Ansible Playbook
        run: |
          cd ansible
          ansible-playbook -i inventory/terraform.ini site.yml \
            --extra-vars "env=${ENVIRONMENT}" \
            --vault-password-file .vault_pass

Advanced Techniques

1. State Sharing via Terraform Outputs

# outputs.tf
output "ansible_vars" {
  value = {
    database_endpoint = aws_rds_cluster.main.endpoint
    redis_cluster_config = aws_elasticache_replication_group.main.configuration_endpoint_address
    load_balancer_dns = aws_lb.main.dns_name
    security_groups = {
      web = aws_security_group.web.id
      db  = aws_security_group.db.id
    }
  }
  sensitive = false
}

resource "local_file" "ansible_vars" {
  content = yamlencode({
    infrastructure = {
      cloud_provider = "aws"
      region = var.aws_region
      environment = var.environment
    }
    services = local.service_endpoints
    network = {
      vpc_id = aws_vpc.main.id
      private_subnets = aws_subnet.private[*].id
      public_subnets = aws_subnet.public[*].id
    }
  })
  filename = "../ansible/group_vars/all/terraform.yml"
}
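
The generated `group_vars/all/terraform.yml` would then look roughly like the following (all values illustrative), giving every Ansible play read-only access to the infrastructure facts:

```yaml
# ansible/group_vars/all/terraform.yml – illustrative rendering
infrastructure:
  cloud_provider: aws
  region: us-west-2
  environment: production
network:
  vpc_id: vpc-0abc123
  private_subnets:
    - subnet-0aaa111
    - subnet-0bbb222
  public_subnets:
    - subnet-0ccc333
services: {}
```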

2. Dynamic Inventory Script (Python)

# inventory/terraform_inventory.py
import json, subprocess, sys

def get_terraform_output():
    try:
        result = subprocess.run(['terraform', 'output', '-json'], capture_output=True, text=True, cwd='../infrastructure')
        return json.loads(result.stdout)
    except Exception as e:
        print(f"Error getting terraform output: {e}", file=sys.stderr)
        return {}

def generate_inventory():
    tf_output = get_terraform_output()
    inventory = {
        '_meta': {'hostvars': {}},
        'all': {'children': ['aws', 'alicloud']},
        'aws': {
            'children': ['web_servers', 'db_servers'],
            'vars': {'ansible_ssh_common_args': '-o StrictHostKeyChecking=no',
                     'cloud_provider': 'aws'},
        },
        # 'alicloud' is listed as a child of 'all' above, so it must exist
        'alicloud': {'hosts': [], 'vars': {'cloud_provider': 'alicloud'}},
        'web_servers': {'hosts': []},
        'db_servers': {'hosts': []},
    }
    if 'instance_ips' in tf_output:
        for ip in tf_output['instance_ips']['value']:
            inventory['web_servers']['hosts'].append(ip)
            inventory['_meta']['hostvars'][ip] = {
                'ansible_host': ip,
                'ansible_user': 'ec2-user',
                'instance_type': 't3.medium',
            }
    return inventory

if __name__ == '__main__':
    print(json.dumps(generate_inventory(), indent=2))
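
Because Ansible expects a specific JSON shape from a dynamic inventory (`_meta.hostvars` plus group dictionaries), the logic is easy to sanity-check offline. The helper below mirrors the structure of the script above, with the Terraform call replaced by a simulated `terraform output -json` payload for testing:

```python
import json

def build_inventory(tf_output):
    # Same structure as generate_inventory() above, but taking the
    # terraform output as a parameter so it can be exercised offline.
    inventory = {
        '_meta': {'hostvars': {}},
        'all': {'children': ['aws', 'alicloud']},
        'web_servers': {'hosts': []},
    }
    for ip in tf_output.get('instance_ips', {}).get('value', []):
        inventory['web_servers']['hosts'].append(ip)
        inventory['_meta']['hostvars'][ip] = {'ansible_host': ip,
                                              'ansible_user': 'ec2-user'}
    return inventory

# Simulated `terraform output -json` payload
sample = {'instance_ips': {'value': ['10.0.1.10', '10.0.1.11']}}
inv = build_inventory(sample)
print(json.dumps(inv, indent=2))
```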

Monitoring & Observability Integration

# roles/monitoring/tasks/main.yml
- name: Deploy monitoring stack
  block:
    - name: Configure Prometheus
      template:
        src: prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
      vars:  # task-level vars; the template module itself takes no vars option
        terraform_targets: "{{ terraform_monitoring_targets }}"
      notify: restart prometheus
    - name: Deploy Grafana dashboards
      community.grafana.grafana_dashboard:
        grafana_url: "{{ grafana_endpoint }}"
        grafana_api_key: "{{ grafana_api_key }}"
        path: "dashboards/{{ item }}.json"
      loop:
        - infrastructure-overview
        - application-metrics
        - multi-cloud-cost-analysis
    - name: Configure alert rules
      template:
        src: alert-rules.yml.j2
        dest: /etc/prometheus/rules/infrastructure.yml
      vars:
        notification_webhook: "{{ slack_webhook_url }}"
Cost‑Optimization Strategies

# modules/cost-optimization/main.tf
# Note: recurrence is evaluated in UTC unless `time_zone` is set on the schedule
resource "aws_autoscaling_schedule" "scale_down" {
  scheduled_action_name = "scale-down-evening"
  min_size = 1
  max_size = 2
  desired_capacity = 1
  recurrence = "0 18 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}

resource "aws_autoscaling_schedule" "scale_up" {
  scheduled_action_name = "scale-up-morning"
  min_size = 2
  max_size = 10
  desired_capacity = 3
  recurrence = "0 8 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}

resource "aws_autoscaling_group" "web" {
  min_size            = 2
  max_size            = 10
  vpc_zone_identifier = aws_subnet.private[*].id

  mixed_instances_policy {
    instances_distribution {
      on_demand_percentage_above_base_capacity = 20
      spot_allocation_strategy                 = "diversified"
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web.id
        version            = "$Latest"
      }
      override {
        instance_type     = "t3.medium"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "t3.large"
        weighted_capacity = "2"
      }
    }
  }
}

Security Best Practices

1. Key Management

# playbooks/security-hardening.yml
- name: Security hardening configuration
  hosts: all
  become: yes
  vars:
    vault_secrets: "{{ vault_aws_secrets }}"
  tasks:
    - name: Retrieve DB password from SSM Parameter Store
      # Reading a parameter uses the aws_ssm lookup plugin; the
      # aws_ssm_parameter_store module is for creating/updating parameters.
      set_fact:
        db_password: "{{ lookup('amazon.aws.aws_ssm', '/' + environment + '/database/password', region=aws_region) }}"
      no_log: true
    - name: Write secrets to Vault
      hashivault_write:
        mount_point: secret
        secret: "{{ app_name }}/{{ environment }}"
        data:
          database_url: "{{ vault_secrets.database_url }}"
          api_keys: "{{ vault_secrets.api_keys }}"

2. Network Security (Zero‑Trust)

# aws_security_group for web tier
resource "aws_security_group" "web_tier" {
  name_prefix = "web-tier-"
  vpc_id = aws_vpc.main.id
  ingress {
    from_port = 80
    to_port   = 80
    protocol  = "tcp"
    security_groups = [aws_security_group.alb.id]
  }
  egress {
    from_port = 443
    to_port   = 443
    protocol  = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }
  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}

Fault‑Handling Real‑World Case

During a production rollout, cross‑cloud data‑sync latency was observed. Using the Terraform‑generated inventory, the team ran diagnostic playbooks to collect system metrics, ping remote endpoints, and query PostgreSQL replication lag, then generated an HTML report for rapid root‑cause analysis.

# playbooks/troubleshooting.yml
- name: Production fault diagnosis
  hosts: all
  gather_facts: yes
  tasks:
    - name: Collect system facts
      setup:
        filter: "ansible_*"
    - name: Network connectivity check
      command: "ping -c 4 {{ item }}"
      loop: "{{ cross_region_endpoints }}"
      register: ping_results
    - name: Database replication lag test
      postgresql_query:
        db: "{{ db_name }}"
        query: "SELECT application_name, client_addr, replay_lag FROM pg_stat_replication"
      register: replication_lag
    - name: Generate diagnostic report
      template:
        src: diagnostic_report.j2
        dest: "/tmp/diagnostic-{{ ansible_date_time.epoch }}.html"
      delegate_to: localhost

Performance Tuning Secrets

Terraform Optimizations

# terraform.tf – provider pinning; parallelism is controlled at run time,
# e.g. `terraform apply -parallelism=20`
terraform {
  required_providers {
    aws = { source = "hashicorp/aws", version = "~> 5.0" }
  }
}

data "aws_ami" "amazon_linux" {
  most_recent = true
  owners = ["amazon"]
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}

resource "aws_instance" "web" {
  # count and for_each cannot be combined on one resource; for_each alone
  # covers the per-instance configuration case
  for_each      = var.instance_configs
  ami           = data.aws_ami.amazon_linux.id
  instance_type = each.value.instance_type
  tags          = merge(var.default_tags, { Name = "web-${each.key}" })
}

Ansible Performance Settings

# ansible.cfg – increase forks and enable pipelining
# (the redis fact cache requires the `redis` Python package on the control node)
[defaults]
forks = 50
host_key_checking = False
retry_files_enabled = False
fact_caching = redis
fact_caching_timeout = 3600
fact_caching_connection = localhost:6379:0

[ssh_connection]
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o ControlPath=/tmp/ansible-ssh-%h-%p-%r
pipelining = True
control_path_dir = /tmp

Enterprise‑Level Best‑Practice Summary

Tool Selection: Use Terraform for the immutable infrastructure lifecycle and Ansible for mutable configuration and application deployment.

Code Organization: Separate infrastructure/ (Terraform) and ansible/ directories, with environment-specific modules and inventories.

Versioning: Adopt semantic versioning for infrastructure modules, keep separate state files per environment, and snapshot before each change to enable one-click rollback.

Monitoring & Alerting: Deploy Prometheus and Grafana via Ansible, monitor both resource metrics and application performance, and set cost-anomaly alerts.

Security: Store secrets in Vault, enforce zero-trust network groups, and rotate keys via automated playbooks.

CI/CD Integration: Trigger Terraform plan/apply and Ansible playbooks from GitHub Actions, passing environment variables and using encrypted vault passwords.

By treating infrastructure as code and coupling Terraform with Ansible, teams can shift from reactive firefighting to proactive architecture design, achieving repeatable, auditable, and scalable multi‑cloud deployments.

Tags: CI/CD, operations, multi-cloud, Infrastructure Automation, IaC, Terraform, Ansible
Written by Raymond Ops

Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.