Master Multi‑Cloud Orchestration with Terraform + Ansible: IaC Best Practices
This article explains how to combine Terraform and Ansible to build a robust, enterprise‑grade multi‑cloud resource orchestration workflow, covering their strengths, limitations, layered architecture, real‑world e‑commerce deployment, CI/CD integration, advanced tips, cost optimization, and security best practices.
Terraform+Ansible Dual Sword: Best Practices for Multi‑Cloud Resource Orchestration in the IaC Era
As cloud-native adoption accelerates, manual operations can no longer keep pace with digital transformation. As a veteran ops engineer, I'll share how to combine Terraform and Ansible for enterprise‑grade multi‑cloud orchestration.
Pain Points: Why Single Tools Aren’t Enough?
Terraform Strengths and Limitations
Terraform excels at declarative IaC resource provisioning:
State Management : tfstate file tracks resource changes precisely.
Dependency Resolution : automatically builds a dependency graph to ensure correct creation order.
Multi‑Cloud Support : provider ecosystem covers major cloud vendors.
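To make the dependency-resolution point concrete, here is a minimal Python sketch of ordering resources by their dependencies. It is not Terraform's own implementation; the resource names and the dependency map are hypothetical, and the standard-library graphlib module (Python 3.9+) stands in for Terraform's graph engine.

```python
from graphlib import TopologicalSorter

# Hypothetical resources, each mapped to the resources it depends on.
dependencies = {
    "aws_vpc.main": set(),
    "aws_subnet.web": {"aws_vpc.main"},
    "aws_security_group.web": {"aws_vpc.main"},
    "aws_instance.web": {"aws_subnet.web", "aws_security_group.web"},
}


def creation_order(graph):
    """Return resource names so that every dependency precedes its dependents."""
    return list(TopologicalSorter(graph).static_order())


order = creation_order(dependencies)
print(order)
```

The VPC always comes first and the instance last; the subnet and security group may appear in either order because neither depends on the other.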
However, in real projects Terraform falls short at configuration management, which ends up crammed into user_data scripts:
# Terraform excels at creating infrastructure
resource "aws_instance" "web" {
  ami           = "ami-0c55b159cbfafe1d0"
  instance_type = "t3.medium"

  # Complex configuration management is weak
  user_data = <<-EOF
    #!/bin/bash
    yum update -y
    # Lots of scripts piled up, hard to maintain
  EOF
}
Ansible Configuration Management Advantages
Ansible stands out in configuration management and application deployment:
Idempotent Operations : repeated runs produce no side effects.
Rich Module Library : covers system, network, cloud services, etc.
Dynamic Inventory : flexibly adapts to dynamic infrastructure.
Nevertheless, Ansible lacks robust state management for infrastructure provisioning.
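Idempotency is easiest to see in a tiny example. The following Python sketch mimics what a module such as lineinfile does conceptually: it changes the file only when the desired line is missing and reports whether anything changed. The function name, the file name, and the sample line are assumptions for illustration, not Ansible code.

```python
import os
import tempfile


def ensure_line(path, line):
    """Append `line` to the file at `path` only if it is missing.

    Returns True when the file was changed, False otherwise.
    """
    content = ""
    if os.path.exists(path):
        with open(path, encoding="utf-8") as handle:
            content = handle.read()
    if line in content.splitlines():
        return False  # desired state already reached, nothing to do
    separator = "" if content == "" or content.endswith("\n") else "\n"
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(separator + line + "\n")
    return True


# Usage: the second call finds the line already present and changes nothing.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "sshd_config")
    first_run = ensure_line(target, "PermitRootLogin no")
    second_run = ensure_line(target, "PermitRootLogin no")
    with open(target, encoding="utf-8") as handle:
        final_content = handle.read()
```

Running the same step twice leaves the file identical, which is the property that makes repeated playbook runs safe.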
Architecture Design: Building a Collaborative System
Based on years of practice, I designed a “layered decoupling” architecture:
┌─────────────────────────────────────────┐
│ GitOps Workflow │
├─────────────────────────────────────────┤
│ Terraform Layer (Infrastructure Supply) │
│ ├── Network (VPC/Subnet/Security Group) │
│ ├── Compute (EC2/ECS/Lambda) │
│ └── Storage (S3/RDS/ElastiCache) │
├─────────────────────────────────────────┤
│ Ansible Layer (Configuration Management)│
│ ├── System Config (users/permissions) │
│ ├── App Deployment (containers) │
│ └── Monitoring (logs/alerts/backup) │
└─────────────────────────────────────────┘
Hands‑On Demo: Multi‑Cloud E‑Commerce Deployment
Example: deploy an e‑commerce platform that spans AWS and Alibaba Cloud.
Step 1 – Define Infrastructure with Terraform
# main.tf – Multi‑Cloud infrastructure definition
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
    alicloud = {
      source  = "aliyun/alicloud"
      version = "~> 1.200"
    }
  }

  backend "s3" {
    bucket = "terraform-state-prod"
    key    = "ecommerce/infrastructure.tfstate"
    region = "us-west-2"
  }
}

module "aws_infrastructure" {
  source                   = "./modules/aws"
  vpc_cidr                 = "10.0.0.0/16"
  availability_zones       = ["us-west-2a", "us-west-2b", "us-west-2c"]
  enable_ansible_inventory = true
}

module "alicloud_infrastructure" {
  source                   = "./modules/alicloud"
  vpc_cidr                 = "172.16.0.0/16"
  zones                    = ["cn-hangzhou-g", "cn-hangzhou-h"]
  enable_ansible_inventory = true
}

# Render the Ansible inventory from the instances Terraform created
resource "local_file" "ansible_inventory" {
  content = templatefile("${path.module}/templates/inventory.tpl", {
    aws_instances = module.aws_infrastructure.instance_ips
    ali_instances = module.alicloud_infrastructure.instance_ips
    rds_endpoints = module.aws_infrastructure.rds_endpoints
  })
  filename = "../ansible/inventory/terraform.ini"
}
Step 2 – Fine‑Grained Configuration with Ansible
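The playbooks in this step read the inventory file that Step 1 writes through templatefile. The article does not show inventory.tpl itself, so the Python sketch below only illustrates the kind of INI text such a template might produce. The aws and alicloud group names, the web_servers parent group, and the sample IP addresses are all assumptions.

```python
def render_inventory(aws_instances, ali_instances):
    """Render a minimal INI inventory with one group per cloud."""
    lines = ["[aws]"]
    lines += aws_instances
    lines += ["", "[alicloud]"]
    lines += ali_instances
    # Assumed parent group, matching the groups['web_servers'] lookup
    # used by the reachability check in the playbook.
    lines += ["", "[web_servers:children]", "aws", "alicloud"]
    return "\n".join(lines) + "\n"


# Documentation-range IP addresses used as placeholder hosts.
inventory_text = render_inventory(["192.0.2.10", "192.0.2.11"], ["198.51.100.20"])
print(inventory_text)
```

Grouping hosts per cloud keeps provider-specific variables separate, while the shared parent group lets one play target every web server.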
# playbooks/site.yml – Main orchestration
---
- name: E‑Commerce Platform Deployment
  hosts: localhost
  gather_facts: false
  vars:
    deployment_env: "{{ env | default('production') }}"
  tasks:
    - name: Prepare base environment
      include_tasks: tasks/infrastructure_check.yml
    - name: Deploy application services
      include_tasks: tasks/application_deploy.yml

# tasks/infrastructure_check.yml
---
- name: Verify Terraform output
  block:
    - name: Check instance reachability
      wait_for:
        host: "{{ item }}"
        port: 22
        timeout: 300
      loop: "{{ groups['web_servers'] }}"
# tasks/application_deploy.yml
---
- name: Deploy containerized application
  block:
    - name: Configure Docker
      include_role:
        name: docker
      vars:
        docker_compose_version: "2.20.0"
    - name: Deploy microservice stack
      docker_compose:
        # An inline definition cannot be combined with project_src, so the
        # stack is identified by project_name (illustrative value) instead.
        project_name: ecommerce
        definition:
          version: '3.8'
          services:
            frontend:
              image: "{{ ecr_registry }}/ecommerce-frontend:{{ app_version }}"
              ports: ["80:3000"]
              environment:
                API_ENDPOINT: "{{ api_gateway_url }}"
            backend:
              image: "{{ ecr_registry }}/ecommerce-backend:{{ app_version }}"
              environment:
                DATABASE_URL: "{{ database_connection_string }}"
                REDIS_URL: "{{ redis_cluster_endpoint }}"
Step 3 – CI/CD Pipeline Integration
# .github/workflows/deploy.yml
name: Multi‑Cloud Deployment Pipeline
on:
  push:
    branches: [ main ]
    paths: ['infrastructure/**', 'ansible/**']
env:
  # Assumed value; set this per target environment
  ENVIRONMENT: production
jobs:
  terraform:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v2
        with:
          terraform_version: 1.5.0
      - name: Terraform Plan
        run: |
          cd infrastructure
          terraform init
          terraform plan -var-file="vars/${ENVIRONMENT}.tfvars"
      - name: Terraform Apply
        if: github.ref == 'refs/heads/main'
        run: |
          # Each step starts in the workspace root, so change directory again
          cd infrastructure
          terraform apply -auto-approve -var-file="vars/${ENVIRONMENT}.tfvars"
  ansible:
    needs: terraform
    runs-on: ubuntu-latest
    steps:
      # This job runs on a fresh runner: check out the repository, and make
      # sure inventory/terraform.ini and .vault_pass are provided to it.
      - uses: actions/checkout@v3
      - name: Execute Ansible Playbook
        run: |
          cd ansible
          ansible-playbook -i inventory/terraform.ini playbooks/site.yml \
            --extra-vars "env=${ENVIRONMENT}" \
            --vault-password-file .vault_pass
Advanced Tips: Smoother Collaboration
1. State Sharing Mechanism
Pass Terraform output variables to Ansible:
# outputs.tf
output "ansible_vars" {
  value = {
    database_endpoint    = aws_rds_cluster.main.endpoint
    redis_cluster_config = aws_elasticache_replication_group.main.configuration_endpoint_address
    load_balancer_dns    = aws_lb.main.dns_name
    security_groups = {
      web = aws_security_group.web.id
      db  = aws_security_group.db.id
    }
  }
  sensitive = false
}
2. Dynamic Inventory Management
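A dynamic inventory has to parse the JSON that `terraform output -json` emits, where every output is wrapped in an object carrying `value`, `type`, and `sensitive` keys. The sketch below unwraps that structure into plain values; the sample payload, the host addresses, and the lb.example.com name are made up for illustration.

```python
import json


def unwrap_outputs(raw_json):
    """Map each Terraform output name to its plain value."""
    outputs = json.loads(raw_json)
    return {name: body["value"] for name, body in outputs.items()}


# Hypothetical payload shaped like `terraform output -json`.
sample = json.dumps({
    "instance_ips": {
        "sensitive": False,
        "type": ["list", "string"],
        "value": ["192.0.2.10", "192.0.2.11"],
    },
    "ansible_vars": {
        "sensitive": False,
        "type": ["object", {"load_balancer_dns": "string"}],
        "value": {"load_balancer_dns": "lb.example.com"},
    },
})

flat = unwrap_outputs(sample)
# A JSON string like this can be handed to ansible-playbook via --extra-vars.
extra_vars = json.dumps(flat["ansible_vars"])
```

Unwrapping once, close to where the JSON is read, keeps the `['value']` indexing out of the rest of the inventory code.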
# inventory/terraform_inventory.py
import json
import subprocess
import sys


def get_terraform_output():
    """Return the parsed `terraform output -json`, or {} on failure."""
    try:
        result = subprocess.run(['terraform', 'output', '-json'],
                                capture_output=True, text=True,
                                cwd='../infrastructure', check=True)
        return json.loads(result.stdout)
    except Exception as e:
        print(f"Error getting terraform output: {e}", file=sys.stderr)
        return {}


def generate_inventory():
    """Build the JSON structure Ansible expects from a dynamic inventory."""
    tf_output = get_terraform_output()
    inventory = {
        '_meta': {'hostvars': {}},
        'all': {'children': ['aws', 'alicloud']},
        'aws': {'children': ['web_servers', 'db_servers'],
                'vars': {'ansible_ssh_common_args': '-o StrictHostKeyChecking=no',
                         'cloud_provider': 'aws'}},
        'alicloud': {'hosts': []},  # declared so every child of 'all' exists
        'web_servers': {'hosts': []},
        'db_servers': {'hosts': []}
    }
    if 'instance_ips' in tf_output:
        # Each Terraform output wraps its data in a 'value' key
        for ip in tf_output['instance_ips']['value']:
            inventory['web_servers']['hosts'].append(ip)
            inventory['_meta']['hostvars'][ip] = {
                'ansible_host': ip,
                'ansible_user': 'ec2-user',
                'instance_type': 't3.medium'
            }
    return inventory


if __name__ == '__main__':
    print(json.dumps(generate_inventory(), indent=2))
3. Error Handling and Rollback Strategy
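A rollback play that combines `serial` with `max_fail_percentage` processes hosts in batches and stops once too many hosts in a batch have failed. The Python sketch below is my reading of that behaviour, not Ansible's implementation; the function names and host names are hypothetical.

```python
def batches(hosts, size):
    """Split hosts into consecutive batches of at most `size` hosts."""
    return [hosts[i:i + size] for i in range(0, len(hosts), size)]


def should_abort(failed, batch_size, max_fail_percentage):
    """True when the failed share of a batch exceeds the threshold.

    Integer arithmetic avoids floating-point rounding at the boundary;
    the threshold must be exceeded, not merely reached.
    """
    return failed * 100 > max_fail_percentage * batch_size


plan = batches(["web1", "web2", "web3"], 2)
print(plan)
print(should_abort(failed=1, batch_size=1, max_fail_percentage=10))
```

With a batch size of 1 and a 10 percent threshold, a single failed host is 100 percent of its batch, so the rollback stops at the first failure instead of spreading to the remaining hosts.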
# playbooks/rollback.yml – Smart rollback
---
- name: Application Deployment Rollback
  hosts: web_servers
  serial: "{{ rollback_batch_size | default(1) }}"
  max_fail_percentage: 10
  vars:
    health_check_retries: 5
    health_check_delay: 30
  pre_tasks:
    - name: Create rollback snapshot
      block:
        - name: Backup current config
          archive:
            path: "{{ app_path }}"
            dest: "/backup/app-{{ ansible_date_time.epoch }}.tar.gz"
        - name: Record current version
          copy:
            content: "{{ current_version }}"
            dest: "/backup/current_version"
  tasks:
    - name: Execute version rollback
      block:
        - name: Stop current service
          systemd:
            name: "{{ app_service_name }}"
            state: stopped
        - name: Deploy historic version
          unarchive:
            src: "{{ rollback_package_url }}"
            dest: "{{ app_path }}"
            remote_src: yes
        - name: Start service
          systemd:
            name: "{{ app_service_name }}"
            state: started
            enabled: yes
      rescue:
        - name: Rollback failure handling
          fail:
            msg: "Rollback failed, manual intervention required"
  post_tasks:
    - name: Health check
      uri:
        url: "http://{{ ansible_host }}:{{ app_port }}/health"
        method: GET
        status_code: 200
      # Retry until the endpoint answers with HTTP 200
      register: health_check
      until: health_check.status == 200
      retries: "{{ health_check_retries }}"
      delay: "{{ health_check_delay }}"
Monitoring and Observability Integration
# roles/monitoring/tasks/main.yml
---
- name: Deploy monitoring stack
  block:
    - name: Prometheus configuration
      template:
        src: prometheus.yml.j2
        dest: /etc/prometheus/prometheus.yml
      vars:
        terraform_targets: "{{ terraform_monitoring_targets }}"
      notify: restart prometheus
    - name: Grafana dashboards
      grafana_dashboard:
        grafana_url: "{{ grafana_endpoint }}"
        grafana_api_key: "{{ grafana_api_key }}"
        dashboard: "{{ item }}"
      loop:
        - infrastructure-overview
        - application-metrics
        - multi-cloud-cost-analysis
    - name: Alert rule configuration
      template:
        src: alert-rules.yml.j2
        dest: /etc/prometheus/rules/infrastructure.yml
      vars:
        notification_webhook: "{{ slack_webhook_url }}"
Cost Optimization Strategies
Automate cost control with scheduled scaling and spot‑instance policies:
# modules/cost-optimization/main.tf
resource "aws_autoscaling_schedule" "scale_down" {
  scheduled_action_name  = "scale-down-evening"
  min_size               = 1
  max_size               = 2
  desired_capacity       = 1
  recurrence             = "0 18 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}

resource "aws_autoscaling_schedule" "scale_up" {
  scheduled_action_name  = "scale-up-morning"
  min_size               = 2
  max_size               = 10
  desired_capacity       = 3
  recurrence             = "0 8 * * MON-FRI"
  autoscaling_group_name = aws_autoscaling_group.web.name
}

resource "aws_autoscaling_group" "web" {
  # Size bounds are required; subnets and other arguments are omitted here
  min_size = 1
  max_size = 10

  mixed_instances_policy {
    instances_distribution {
      # 20% on-demand capacity, the rest on spot instances
      on_demand_percentage_above_base_capacity = 20
      spot_allocation_strategy                 = "capacity-optimized"
    }
    launch_template {
      launch_template_specification {
        launch_template_id = aws_launch_template.web.id
        version            = "$Latest"
      }
      override {
        instance_type     = "t3.medium"
        weighted_capacity = "1"
      }
      override {
        instance_type     = "t3.large"
        weighted_capacity = "2"
      }
    }
  }
}
Security Best Practices
1. Key Management
# playbooks/security-hardening.yml
---
- name: Security hardening configuration
  hosts: all
  become: yes
  vars:
    vault_secrets: "{{ vault_aws_secrets }}"
  tasks:
    # `environment` is a reserved name in Ansible, so the deployment_env
    # variable from site.yml is used; the aws_ssm lookup reads the value.
    - name: Retrieve DB password from SSM
      set_fact:
        db_password: "{{ lookup('amazon.aws.aws_ssm', '/' ~ deployment_env ~ '/database/password', region=aws_region) }}"
      no_log: true
    - name: Write secrets to Vault
      hashivault_write:
        mount_point: secret
        secret: "{{ app_name }}/{{ deployment_env }}"
        data:
          database_url: "{{ vault_secrets.database_url }}"
          api_keys: "{{ vault_secrets.api_keys }}"
2. Network Security
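A zero-trust rule set admits traffic only from named sources, never from open address ranges. That property is easy to check mechanically; the Python sketch below flags ingress rules that are open to the world. The rule dictionaries are a simplified, hypothetical shape rather than the Terraform or AWS API format, and the security group ID is a placeholder.

```python
OPEN_CIDRS = {"0.0.0.0/0", "::/0"}


def open_ingress_rules(ingress_rules):
    """Return the ingress rules that accept traffic from any address."""
    return [
        rule for rule in ingress_rules
        if OPEN_CIDRS & set(rule.get("cidr_blocks", []))
    ]


# Hypothetical rules: the first only trusts a load balancer security group,
# the second exposes SSH to the whole internet.
rules = [
    {"port": 80, "security_groups": ["sg-alb-placeholder"]},
    {"port": 22, "cidr_blocks": ["0.0.0.0/0"]},
]

violations = open_ingress_rules(rules)
print(violations)
```

A check like this could run in CI against the planned rules, so an open ingress rule is caught before it is applied.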
# Zero‑Trust security group
resource "aws_security_group" "web_tier" {
  name_prefix = "web-tier-"
  vpc_id      = aws_vpc.main.id

  # Inbound HTTP only from the load balancer's security group
  ingress {
    from_port       = 80
    to_port         = 80
    protocol        = "tcp"
    security_groups = [aws_security_group.alb.id]
  }

  # Outbound HTTPS only
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
Enterprise‑Level Best‑Practice Summary
Key takeaways:
Terraform for infrastructure : lifecycle of network, compute, storage.
Ansible for configuration : system setup, app deployment, ops automation.
Clear division of responsibilities : avoid overlap, keep architecture clean.
Organize code with separate infrastructure/ and ansible/ directories, use environment‑specific modules, and adopt semantic versioning with dedicated state files for each environment.
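One way to keep a dedicated state file per environment is to derive the backend key from the environment name. The Python sketch below shows the idea; the key layout, the function name, and the list of environment names are assumptions, not something the demo above prescribes.

```python
# Assumed set of environments; adjust to the ones actually in use.
ALLOWED_ENVIRONMENTS = ("dev", "staging", "production")


def state_key(project, environment):
    """Build the backend key of one environment's state file."""
    if environment not in ALLOWED_ENVIRONMENTS:
        raise ValueError(f"unknown environment: {environment}")
    return f"{project}/{environment}/infrastructure.tfstate"


keys = [state_key("ecommerce", env) for env in ALLOWED_ENVIRONMENTS]
print(keys)
```

Rejecting unknown names early prevents a typo from silently creating a new, empty state file.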
MaGe Linux Operations
