How to Build Reusable Multi‑Cloud Infrastructure with Terraform
Learn how to replace manual, error‑prone cloud console clicks with Terraform‑driven, reusable multi‑cloud infrastructure, covering why multi‑cloud matters, Terraform fundamentals, project layout, example networking and compute modules for AWS and Alibaba Cloud, CI/CD integration, security scanning, cost optimization, and best‑practice guidelines.
Introduction: From manual clicks to code‑defined infrastructure
Manual, button‑driven provisioning in cloud consoles leads to errors, delays, and high operational cost, especially when the same environment must be replicated across multiple providers. Terraform enables a shift from labor‑intensive tasks to declarative, version‑controlled infrastructure code.
Why Multi‑Cloud is inevitable
Business‑driven multi‑cloud needs
Cost optimization: Providers price services differently; compute-heavy workloads may be cheaper on AWS, storage-heavy ones on Alibaba Cloud.
Risk diversification: Single-provider outages (e.g., the December 2021 AWS us-east-1 incident) can cause widespread disruption; multi-cloud mitigates this risk.
Compliance requirements: Data sovereignty laws may mandate regional storage, which multi-cloud strategies can satisfy.
Technical complementarity: Each provider offers distinct strengths, such as AWS machine learning services, Azure enterprise integration, and Alibaba Cloud's network advantages in mainland China.
Challenges of traditional operations
Configuration inconsistency: Manual setups cannot guarantee identical configurations across clouds.
Rising operational cost: Managing multiple consoles and APIs increases effort.
Security risk: Human error and configuration drift are hard to detect.
Poor scalability: Growth turns infrastructure expansion into a nightmare.
Terraform: Best practice for Infrastructure as Code
What is Terraform?
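Terraform is an infrastructure-as-code (IaC) tool from HashiCorp that uses the declarative HashiCorp Configuration Language (HCL) to define and manage infrastructure across providers. Releases up to 1.5 are open source under MPL 2.0; later releases are distributed under the Business Source License.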
Core advantages
Declarative configuration – describe the desired end state and Terraform computes the actions needed to reach it.
State management – a state file tracks real resources, ensuring consistency.
Plan preview – terraform plan shows changes before execution.
Rich provider ecosystem – supports 1000+ providers covering all major clouds.
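To make the declarative model concrete, here is a minimal sketch. The resource name and CIDR block are illustrative assumptions, not part of the project described later.

```hcl
# Desired end state: one VPC with this CIDR block should exist.
# Nothing here says how to create it or which API calls to make.
resource "aws_vpc" "example" { # "example" is a hypothetical name
  cidr_block = "10.0.0.0/16"

  tags = {
    ManagedBy = "terraform"
  }
}

# Typical workflow from the directory containing this file:
#   terraform init    # download the AWS provider
#   terraform plan    # preview: the first run proposes creating one VPC
#   terraform apply   # create it and record it in the state file
```

Running terraform plan again after a successful apply, with no edits in between, should report no changes, because the recorded state already matches the configuration.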
Hands‑on: Building a reusable multi‑cloud stack
Project layout
terraform-multicloud/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   ├── terraform.tfvars
│   │   └── versions.tf
│   ├── staging/
│   └── prod/
├── modules/
│   ├── networking/
│   │   ├── main.tf
│   │   ├── variables.tf
│   │   ├── outputs.tf
│   │   └── versions.tf
│   ├── compute/
│   └── database/
├── shared/
│   ├── backend.tf
│   └── provider.tf
└── scripts/
    ├── deploy.sh
    └── destroy.sh
Reusable networking module (excerpt)
# modules/networking/variables.tf
variable "environment" {
  description = "Environment name"
  type        = string
}

variable "cloud_provider" {
  description = "Cloud provider"
  type        = string

  validation {
    condition     = contains(["aws", "alicloud", "azure"], var.cloud_provider)
    error_message = "Supported providers: aws, alicloud, azure."
  }
}

variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}

variable "availability_zones" {
  description = "AZ list"
  type        = list(string)
}

# modules/networking/main.tf
locals {
  common_tags = {
    Environment = var.environment
    ManagedBy   = "terraform"
    Project     = "multicloud-demo"
  }
}

# Created only when the module is called with cloud_provider = "aws"
resource "aws_vpc" "main" {
  count                = var.cloud_provider == "aws" ? 1 : 0
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags                 = merge(local.common_tags, { Name = "${var.environment}-vpc" })
}
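For the Alibaba Cloud side, which this excerpt leaves out, a counterpart could be sketched as follows. This is an assumption about how the omitted resources might look, not the author's code: the resource names, the /24 subnet sizing, and the output are all hypothetical.

```hcl
# Assumption: mirrors the aws_vpc resource for Alibaba Cloud.
resource "alicloud_vpc" "main" {
  count      = var.cloud_provider == "alicloud" ? 1 : 0
  vpc_name   = "${var.environment}-vpc"
  cidr_block = var.vpc_cidr
  tags       = local.common_tags
}

# One vSwitch (Alibaba Cloud's subnet) per zone, carved out of the VPC CIDR.
resource "alicloud_vswitch" "main" {
  count        = var.cloud_provider == "alicloud" ? length(var.availability_zones) : 0
  vpc_id       = alicloud_vpc.main[0].id
  zone_id      = var.availability_zones[count.index]
  cidr_block   = cidrsubnet(var.vpc_cidr, 8, count.index) # a /16 yields /24 subnets
  vswitch_name = "${var.environment}-vswitch-${count.index + 1}"
  tags         = local.common_tags
}

# modules/networking/outputs.tf (hypothetical)
# At most one of the two lists is non-empty, so one() returns that ID,
# or null when neither VPC was created.
output "vpc_id" {
  description = "ID of whichever VPC was created"
  value       = one(concat(aws_vpc.main[*].id, alicloud_vpc.main[*].id))
}
```

Exposing a single provider-neutral output keeps callers from needing to know which cloud the module targeted.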
# Similar resources for alicloud_vpc and subnets omitted for brevity
Compute module example
# modules/compute/variables.tf
variable "environment" { type = string }
variable "cloud_provider" { type = string }

variable "instance_count" {
  type    = number
  default = 2
}

variable "subnet_ids" { type = list(string) }
variable "vpc_id" { type = string }

# modules/compute/main.tf
locals {
  instance_types = {
    aws      = { small = "t3.micro", medium = "t3.small", large = "t3.medium" }
    alicloud = { small = "ecs.t6-c1m1.large", medium = "ecs.t6-c2m1.large", large = "ecs.t6-c4m1.large" }
  }
  # prod gets the large size, every other environment the small one
  instance_size = var.environment == "prod" ? "large" : "small"
  instance_type = local.instance_types[var.cloud_provider][local.instance_size]
}

resource "aws_instance" "app" {
  count         = var.cloud_provider == "aws" ? var.instance_count : 0
  ami           = data.aws_ami.ubuntu[0].id
  instance_type = local.instance_type
  # Spread instances across the available subnets round-robin
  subnet_id = var.subnet_ids[count.index % length(var.subnet_ids)]

  tags = {
    Name        = "${var.environment}-app-${count.index + 1}"
    Environment = var.environment
    ManagedBy   = "terraform"
  }
}
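The instance resource reads its image from data.aws_ami.ubuntu[0], which the excerpt does not show. A minimal sketch of such a data source follows; the choice of Ubuntu 22.04 and the name filter are assumptions, since the article does not say which image it uses.

```hcl
# Assumption: latest Ubuntu 22.04 LTS image published by Canonical.
# count matches the [0] index used by aws_instance.app.
data "aws_ami" "ubuntu" {
  count       = var.cloud_provider == "aws" ? 1 : 0
  most_recent = true
  owners      = ["099720109477"] # Canonical's AWS account ID

  filter {
    name   = "name"
    values = ["ubuntu/images/hvm-ssd/ubuntu-jammy-22.04-amd64-server-*"]
  }

  filter {
    name   = "virtualization-type"
    values = ["hvm"]
  }
}
```

Because most_recent = true can resolve to a newer image over time, pinning a specific AMI ID may be preferable where instance replacement must be avoided.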
# Corresponding alicloud_instance and data sources omitted for brevity
Environment configuration
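The environment configuration in this section reads var.aws_region and var.alicloud_region without showing their declarations. One plausible sketch is below; the file name and default values are assumptions.

```hcl
# environments/prod/variables.tf (hypothetical file)
variable "aws_region" {
  description = "AWS region for this environment"
  type        = string
  default     = "us-west-2"
}

variable "alicloud_region" {
  description = "Alibaba Cloud region for this environment"
  type        = string
  default     = "cn-hangzhou" # assumed; use the region you actually deploy to
}
```

Values set in terraform.tfvars override these defaults, which is one way to keep per-environment differences out of main.tf.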
# environments/prod/main.tf
terraform {
  required_version = ">= 1.0"

  backend "s3" {
    bucket = "my-terraform-state-bucket"
    key    = "prod/terraform.tfstate"
    region = "us-west-2"
  }

  required_providers {
    aws      = { source = "hashicorp/aws", version = "~> 5.0" }
    alicloud = { source = "aliyun/alicloud", version = "~> 1.200" }
  }
}

provider "aws" { region = var.aws_region }
provider "alicloud" { region = var.alicloud_region }

module "aws_networking" {
  source             = "../../modules/networking"
  environment        = "prod"
  cloud_provider     = "aws"
  vpc_cidr           = "10.1.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b", "us-west-2c"]
}

module "aws_compute" { /* omitted for brevity */ }
module "alicloud_networking" { /* omitted for brevity */ }
module "alicloud_compute" { /* omitted for brevity */ }
Advanced features
Conditional deployment
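Resources created with count are lists, and indexing an empty list with [0] is an error at plan time. The sketch below shows one way to guard such references. It assumes the code lives in the same module as aws_instance.app; the local and output names are hypothetical.

```hcl
locals {
  # try() falls back to null when aws_instance.app is empty,
  # where a bare aws_instance.app[0].id would fail.
  first_app_instance_id = try(aws_instance.app[0].id, null)

  # True only in prod and only if there is at least one instance to chart.
  dashboard_enabled = var.environment == "prod" && length(aws_instance.app) > 0
}

output "first_app_instance_id" {
  description = "ID of the first app instance, or null when none exist"
  value       = local.first_app_instance_id
}
```

A conditional resource could then use count = local.dashboard_enabled ? 1 : 0, so it is skipped whenever the instances it depends on were not created.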
# Enable CloudWatch dashboard only in prod
resource "aws_cloudwatch_dashboard" "main" {
  count          = var.environment == "prod" ? 1 : 0
  dashboard_name = "${var.environment}-dashboard"
  dashboard_body = jsonencode({
    widgets = [{
      type = "metric",
      properties = {
        metrics = [["AWS/EC2", "CPUUtilization", "InstanceId", aws_instance.app[0].id]],
        period  = 300,
        stat    = "Average",
        region  = var.aws_region,
        title   = "EC2 Instance CPU"
      }
    }]
  })
}
Dynamic configuration per provider
locals {
  cloud_configs = {
    aws = {
      machine_types = ["t3.micro", "t3.small", "t3.medium"]
      storage_types = ["gp3", "io1", "sc1"]
      network_acl_rules = [{
        rule_number = 100
        protocol    = "tcp"
        rule_action = "allow"
        port_range  = "80"
        cidr_block  = "0.0.0.0/0"
      }]
    }
    alicloud = {
      machine_types = ["ecs.t6-c1m1.large", "ecs.t6-c2m1.large"]
      storage_types = ["cloud_efficiency", "cloud_ssd", "cloud_essd"]
      security_rules = [{
        type        = "ingress"
        ip_protocol = "tcp"
        port_range  = "80/80"
        cidr_ip     = "0.0.0.0/0"
      }]
    }
  }
  current_config = local.cloud_configs[var.cloud_provider]
}
Remote state and team collaboration
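Terraform backend blocks cannot interpolate variables, so per-environment state locations are usually supplied through partial backend configuration at init time. A minimal sketch, assuming a hypothetical backend.hcl file in each environment directory:

```hcl
# environments/dev/backend.hcl (hypothetical file name)
# Only the backend settings that differ per environment live here.
# The configuration itself then declares an empty block:  backend "s3" {}
bucket = "my-company-terraform-state"
key    = "environments/dev/terraform.tfstate"
region = "us-west-2"

# Initialize with:
#   terraform init -backend-config=backend.hcl
```

This keeps one backend declaration in code while each environment still writes to its own state key.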
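# shared/backend.tf
terraform {
  backend "s3" {
    bucket = "my-company-terraform-state"
    # Backend blocks cannot reference variables, so the key is a literal.
    # Use a different key per environment, or pass it at init time:
    #   terraform init -backend-config="key=environments/dev/terraform.tfstate"
    key            = "environments/prod/terraform.tfstate"
    region         = "us-west-2"
    dynamodb_table = "terraform-state-lock"
    encrypt        = true
  }
}

# The lock table must exist before the backend above is initialized,
# so create it from a separate bootstrap configuration.
resource "aws_dynamodb_table" "terraform_state_lock" {
  name         = "terraform-state-lock"
  billing_mode = "PAY_PER_REQUEST"
  hash_key     = "LockID"

  attribute {
    name = "LockID"
    type = "S"
  }

  tags = { Name = "TerraformStateLock", ManagedBy = "terraform" }
}
CI/CD integration (GitLab)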
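# .gitlab-ci.yml
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: "${CI_PROJECT_DIR}"
  TF_IN_AUTOMATION: "true"
  ENVIRONMENT: "prod" # assumed default; override per pipeline or job

# Hidden template shared by all jobs. The image's default entrypoint is the
# terraform binary, so it is cleared to let GitLab run shell commands.
.terraform:
  image:
    name: hashicorp/terraform:1.5
    entrypoint: [""]
  before_script:
    - cd "${TF_ROOT}/environments/${ENVIRONMENT}"
    - terraform init

validate:
  stage: validate
  extends: .terraform
  script:
    - terraform validate
    - terraform fmt -check=true -diff=true

plan:
  stage: plan
  extends: .terraform
  script:
    - terraform plan -out=tfplan
  artifacts:
    paths:
      - "${TF_ROOT}/environments/${ENVIRONMENT}/tfplan"
  only:
    - merge_requests
    - master

apply:
  stage: apply
  extends: .terraform
  script:
    - terraform apply tfplan
  dependencies:
    - plan
  only:
    - master
  when: manual
Security scanning integration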
#!/bin/bash
# scripts/security-scan.sh
echo "🔍 Starting Terraform security scan..."
tfsec . --format json --out=tfsec-results.json
checkov -d . --output json > checkov-results.json
# terraform-compliance checks a plan file, so render one as JSON first
terraform plan -out=tfplan
terraform show -json tfplan > tfplan.json
terraform-compliance -f compliance-tests/ -p tfplan.json
echo "✅ Security scan completed"
Tag standardization
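On AWS, one complementary way to enforce baseline tags is the provider-level default_tags block, which applies to every taggable resource the provider creates. The sketch below is an assumption about how it could be combined with this project; it is AWS-specific, and Alibaba Cloud resources would still need an explicit tags argument.

```hcl
provider "aws" {
  region = var.aws_region

  # Applied to every taggable resource created through this provider.
  # A resource-level tag with the same key takes precedence.
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "terraform"
    }
  }
}
```

With defaults set at the provider, a forgotten tags argument on an individual resource no longer produces an untagged resource.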
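# modules/common/tags.tf
locals {
  common_tags = {
    Environment = var.environment
    Project     = var.project_name
    ManagedBy   = "terraform"
    Team        = var.team
    CostCenter  = var.cost_center
    # Note: timestamp() is re-evaluated on every run, so this tag shows up
    # as a change in each plan unless resources ignore it with
    # lifecycle { ignore_changes = [...] }.
    CreatedAt = timestamp()
    Terraform = "true"
  }

  environment_tags = {
    dev  = { AutoShutdown = "true", ShutdownTime = "18:00" }
    prod = { Backup = "true", Monitoring = "enabled", AlertLevel = "critical" }
  }

  # Environments without an entry (e.g. staging) get only the common tags
  final_tags = merge(local.common_tags, lookup(local.environment_tags, var.environment, {}))
}
Cost optimization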
Automated shutdown for development
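The function and schedule defined in this section still need to be linked: an event target tells the rule what to invoke, and a Lambda permission lets the events service call the function. A minimal sketch of that wiring, using hypothetical resource names for the two added resources:

```hcl
# Point the schedule rule at the shutdown function.
resource "aws_cloudwatch_event_target" "shutdown" {
  count = var.environment == "dev" ? 1 : 0
  rule  = aws_cloudwatch_event_rule.shutdown_schedule[0].name
  arn   = aws_lambda_function.auto_shutdown[0].arn
}

# Allow the events service to invoke the function for this rule only.
resource "aws_lambda_permission" "allow_schedule" {
  count         = var.environment == "dev" ? 1 : 0
  statement_id  = "AllowExecutionFromEventBridge"
  action        = "lambda:InvokeFunction"
  function_name = aws_lambda_function.auto_shutdown[0].function_name
  principal     = "events.amazonaws.com"
  source_arn    = aws_cloudwatch_event_rule.shutdown_schedule[0].arn
}
```

Both resources reuse the same count condition so that nothing is created outside the dev environment.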
# Development environment auto-shutdown
resource "aws_lambda_function" "auto_shutdown" {
  count         = var.environment == "dev" ? 1 : 0
  filename      = "auto-shutdown.zip"
  function_name = "dev-auto-shutdown"
  role          = aws_iam_role.lambda_role[0].arn
  handler       = "index.handler"
  runtime       = "python3.9"

  environment {
    variables = { ENVIRONMENT = var.environment }
  }
}

resource "aws_cloudwatch_event_rule" "shutdown_schedule" {
  count       = var.environment == "dev" ? 1 : 0
  name        = "dev-shutdown-schedule"
  description = "Shut down dev resources automatically"
  # 18:00 UTC on weekdays. AWS cron requires "?" in the day-of-month field
  # when day-of-week is specified.
  schedule_expression = "cron(0 18 ? * MON-FRI *)"
}
Dynamic resource scaling
locals {
  business_hours_scaling = {
    weekday_work_hours = {
      min_capacity = var.environment == "prod" ? 2 : 1
      max_capacity = var.environment == "prod" ? 10 : 3
      target_cpu   = 70
    }
    off_hours = {
      min_capacity = var.environment == "prod" ? 1 : 0
      max_capacity = var.environment == "prod" ? 3 : 1
      target_cpu   = 80
    }
  }
}
Best practices and pitfalls
Module design principles
Single responsibility: each module should own one logical concern (for example networking or compute) rather than a mix of unrelated resources.
Testability: modules should be easy to test in isolation.
Documentation: every variable and output needs a clear description.
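As an illustration of the testability principle, the networking module could be exercised with Terraform's native test framework. This is a sketch under stated assumptions: it needs Terraform 1.7 or later (the test command arrived in 1.6, mock_provider in 1.7), which is newer than the versions used elsewhere in this article; the file path is hypothetical; and it assumes the module declares both the aws and alicloud providers.

```hcl
# modules/networking/tests/aws_vpc.tftest.hcl (hypothetical path)
# Run with `terraform test` from modules/networking.

# Mock providers so the plan needs no cloud credentials.
mock_provider "aws" {}
mock_provider "alicloud" {}

variables {
  environment        = "test"
  cloud_provider     = "aws"
  vpc_cidr           = "10.9.0.0/16"
  availability_zones = ["us-west-2a", "us-west-2b"]
}

run "creates_one_aws_vpc" {
  command = plan

  assert {
    condition     = length(aws_vpc.main) == 1
    error_message = "Expected exactly one AWS VPC when cloud_provider is aws."
  }

  assert {
    condition     = aws_vpc.main[0].cidr_block == "10.9.0.0/16"
    error_message = "VPC CIDR should come from var.vpc_cidr."
  }
}

run "rejects_unknown_provider" {
  command = plan

  variables {
    cloud_provider = "gcp"
  }

  # The validation block on var.cloud_provider is expected to fail here.
  expect_failures = [var.cloud_provider]
}
```

Plan-only runs against mocked providers keep such tests fast and free of cloud cost, at the price of not catching errors that only the real APIs would report.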
Common traps
State file handling: never edit the state file manually; use terraform import to bring existing resources under management.
Sensitive data: store secrets in Vault or a cloud KMS instead of plain variables.
Implicit dependencies: Terraform infers ordering from references between resources; declare an explicit depends_on only when a dependency is not visible in the configuration.
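The three traps above can each be illustrated with a short fragment. These are independent sketches, not one file: the VPC ID is a placeholder, and the variable and subnet names are hypothetical.

```hcl
# 1. Root module: adopt an existing VPC with an import block
#    (Terraform 1.5+) instead of editing state by hand.
import {
  to = module.aws_networking.aws_vpc.main[0]
  id = "vpc-0123456789abcdef0" # placeholder ID
}

# 2. Mark secret inputs as sensitive so they are redacted in plan output.
#    The value is still written to the state file, so the state backend
#    must be encrypted and access-controlled.
variable "db_password" { # hypothetical variable
  description = "Database admin password, supplied from a secret store"
  type        = string
  sensitive   = true
}

# 3. Inside the networking module: referencing the VPC's ID already makes
#    the subnet depend on the VPC, so no depends_on is needed.
resource "aws_subnet" "app" { # hypothetical resource
  vpc_id     = aws_vpc.main[0].id
  cidr_block = "10.0.1.0/24"
}
```

After the import has been applied once, the import block can be removed from the configuration.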
Conclusion
Terraform transforms repetitive, error‑prone manual steps into reliable, version‑controlled code, delivering consistency, reusability, maintainability, and cost efficiency across AWS, Alibaba Cloud, and Azure. Adopt a modular layout, remote state, CI/CD pipelines, and security checks to fully realize the benefits of infrastructure as code.
Next actions
Start with a small project to gain Terraform experience.
Build a shared module library and document best practices.
Integrate the pipeline into GitLab or other CI systems for full automation.
Set up monitoring and alerting to keep the infrastructure healthy.
Repository references: https://github.com/raymond999999, https://gitee.com/raymond9.
Raymond Ops
Linux ops automation, cloud-native, Kubernetes, SRE, DevOps, Python, Golang and related tech discussions.
