From Shell Scripts to Terraform: Mastering Infrastructure as Code
This article traces the evolution of infrastructure automation from ad‑hoc shell scripts through configuration‑management tools to declarative Terraform. It highlights common pitfalls, gives concrete best‑practice recommendations with code samples, and walks through a real‑world migration case study showing how to make the transition safely.
Introduction
In the traditional operations era, engineers logged into servers by hand to run commands, edit configuration files and restart services. This became inefficient and error‑prone as the business grew. Infrastructure as Code (IaC) emerged so that infrastructure could be version‑controlled, automated and iterated on continuously.
Evolution of IaC
Shell script era
Shell scripts were the first automation tool, making it quick to implement tasks such as batch deployment, scheduled backups and system monitoring. As environments grow, their limitations become apparent:
Idempotence hard to guarantee
Missing state management
Poor cross‑platform compatibility
Complex error handling
Low maintainability
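To make the first limitation concrete, here is a minimal sketch of why idempotence has to be built by hand in shell. The function names append_line and ensure_line are hypothetical and chosen only for illustration.

```shell
# append_line FILE LINE  (hypothetical helper)
# Naive version: every run appends another copy of the line.
append_line() {
  printf '%s\n' "$2" >> "$1"
}

# ensure_line FILE LINE  (hypothetical helper)
# Idempotent version: appends only when the exact line is not already present.
ensure_line() {
  touch "$1"
  # -q quiet, -x match the whole line, -F treat the pattern as a fixed string
  grep -qxF -- "$2" "$1" || printf '%s\n' "$2" >> "$1"
}
```

Running append_line twice leaves two identical lines in the file, while running ensure_line twice leaves one. Every operation in a shell script needs this kind of guard written explicitly.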
Configuration‑management tools
Tools like Ansible, Puppet and Chef introduced higher‑level abstractions, idempotent operations and basic state tracking, but they focus on configuration rather than full lifecycle management of cloud resources.
Terraform era
Terraform provides a declarative language (HCL), a provider ecosystem covering all major clouds, robust state management, dependency handling and plan preview, delivering on the promise of “infrastructure as code” for the full resource lifecycle.
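The idea behind state management and plan preview can be illustrated with a toy sketch in shell. This is not how Terraform is implemented. The plan function and the two file arguments are assumptions made for illustration, with each file listing one resource name per line.

```shell
# plan DESIRED_FILE STATE_FILE  (hypothetical toy, not Terraform's algorithm)
# Prints "+ name" for resources present in the desired configuration but
# missing from the recorded state, and "- name" for resources recorded in
# the state but no longer desired. Prints nothing when both already match.
plan() {
  plan_desired=$(mktemp)
  plan_state=$(mktemp)
  sort -u "$1" > "$plan_desired"
  sort -u "$2" > "$plan_state"
  comm -23 "$plan_desired" "$plan_state" | sed 's/^/+ /'   # only in desired: create
  comm -13 "$plan_desired" "$plan_state" | sed 's/^/- /'   # only in state: destroy
  rm -f "$plan_desired" "$plan_state"
}
```

The point of the sketch is that a recorded state is what makes a preview possible. A plain shell script has nothing to compare against, so it cannot say in advance what it will change.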
Core comparison: Shell → Terraform
Shell script example
#!/bin/bash
# Traditional LAMP deployment script
set -e
APP_USER="webapp"
APP_DIR="/var/www/myapp"
NGINX_CONF="/etc/nginx/sites-available/myapp"
DB_NAME="myapp_db"
DB_USER="myapp_user"
DB_PASS="$(openssl rand -base64 32)"
echo "=== Starting deployment ==="
# ... (rest of script omitted for brevity)
Advantages: intuitive, quick, flexible. Drawbacks: idempotence issues, no state tracking, difficult rollback, chaotic change management.
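Rollback is difficult because a script that stops halfway leaves the server half‑configured, and undoing the completed steps must be coded by hand. A minimal sketch of such a hand‑built undo log follows. UNDO_LOG, register_undo and rollback are hypothetical names, not part of the script above.

```shell
# File that collects one undo command per completed step (hypothetical helper state)
UNDO_LOG=$(mktemp)

# register_undo CMD: record the command that reverses the step just completed.
register_undo() {
  printf '%s\n' "$1" >> "$UNDO_LOG"
}

# rollback: run the recorded undo commands in reverse order, last step first.
rollback() {
  # sed '1!G;h;$!d' reverses the line order portably (tac is not available everywhere)
  sed '1!G;h;$!d' "$UNDO_LOG" | while IFS= read -r undo_cmd; do
    sh -c "$undo_cmd" </dev/null || echo "undo failed: $undo_cmd" >&2
  done
  : > "$UNDO_LOG"   # clear the log once the rollback has run
}
```

A deployment script would call register_undo after each successful step and call rollback from an error trap. Even then, the undo commands themselves can fail or drift out of date, which is part of why rollback with shell scripts remains fragile.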
Improved idempotent Shell script
#!/bin/bash
# Improved idempotent script
set -euo pipefail
ensure_package_installed() { ... }
ensure_user_exists() { ... }
# Main flow
echo "=== Idempotent deployment start ==="
# ... (rest omitted)
Terraform best practice
Project structure
terraform-project/
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── versions.tf
├── backend.tf
├── modules/
│   ├── networking/
│   └── compute/
└── environments/
    ├── dev/
    ├── staging/
    └── prod/
versions.tf (provider version)
terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  backend "s3" {
    bucket         = "mycompany-terraform-states"
    key            = "webapp/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
provider "aws" {
  region = var.aws_region
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.project_name
    }
  }
}
variables.tf
variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-west-2"
}
variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging or prod"
  }
}
variable "project_name" {
  description = "Project name"
  type        = string
  default     = "myapp"
}
variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}
variable "availability_zones" {
  description = "AZ list"
  type        = list(string)
  default     = ["us-west-2a", "us-west-2b"]
}
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.medium"
}
variable "app_instance_count" {
  description = "Number of app servers"
  type        = number
  default     = 2
}
variable "db_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.medium"
}
variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}
main.tf (core resources)
# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "${var.project_name}-${var.environment}-vpc" }
}
# Public subnet
resource "aws_subnet" "public" {
  count  = length(var.availability_zones)
  vpc_id = aws_vpc.main.id
  # cidrsubnet adds 8 bits to the /16 prefix: index 0 gives 10.0.0.0/24, index 1 gives 10.0.1.0/24
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "${var.project_name}-${var.environment}-public-${count.index + 1}", Type = "public" }
}
# Private subnet, IGW, NAT, route tables, security groups, RDS, ALB, etc.
# (full definitions omitted for brevity)
user-data.sh (EC2 init script)
#!/bin/bash
set -e
yum update -y
yum install -y docker amazon-cloudwatch-agent
systemctl start docker
systemctl enable docker
usermod -aG docker ec2-user
cat > /etc/environment <<EOF
DB_ENDPOINT=${db_endpoint}
DB_NAME=${db_name}
DB_USERNAME=${db_username}
ENVIRONMENT=${environment}
EOF
docker run -d --name myapp --restart always -p 8080:8080 \
  -e DB_ENDPOINT=${db_endpoint} -e DB_NAME=${db_name} \
  -e DB_USERNAME=${db_username} -e ENVIRONMENT=${environment} \
  mycompany/myapp:latest
# CloudWatch config omitted
echo "Instance init complete"
outputs.tf
output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}
output "alb_dns_name" {
  description = "Load balancer DNS"
  value       = aws_lb.main.dns_name
}
output "rds_endpoint" {
  description = "RDS endpoint"
  value       = aws_db_instance.main.endpoint
  sensitive   = true
}
Common Terraform commands
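terraform init                          # initialize the working directory and download providers
terraform init -upgrade                 # upgrade providers and modules within version constraints
terraform validate                      # check configuration syntax and internal consistency
terraform fmt -recursive                # format .tf files in this directory and subdirectories
terraform plan                          # preview the changes an apply would make
terraform plan -out=tfplan              # save the plan to a file for a later apply
terraform apply                         # apply changes after interactive confirmation
terraform apply -auto-approve           # apply without the confirmation prompt
terraform destroy                       # destroy all resources managed by this configuration
terraform output                        # print output values from the state
terraform graph | dot -Tpng > graph.png # render the dependency graph (requires Graphviz)
terraform force-unlock <LOCK_ID>        # manually release a stuck state lock
Migration case study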
Background
A company had 10 web servers, 3 database servers, 2 Redis caches and an Nginx load balancer managed by ad‑hoc Shell scripts.
Pre‑migration Shell script
#!/bin/bash
SERVERS=("web01.example.com" "web02.example.com" "web03.example.com")
SSH_KEY="/root/.ssh/deploy_key"
APP_VERSION="v2.3.1"
# Deploy function, status collection, command runner, etc.
# (full script omitted)
Post‑migration Terraform + Ansible hybrid
Terraform defines the infrastructure (VPC, subnets, security groups, RDS, ALB, Auto Scaling). Ansible or user‑data scripts handle in‑instance configuration.
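Before either tool runs, a wrapper script typically checks its inputs and its tooling. The sketch below shows one possible shape for those guards. It is a hypothetical illustration rather than the case study's actual implementation, and the accepted environment names are assumed to mirror the variable validation shown earlier (dev, staging, prod).

```shell
# validate_environment NAME  (hypothetical helper)
# Accept only the environment names allowed by the Terraform variable validation.
validate_environment() {
  case "$1" in
    dev|staging|prod) return 0 ;;
    *) echo "unknown environment: $1" >&2; return 1 ;;
  esac
}

# check_prerequisites TOOL...  (hypothetical helper)
# Fail with a message if any required command is not found on PATH.
check_prerequisites() {
  for prereq_tool in "$@"; do
    command -v "$prereq_tool" >/dev/null 2>&1 || {
      echo "missing tool: $prereq_tool" >&2
      return 1
    }
  done
}
```

A wrapper could call `validate_environment "$ENVIRONMENT"` and `check_prerequisites terraform ansible-playbook` first, so that a typo in the environment name or a missing CLI stops the run before any infrastructure is touched.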
deployment.sh (Terraform helper script)
#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TERRAFORM_DIR="${SCRIPT_DIR}/terraform"
ENVIRONMENT="${1:-dev}"
# Functions: check_prerequisites, init_terraform, validate_config, plan_changes, apply_changes, show_status, rolling_update, scale_instances, main
# (implementation omitted for brevity)
main "$@"
Comparison table
| Dimension | Shell script | Terraform |
| --- | --- | --- |
| Deployment time | 30‑45 min (serial) | 10‑15 min (parallel) |
| Error rate | ≈15 % (manual) | <1 % (automated) |
| Rollback | Difficult, manual | Simple: re‑apply the previous version with terraform apply |
| State tracking | None | Full state file |
| Change visibility | Low | High via terraform plan |
| Collaboration | Low | High with code review |
Best practices and pitfalls
Shell script best practices
#!/bin/bash
set -euo pipefail
readonly SCRIPT_NAME=$(basename "$0")
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_VERSION="1.0.0"
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE:-/var/log/deploy.log}"; }
error() { echo "[ERROR] $*" >&2; exit 1; }
trap cleanup EXIT INT TERM
# Argument validation, idempotent package checks, safe_execute, main flow, etc.
Terraform best practices
Project structure
# Recommended enterprise Terraform layout
terraform-infrastructure/
├── global/
│   ├── s3/
│   └── iam/
├── modules/
│   ├── vpc/
│   └── compute/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
└── scripts/
Remote state management
terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-states"
    key            = "environments/prod/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default { sse_algorithm = "AES256" }
  }
}
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration { status = "Enabled" }
}
Modular design example (VPC module)
variable "vpc_cidr" { type = string }
variable "environment" { type = string }
variable "availability_zones" { type = list(string) }
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "${var.environment}-vpc" }
}
output "vpc_id" { value = aws_vpc.main.id }
Sensitive data handling
# Store password in AWS Secrets Manager (shell command, run once)
aws secretsmanager create-secret --name prod/db/password --secret-string "$(openssl rand -base64 32)"
# Reference in Terraform
data "aws_secretsmanager_secret_version" "db_password" {
  secret_id = "prod/db/password"
}
resource "aws_db_instance" "main" {
  # ...
  # The value read here is still stored in the state file, so keep the state encrypted and access-restricted
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}
Common errors and solutions
State lock conflict
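# Error: Error acquiring the state lock
# First confirm that no other Terraform run is still in progress
ps aux | grep terraform
# Only then release the stale lock, using the lock ID from the error message
terraform force-unlock <LOCK_ID>
# Use CI/CD to serialize runs
Resource dependency issue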
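# Declaration order does not matter: Terraform builds its dependency graph from references.
# The reference to aws_security_group.app.id is an implicit dependency, so the
# security group is created first even though it is declared second.
resource "aws_instance" "app" {
  # ...
  vpc_security_group_ids = [aws_security_group.app.id]
}
resource "aws_security_group" "app" {
  # ...
}
# Alternative form: use explicit depends_on only for hidden dependencies
# that no attribute reference expresses
resource "aws_instance" "app" {
  # ...
  depends_on = [aws_security_group.app]
}
Shell idempotence problem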
# Bad: fails on the second run because the user already exists
# useradd myuser
# Good: create the user only if it is missing
if ! id myuser &>/dev/null; then
  useradd myuser
fi
Conclusion and outlook
Key takeaways
Shell scripts are suitable for quick, low‑level tasks and as bootstrap tools.
Terraform provides declarative, versioned, and auditable infrastructure management.
A hybrid approach uses Terraform for the resource lifecycle and Shell/Ansible for in‑instance configuration.
Future trends
GitOps integration for pull‑request‑driven changes.
Policy‑as‑code (OPA, Sentinel) for compliance.
FinOps embedding cost analysis in IaC.
AI‑assisted generation and optimization of Terraform code.
Action recommendations
Teams still using Shell scripts should start with non‑critical environments, gradually convert scripts to Terraform modules, establish code review and testing pipelines, and invest in IaC training.
