
From Shell Scripts to Terraform: Mastering Infrastructure as Code

This article traces the evolution of infrastructure automation from ad-hoc shell scripts through configuration-management tools to declarative Terraform. It highlights common pitfalls, gives concrete best-practice recommendations with code samples, and walks through a real-world migration case study that shows how to make the transition safely and efficiently.

Ops Community

Introduction

In the traditional operations era, engineers logged into servers by hand to run commands, edit configuration files and restart services. That approach became inefficient and error-prone as the business grew. Infrastructure as Code (IaC) emerged so that infrastructure could be version-controlled, automated and iterated on continuously.

Evolution of IaC

Shell script era

Shell scripts were the first automation tool, giving teams a quick way to implement tasks such as batch deployment, scheduled backups and system monitoring. As the scripts grew, several limitations became apparent:

Idempotence hard to guarantee

Missing state management

Poor cross‑platform compatibility

Complex error handling

Low maintainability

Configuration‑management tools

Tools like Ansible, Puppet and Chef introduced higher‑level abstractions, idempotent operations and basic state tracking, but they focus on configuration rather than full lifecycle management of cloud resources.

Terraform era

Terraform provides a declarative language (HCL), a provider ecosystem covering all major clouds, robust state management, dependency handling and plan preview, enabling a “code is infrastructure” workflow.

Core comparison: Shell → Terraform

Shell script example

#!/bin/bash
# Traditional LAMP deployment script
set -e
APP_USER="webapp"
APP_DIR="/var/www/myapp"
NGINX_CONF="/etc/nginx/sites-available/myapp"
DB_NAME="myapp_db"
DB_USER="myapp_user"
DB_PASS="$(openssl rand -base64 32)"
echo "=== Starting deployment ==="
# ... (rest of script omitted for brevity)

Advantages: intuitive, quick, flexible. Drawbacks: idempotence issues, no state tracking, difficult rollback, chaotic change management.

Improved idempotent Shell script

#!/bin/bash
# Improved idempotent script
set -euo pipefail
ensure_package_installed() { ... }
ensure_user_exists() { ... }
# Main flow
echo "=== Idempotent deployment start ==="
# ... (rest omitted)
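
The bodies of ensure_package_installed and ensure_user_exists are elided above, so the following is only a sketch of the same check-then-act pattern applied to two simpler cases. The helper names ensure_line_in_file and ensure_directory are hypothetical and do not come from the original script.

```shell
#!/usr/bin/env bash
# Hypothetical helpers illustrating the check-then-act pattern.

# Append a line to a file only if an identical line is not already present.
ensure_line_in_file() {
  local line="$1" file="$2"
  touch "$file"
  # -x matches whole lines, -F treats the pattern as a fixed string
  if ! grep -qxF -- "$line" "$file"; then
    printf '%s\n' "$line" >> "$file"
  fi
}

# Create a directory only if it is missing.
ensure_directory() {
  local dir="$1"
  [ -d "$dir" ] || mkdir -p "$dir"
}
```

Calling either helper twice with the same arguments leaves the system in the same state as calling it once, which is the property the original script is aiming for.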

Terraform best practice

Project structure

terraform-project/
├── main.tf
├── variables.tf
├── outputs.tf
├── terraform.tfvars
├── versions.tf
├── backend.tf
├── modules/
│   ├── networking/
│   └── compute/
└── environments/
    ├── dev/
    ├── staging/
    └── prod/

versions.tf (Terraform and provider versions, shown here together with the backend and provider configuration)

terraform {
  required_version = ">= 1.0"
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
  backend "s3" {
    bucket         = "mycompany-terraform-states"
    key            = "webapp/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
provider "aws" {
  region = var.aws_region
  default_tags {
    tags = {
      Environment = var.environment
      ManagedBy   = "Terraform"
      Project     = var.project_name
    }
  }
}

variables.tf

variable "aws_region" {
  description = "AWS region"
  type        = string
  default     = "us-west-2"
}
variable "environment" {
  description = "Environment name"
  type        = string
  validation {
    condition     = contains(["dev", "staging", "prod"], var.environment)
    error_message = "Environment must be dev, staging or prod"
  }
}
variable "project_name" {
  description = "Project name"
  type        = string
  default     = "myapp"
}
variable "vpc_cidr" {
  description = "VPC CIDR block"
  type        = string
  default     = "10.0.0.0/16"
}
variable "availability_zones" {
  description = "AZ list"
  type        = list(string)
  default     = ["us-west-2a", "us-west-2b"]
}
variable "instance_type" {
  description = "EC2 instance type"
  type        = string
  default     = "t3.medium"
}
variable "app_instance_count" {
  description = "Number of app servers"
  type        = number
  default     = 2
}
variable "db_instance_class" {
  description = "RDS instance class"
  type        = string
  default     = "db.t3.medium"
}
variable "db_password" {
  description = "Database password"
  type        = string
  sensitive   = true
}

main.tf (core resources)

# VPC
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "${var.project_name}-${var.environment}-vpc" }
}
# Public subnet
resource "aws_subnet" "public" {
  count                   = length(var.availability_zones)
  vpc_id                  = aws_vpc.main.id
  cidr_block              = cidrsubnet(var.vpc_cidr, 8, count.index)
  availability_zone       = var.availability_zones[count.index]
  map_public_ip_on_launch = true
  tags = { Name = "${var.project_name}-${var.environment}-public-${count.index + 1}", Type = "public" }
}
# Private subnet, IGW, NAT, route tables, security groups, RDS, ALB, etc.
# (full definitions omitted for brevity)
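
The public subnets above get their address ranges from cidrsubnet(var.vpc_cidr, 8, count.index). To make the arithmetic concrete, here is a minimal IPv4-only sketch of the same calculation in Bash. The function name cidrsubnet_ipv4 is hypothetical, and this is an illustration of the math rather than Terraform's implementation.

```shell
#!/usr/bin/env bash
# Hypothetical IPv4-only re-implementation of the arithmetic behind
# cidrsubnet(prefix, newbits, netnum): extend the prefix length by newbits
# and place netnum in the newly created bits.
cidrsubnet_ipv4() {
  local cidr="$1" newbits="$2" netnum="$3"
  local ip="${cidr%/*}" prefix="${cidr#*/}"
  local a b c d
  IFS=. read -r a b c d <<< "$ip"
  local base=$(( (a << 24) | (b << 16) | (c << 8) | d ))
  local newprefix=$(( prefix + newbits ))
  # Reject subnets that do not fit into 32 bits or into the new bit range
  if (( newprefix > 32 || netnum >= (1 << newbits) )); then
    echo "invalid newbits/netnum for ${cidr}" >&2
    return 1
  fi
  local mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
  local net=$(( (base & mask) | (netnum << (32 - newprefix)) ))
  printf '%d.%d.%d.%d/%d\n' \
    $(( (net >> 24) & 255 )) $(( (net >> 16) & 255 )) \
    $(( (net >> 8) & 255 )) $(( net & 255 )) "$newprefix"
}

cidrsubnet_ipv4 10.0.0.0/16 8 0   # prints 10.0.0.0/24
cidrsubnet_ipv4 10.0.0.0/16 8 1   # prints 10.0.1.0/24
```

With the default vpc_cidr of 10.0.0.0/16 and two availability zones, the two public subnets therefore come out as 10.0.0.0/24 and 10.0.1.0/24.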

user-data.sh (EC2 init script)

#!/bin/bash
set -e
yum update -y
yum install -y docker amazon-cloudwatch-agent
systemctl start docker
systemctl enable docker
usermod -aG docker ec2-user
cat > /etc/environment <<EOF
DB_ENDPOINT=${db_endpoint}
DB_NAME=${db_name}
DB_USERNAME=${db_username}
ENVIRONMENT=${environment}
EOF
docker run -d --name myapp --restart always -p 8080:8080 \
  -e DB_ENDPOINT=${db_endpoint} -e DB_NAME=${db_name} \
  -e DB_USERNAME=${db_username} -e ENVIRONMENT=${environment} \
  mycompany/myapp:latest
# CloudWatch config omitted
echo "Instance init complete"

outputs.tf

output "vpc_id" {
  description = "VPC ID"
  value       = aws_vpc.main.id
}
output "alb_dns_name" {
  description = "Load balancer DNS"
  value       = aws_lb.main.dns_name
}
output "rds_endpoint" {
  description = "RDS endpoint"
  value       = aws_db_instance.main.endpoint
  sensitive   = true
}

Common Terraform commands

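terraform init                           # initialize providers, modules and the backend
terraform init -upgrade                  # re-initialize and upgrade providers/modules within version constraints
terraform validate                       # check configuration syntax and internal consistency
terraform fmt -recursive                 # format .tf files in this directory and its subdirectories
terraform plan                           # preview changes
terraform plan -out=tfplan               # save the plan so that exactly this plan can be applied later
terraform apply                          # plan and apply after interactive approval
terraform apply -auto-approve            # apply without a prompt; use with care
terraform destroy                        # destroy all resources managed by this configuration
terraform output                         # show output values
terraform graph | dot -Tpng > graph.png  # render the dependency graph (requires Graphviz)
terraform force-unlock <LOCK_ID>         # release a stale state lock; only when no run is active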
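
In a pipeline, plan and apply are often chained so that apply only runs when the plan contains changes. The sketch below relies on terraform plan -detailed-exitcode, which returns 0 for no changes, 1 for an error and 2 when changes are present. The function names plan_action and run_plan_and_apply are hypothetical, and the flow is one possible arrangement rather than a prescribed one.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: decide what to do from the exit status of
# `terraform plan -detailed-exitcode` (0 = no changes, 1 = error, 2 = changes).
plan_action() {
  case "$1" in
    0) echo "skip" ;;
    2) echo "apply" ;;
    *) echo "fail" ;;
  esac
}

# Plan into a file, then apply exactly that saved plan if it contains changes.
run_plan_and_apply() {
  local status=0
  terraform plan -detailed-exitcode -out=tfplan || status=$?
  case "$(plan_action "$status")" in
    skip)  echo "No changes; nothing to apply." ;;
    apply) terraform apply tfplan ;;
    fail)  echo "terraform plan failed (exit ${status})" >&2; return 1 ;;
  esac
}
```

Applying the saved tfplan file, rather than running a fresh terraform apply, ensures that what gets applied is what was reviewed.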

Migration case study

Background

A company had 10 web servers, 3 database servers, 2 Redis caches and an Nginx load balancer managed by ad‑hoc Shell scripts.

Pre‑migration Shell script

#!/bin/bash
SERVERS=("web01.example.com" "web02.example.com" "web03.example.com")
SSH_KEY="/root/.ssh/deploy_key"
APP_VERSION="v2.3.1"
# Deploy function, status collection, command runner, etc.
# (full script omitted)

Post‑migration Terraform + Ansible hybrid

Terraform defines the infrastructure (VPC, subnets, security groups, RDS, ALB, Auto Scaling). Ansible or user‑data scripts handle in‑instance configuration.

deployment.sh (Terraform helper script)

#!/bin/bash
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
TERRAFORM_DIR="${SCRIPT_DIR}/terraform"
ENVIRONMENT="${1:-dev}"
# Functions: check_prerequisites, init_terraform, validate_config, plan_changes, apply_changes, show_status, rolling_update, scale_instances, main
# (implementation omitted for brevity)
main "$@"
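
The helper functions are omitted above. As a rough idea of what the first two might look like, here is a sketch of an environment check that mirrors the Terraform variable validation, together with a prerequisite check. The name validate_environment and both function bodies are assumptions, not the original implementation.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: accept only the environment names that the
# Terraform "environment" variable validation also accepts.
validate_environment() {
  case "$1" in
    dev|staging|prod) return 0 ;;
    *) echo "Unknown environment: $1 (expected dev, staging or prod)" >&2
       return 1 ;;
  esac
}

# Hypothetical sketch: report every missing command-line tool,
# then fail if at least one is absent.
check_prerequisites() {
  local missing=0 tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "Missing required tool: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

A main function could then start with validate_environment "$ENVIRONMENT" and check_prerequisites terraform aws, so that the script stops before any Terraform command runs against the wrong target.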

Comparison table

| Dimension | Shell script | Terraform |
| --- | --- | --- |
| Deployment time | 30-45 min (serial) | 10-15 min (parallel) |
| Error rate | ≈15 % (manual) | <1 % (automated) |
| Rollback | Difficult, manual | Simple, terraform apply previous version |
| State tracking | None | Full state file |
| Change visibility | Low | High via terraform plan |
| Collaboration | Low | High with code review |

Best practices and pitfalls

Shell script best practices

#!/bin/bash
set -euo pipefail
readonly SCRIPT_NAME="$(basename "$0")"
readonly SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
readonly SCRIPT_VERSION="1.0.0"
# Timestamped logging to stdout and to a log file
log() { echo "[$(date +'%Y-%m-%d %H:%M:%S')] $*" | tee -a "${LOG_FILE:-/var/log/deploy.log}"; }
error() { echo "[ERROR] $*" >&2; exit 1; }
# cleanup must exist by the time the trap fires; this empty body is a placeholder
cleanup() { :; }
trap cleanup EXIT INT TERM
# Argument validation, idempotent package checks, safe_execute, main flow, etc.
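
The safe_execute helper is named above but not shown. One plausible reading is a wrapper that retries a flaky command a bounded number of times. The retry function below is a sketch under that assumption; its name and argument order are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical retry wrapper: run a command up to max_attempts times,
# sleeping delay seconds between attempts.
# Usage: retry MAX_ATTEMPTS DELAY_SECONDS command [args...]
retry() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1
  until "$@"; do
    if (( attempt >= max_attempts )); then
      echo "Command failed after ${attempt} attempts: $*" >&2
      return 1
    fi
    attempt=$(( attempt + 1 ))
    sleep "$delay"
  done
}
```

Because the command is tested in the until condition, the wrapper also works under set -e: a failing attempt does not abort the script until the attempt budget is used up.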

Terraform best practices

Project structure

# Recommended enterprise Terraform layout
terraform-infrastructure/
├── global/
│   ├── s3/
│   └── iam/
├── modules/
│   ├── vpc/
│   └── compute/
├── environments/
│   ├── dev/
│   ├── staging/
│   └── prod/
└── scripts/

Remote state management

terraform {
  backend "s3" {
    bucket         = "mycompany-terraform-states"
    key            = "environments/prod/terraform.tfstate"
    region         = "us-west-2"
    encrypt        = true
    dynamodb_table = "terraform-state-lock"
  }
}
resource "aws_s3_bucket_server_side_encryption_configuration" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  rule {
    apply_server_side_encryption_by_default { sse_algorithm = "AES256" }
  }
}
resource "aws_s3_bucket_versioning" "terraform_state" {
  bucket = aws_s3_bucket.terraform_state.id
  versioning_configuration { status = "Enabled" }
}

Modular design example (VPC module)

variable "vpc_cidr" { type = string }
variable "environment" { type = string }
variable "availability_zones" { type = list(string) }
resource "aws_vpc" "main" {
  cidr_block           = var.vpc_cidr
  enable_dns_hostnames = true
  enable_dns_support   = true
  tags = { Name = "${var.environment}-vpc" }
}
output "vpc_id" { value = aws_vpc.main.id }

Sensitive data handling

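# Shell: store the password in AWS Secrets Manager
aws secretsmanager create-secret --name prod/db/password --secret-string "$(openssl rand -base64 32)"
# Terraform: read the secret at plan time (the value is still written to the state file, so keep the state encrypted)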
data "aws_secretsmanager_secret_version" "db_password" { secret_id = "prod/db/password" }
resource "aws_db_instance" "main" {
  # ...
  password = data.aws_secretsmanager_secret_version.db_password.secret_string
}

Common errors and solutions

State lock conflict

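# Error: Error acquiring the state lock
# 1. Check whether another Terraform run is still active (locally or in CI)
ps aux | grep '[t]erraform'
# 2. Only if no run is active, release the stale lock using the ID from the error message
terraform force-unlock <LOCK_ID>
# 3. Serialize runs through CI/CD so that concurrent applies cannot collide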
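
When several jobs share one runner or host, runs can also be serialized locally before Terraform is even started. The sketch below uses the atomicity of mkdir as a simple lock. The function name with_lock and the exit code 75 are assumptions, and this complements Terraform's own state lock rather than replacing it.

```shell
#!/usr/bin/env bash
# Hypothetical local lock: mkdir either creates the directory or fails,
# so only one caller at a time can hold the lock.
# Usage: with_lock LOCK_DIR command [args...]
with_lock() {
  local lock_dir="$1"
  shift
  if ! mkdir "$lock_dir" 2>/dev/null; then
    echo "Another run holds ${lock_dir}; try again later." >&2
    return 75
  fi
  local status=0
  "$@" || status=$?
  # Release the lock whether the command succeeded or not
  rmdir "$lock_dir"
  return "$status"
}
```

A call such as with_lock /tmp/terraform-prod.lock terraform apply tfplan would refuse to start while another wrapped run is in progress. If the script is killed before rmdir runs, the lock directory has to be removed by hand.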

Resource dependency issue

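# Declaration order does not matter in Terraform. The reference to
# aws_security_group.app.id creates an implicit dependency, so the
# security group is created before the instance:
resource "aws_instance" "app" { vpc_security_group_ids = [aws_security_group.app.id] }
resource "aws_security_group" "app" { }
# Reserve explicit depends_on for hidden dependencies that no attribute reference expresses:
# resource "aws_instance" "app" { depends_on = [aws_security_group.app] }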

Shell idempotence problem

# Bad: useradd myuser
# Good:
if ! id myuser &>/dev/null; then useradd myuser; fi

Conclusion and outlook

Key takeaways

Shell scripts are suitable for quick, low‑level tasks and as bootstrap tools.

Terraform provides declarative, versioned, and auditable infrastructure management.

A hybrid approach uses Terraform for lifecycle, Shell/Ansible for in‑instance configuration.

Future trends

GitOps integration for pull‑request‑driven changes.

Policy‑as‑code (OPA, Sentinel) for compliance.

FinOps embedding cost analysis in IaC.

AI‑assisted generation and optimization of Terraform code.

Action recommendations

Teams still using Shell scripts should start with non‑critical environments, gradually convert scripts to Terraform modules, establish code review and testing pipelines, and invest in IaC training.
