When a “Perfect” EKS Terraform Module Becomes a Debugging Nightmare

The author recounts the high hopes and subsequent frustrations of adopting the community‑maintained terraform‑aws‑eks module for AWS EKS, detailing hidden complexities, limited AI assistance, and practical lessons on embracing complexity, critical use of open‑source modules, and the importance of rest during tough debugging sessions.

Ops Development & AI Practice

Goal

The author aimed to create a reusable, team‑specific Terraform module for provisioning an Amazon Elastic Kubernetes Service (EKS) cluster. The intent was to leverage Terraform’s infrastructure‑as‑code capabilities while customizing the cluster to meet internal requirements such as a custom node volume size and the AWS EBS CSI driver add‑on.

Community Module Adopted

The work started by importing the widely used terraform-aws-modules/terraform-aws-eks module from GitHub: https://github.com/terraform-aws-modules/terraform-aws-eks. The module provides a comprehensive set of variables, managed node groups, and optional add‑ons, making it a solid foundation for rapid EKS deployment.

Integration Steps

Add the module block to the Terraform configuration, for example:

module "eks" {
  source  = "terraform-aws-modules/eks/aws"
  version = "~> 19.0"

  cluster_name    = "my-eks-cluster"
  cluster_version = "1.29"
  subnet_ids      = var.private_subnets
  vpc_id          = var.vpc_id

  # Managed node group defaults
  eks_managed_node_groups = {
    default = {
      desired_size   = 2
      max_size       = 3
      min_size       = 1
      instance_types = ["t3.medium"]
      disk_size      = var.volume_size   # custom variable
    }
  }

  # First attempt at enabling the AWS EBS CSI driver add‑on
  enable_aws_ebs_csi_driver = true
}

Define any custom variables (e.g., volume_size) in variables.tf and provide values via terraform.tfvars or the CI pipeline.
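A minimal declaration might look like the following (the description and default shown are illustrative):

```hcl
# variables.tf
variable "volume_size" {
  description = "Root EBS volume size in GiB for worker nodes"
  type        = number
  default     = 100
}

# terraform.tfvars
volume_size = 100
```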

Run the standard Terraform workflow:

terraform init
terraform plan
terraform apply

Customization Attempts and Observed Issues

Two primary customizations were applied:

Node volume size: The volume_size variable was set to 100 GiB, but after terraform apply the EC2 instances in the node group still showed the default 20 GiB root volume.

AWS EBS CSI driver: The enable_aws_ebs_csi_driver flag was set to true. The module reported successful creation, yet the aws-ebs-csi-driver DaemonSet was absent when the cluster was inspected with kubectl get ds -n kube-system.

Both issues manifested despite a clean Terraform plan and no error messages, indicating a mismatch between the module’s input variables and the resources actually provisioned.

Debugging Process

Inspect variable values: Used terraform console to query the effective value of var.volume_size. The console confirmed the variable held the expected value.

Review generated plan: Ran terraform plan -out=plan.out and examined it with terraform show -json plan.out. The plan showed the correct volume_size on the aws_launch_template resource, yet the EC2 instances were still created with the default size, suggesting a later override.
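The JSON plan can also be filtered mechanically rather than read by eye; a sketch using jq, assuming plan.out was produced by terraform plan -out=plan.out:

```shell
# Show the block device mappings the plan records for each launch template
terraform show -json plan.out \
  | jq '.resource_changes[]
        | select(.type == "aws_launch_template")
        | .change.after.block_device_mappings'
```

If the size printed here is correct while the instances come up with 20 GiB, the override happens after planning, which is what pointed the investigation toward the module source.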

Read module source: Opened the module’s managed node group submodule in the repository to trace how the volume size is propagated. Discovered that the plain disk-size input is honored only when use_custom_launch_template is false; when the module builds its own launch template (its default behavior), the value is silently ignored unless block_device_mappings is set explicitly.

Check add‑on activation: Verified the enable_aws_ebs_csi_driver variable path. The module creates the driver only when cluster_addons includes a driver entry; the default cluster_addons map does not contain it, so setting the flag alone is insufficient.

AWS console verification: Confirmed the launch template attached to the node group reflected the custom volume size, but the node group was using a different launch template version that retained the default size.
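The same version comparison is scriptable with the AWS CLI; a sketch (the launch template ID is a placeholder):

```shell
# List every version of the node group's launch template with its root volume size
aws ec2 describe-launch-template-versions \
  --launch-template-id lt-0123456789abcdef0 \
  --query 'LaunchTemplateVersions[].{Version:VersionNumber,IsDefault:DefaultVersion,VolumeSize:LaunchTemplateData.BlockDeviceMappings[0].Ebs.VolumeSize}' \
  --output table
```

A mismatch between the latest version and the version the node group actually runs confirms the stale-template theory without clicking through the console.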

Resolution Highlights

To apply a custom volume size, keep use_custom_launch_template = true (the module’s default) and define block_device_mappings explicitly with the desired size.
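Within the community module’s managed node group definition, that looks roughly like the following (the device name assumes the AMI’s root device):

```hcl
eks_managed_node_groups = {
  default = {
    use_custom_launch_template = true

    block_device_mappings = {
      root = {
        device_name = "/dev/xvda"   # root device of the default Amazon Linux AMI
        ebs = {
          volume_size = var.volume_size   # e.g. 100 GiB
          volume_type = "gp3"
        }
      }
    }
  }
}
```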

Enable the EBS CSI driver by populating the cluster_addons map, e.g.:

cluster_addons = {
  aws-ebs-csi-driver = {
    most_recent = true
    service_account_role_arn = var.ebs_csi_role_arn
  }
}

In the module versions examined, populating cluster_addons is the supported path; a boolean flag alone does not provision the driver.
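The service_account_role_arn shown above can be produced with the community IAM module’s IRSA submodule; a sketch, with the role name assumed:

```hcl
module "ebs_csi_irsa" {
  source = "terraform-aws-modules/iam/aws//modules/iam-role-for-service-accounts-eks"

  role_name             = "ebs-csi-controller"
  attach_ebs_csi_policy = true

  oidc_providers = {
    main = {
      provider_arn               = module.eks.oidc_provider_arn
      namespace_service_accounts = ["kube-system:ebs-csi-controller-sa"]
    }
  }
}
```

Wiring this role into the add‑on entry gives the CSI controller the IAM permissions it needs to manage EBS volumes.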

Limitations of AI Assistance

Attempts to resolve the issues with GitHub Copilot Agent produced generic suggestions that did not address the module’s internal conditional logic. This illustrates that current AI tools excel at routine code generation but often lack the deep contextual reasoning required for complex, highly‑parameterized infrastructure modules.

Key Takeaways

Understand the abstraction layer: Managed modules hide implementation details; engineers must read the source to know which variables truly affect resources.

Validate module defaults and overrides: Many modules ship hidden defaults (e.g., launch template handling) that can override user‑provided values.

Use explicit configuration for add‑ons: Flags alone may not be sufficient; consult the module’s documentation for the exact map structures required.

Combine tooling with manual inspection: Terraform console, plan inspection, and direct source review are essential when behavior diverges from expectations.

Tags: cloud-native, DevOps, AWS, EKS, Infrastructure as Code, AI Copilot
Written by

Ops Development & AI Practice

DevSecOps engineer sharing experiences and insights on AI, Web3, and Claude code development. Aims to help solve technical challenges, improve development efficiency, and grow through community interaction. Feel free to comment and discuss.
