Skip to content

Instantly share code, notes, and snippets.

@siddharthkrish
Last active July 24, 2024 09:09
Show Gist options
  • Save siddharthkrish/c9ad185df5df29c8084e9bfd2b6214b9 to your computer and use it in GitHub Desktop.
Save siddharthkrish/c9ad185df5df29c8084e9bfd2b6214b9 to your computer and use it in GitHub Desktop.
Checklist: Minimizing Human Errors in Terraform Update on the Cloud

Checklist: Minimizing Human Errors in Terraform Update on AWS

Before Making Changes

  1. Use version control (e.g., Git) for all Terraform configurations
  2. Create a new branch for proposed changes
  3. Ensure you're working with the latest version of the main branch
  4. Update Terraform and provider versions to the latest stable releases

Writing and Reviewing Code

  1. Follow established naming conventions and code structure
  2. Use consistent formatting (run terraform fmt)
  3. Implement and update automated tests for your Terraform code
  4. Conduct peer code reviews before merging changes

Pre-Deployment Checks

  1. Run terraform validate to check for configuration errors
  2. Execute terraform plan and carefully review the proposed changes
  3. Save the plan output to a file for later application
  4. Have a team member review the plan output (explore automation for this review, potentially a LLM)
  5. Ensure changes align with the intended modifications
  6. Verify that critical resources are not being unintentionally modified or destroyed (Keep a list of critical resources)

Deployment Process

  1. Use Terraform workspaces to manage different environments (dev, staging, prod)
  2. Apply changes to lower environments (e.g., dev, staging) before production
  3. Implement and respect change freeze periods for critical environments
  4. Use terraform apply with the saved plan file to ensure consistency
  5. Monitor the apply process closely for any unexpected behavior

Post-Deployment

  1. Verify that the changes were applied correctly in the AWS console
  2. Run relevant integration or smoke tests (Add tests for all previous failures)
  3. Monitor affected systems and applications for any issues
  4. Document the changes and update relevant documentation (Have a Correct of Errors process in place)

Ongoing Practices

  1. Regularly audit and clean up unused resources (Have usage dashboards for all systems at a tier level)
  2. Implement and maintain a disaster recovery plan
  3. Conduct regular training sessions on Terraform best practices
  4. Use Terraform Cloud or a CI/CD pipeline for automated checks and applies
  5. Implement strong IAM policies and use least privilege access
  6. Regularly review and update your Terraform modules and configurations
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment