Terraform Drift: a Comprehensive Guide

When AWS infrastructure diverges from your Terraform code, you're facing drift – and it's likely causing you headaches right now. The most effective approach combines proactive detection through regular terraform plan runs, strategic remediation using terraform import and state commands, and organizational guardrails like approval workflows and automated testing. Teams successfully managing drift typically implement a combination of technical tools like driftctl or Terraform Cloud with strict governance policies and CI/CD pipeline integration.

The drift detection toolkit: commands and interpretation

Detecting and understanding drift is your crucial first step toward infrastructure stability. Regular scans using Terraform's native tools should be your foundation.

The primary command for drift detection is terraform plan, which refreshes its view of the real infrastructure and then reports any differences between that and your configuration:

terraform plan -detailed-exitcode
# Returns:
# 0 - No changes
# 1 - Error
# 2 - Changes present (drift detected)

For thorough drift analysis, use these specialized commands:

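# Detect changes made outside Terraform without modifying anything
# (replaces the deprecated terraform refresh, which rewrites state immediately)
terraform plan -refresh-only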

# Show current state with details
terraform state list
terraform state show aws_instance.example

# Identify specific drifted resources
terraform plan -target=aws_instance.example

When interpreting plan output, focus on three key indicators:

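  1. Resources marked for destruction and recreation (-/+, under a header reading "must be replaced") indicate severe drift requiring immediate attention
  2. In-place updates (yellow text with ~) suggest moderate drift that can usually be remediated safely
  3. Additions (green text with +) for resources you believe are already deployed usually mean the resource was deleted outside Terraform or is missing from state; resources created entirely outside Terraform do not appear in plan output at all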
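The indicators above can also be read mechanically from the header comment Terraform prints above each resource in a plan. The following is a minimal sketch, assuming a POSIX shell and uncolored plan output; summarize_plan is a hypothetical helper name, not a Terraform command.

```shell
# summarize_plan: read `terraform plan -no-color` output on stdin and print
# one "<action><TAB><address>" line per resource header comment.
summarize_plan() {
  while IFS= read -r line; do
    # Trim leading whitespace so the header comment starts at "# "
    line=${line#"${line%%[![:space:]]*}"}
    case "$line" in
      "# "*" must be replaced")         action=replace ;;
      "# "*" will be updated in-place") action=update ;;
      "# "*" will be created")          action=create ;;
      "# "*" will be destroyed")        action=destroy ;;
      *) continue ;;
    esac
    addr=${line#"# "}
    # Keep only the address; this assumes the address contains no spaces
    printf '%s\t%s\n' "$action" "${addr%% *}"
  done
}
```

Typical usage would be terraform plan -no-color | summarize_plan, with the replace lines reviewed first.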

Third-party tools enhance detection capabilities beyond Terraform's native functionality:

  • Terraform Cloud/Enterprise: Offers drift detection as a service with notifications
  • Cloudrail: Provides security-focused drift detection, highlighting compliance issues
  • driftctl: Specialized in detecting drift with detailed reports

driftctl scan --from tfstate+s3://my-bucket/terraform.tfstate

Pro tip: The -json flag with terraform plan streams machine-readable results for programmatic analysis, enabling automated drift categorization. For a single structured document, save the plan with -out and render it with terraform show -json.

Strategic drift remediation: from emergency fixes to controlled recovery

When addressing drift, prioritize based on both business impact and remediation risk. Security-related drift (modified IAM policies, open security groups) demands immediate attention.

For emergency remediation of critical resources without disruption:

  1. First, document the current state:

aws ec2 describe-instances --instance-ids i-1234567890abcdef0 > current_state.json

  2. Carefully update your Terraform configuration to match reality:

resource "aws_instance" "example" {
  # Update attributes to match current state
  instance_type = "t3.medium"  # Changed from t3.small
  # Other attributes...
}

  3. Validate that the changes would be non-destructive:

terraform plan -target=aws_instance.example

For resources created outside Terraform, the import command is your best tool:

# Import existing resource into Terraform state
terraform import aws_s3_bucket.data bucket-name

# For complex resources, use import blocks (Terraform 1.5+)
import {
  to = aws_instance.web
  id = "i-1234567890abcdef0"
}
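When many resources need importing, the import blocks can be generated from a simple mapping rather than written by hand. This is a minimal sketch, assuming a POSIX shell and an input of whitespace-separated "address ID" pairs; emit_import_blocks and the mapping format are hypothetical, not part of Terraform.

```shell
# emit_import_blocks: read "<resource address> <cloud ID>" pairs on stdin and
# print one import block per pair (Terraform 1.5+ syntax).
emit_import_blocks() {
  while read -r addr id; do
    # Skip blank lines and "#" comments in the mapping
    case "$addr" in ""|"#"*) continue ;; esac
    printf 'import {\n  to = %s\n  id = "%s"\n}\n\n' "$addr" "$id"
  done
}
```

Redirect the output into a .tf file in the root module. For resources that have no configuration yet, terraform plan -generate-config-out=generated.tf (also Terraform 1.5+) can draft the matching resource blocks for review.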

When drift is extensive, the state surgery approach offers more control:

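  1. Download current state:

terraform state pull > terraform.tfstate

  2. Edit state with terraform state commands (these act on the configured backend directly, so no push is needed afterwards), or edit the pulled file manually:

terraform state rm aws_instance.problematic

  3. If you edited the pulled file manually, push the modified state:

terraform state push terraform.tfstate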

Caution: Always create a state backup before making changes:

terraform state pull > terraform.tfstate.backup
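A fixed backup filename is overwritten on every run, so a timestamped copy is safer during repeated state surgery. This is a minimal sketch, assuming a POSIX shell; backup_state and the state-backups directory name are hypothetical.

```shell
# backup_state: pull the current state into a timestamped file and print its
# path. Optional $1 is the backup directory (default: state-backups).
backup_state() {
  dir=${1:-state-backups}
  mkdir -p "$dir" || return 1
  file="$dir/terraform.tfstate.$(date -u +%Y%m%dT%H%M%SZ).backup"
  # Keep the file only if the pull succeeded and produced content
  if terraform state pull > "$file" && [ -s "$file" ]; then
    printf '%s\n' "$file"
  else
    rm -f "$file"
    echo "state backup failed" >&2
    return 1
  fi
}
```

Calling backup_state before any terraform state rm or terraform state push gives you a known file to restore from.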

Organizational drift prevention: building guardrails

Organizations that successfully manage drift implement structured workflows with clear responsibilities. A proven model includes:

  1. Infrastructure request workflow requiring all changes to go through Terraform code
  2. Approval gates preventing direct AWS console access for production
  3. Regular drift detection as part of the CI/CD pipeline

Documentation is critical for drift control. Maintain a central knowledge base covering:

  • Approved processes for emergency changes
  • Terraform module usage guidelines
  • Resource tagging standards for tracking ownership

Teams should establish a clear drift prioritization framework based on:

Drift Type          | Priority | Example                                         | Approach
--------------------|----------|-------------------------------------------------|---------------------------------
Security-critical   | P0       | Modified security groups, IAM policies          | Immediate remediation
Business-critical   | P1       | Changes to production databases, load balancers | Scheduled remediation
Configuration drift | P2       | Instance type changes, tag modifications        | Batch remediation
Informational       | P3       | Comment changes, cosmetic differences           | Document for next regular update
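A first-pass triage along these lines can be automated by resource type. This is a minimal sketch, assuming a POSIX shell; drift_priority and its type patterns are illustrative assumptions, and P3 is left out because cosmetic differences cannot be inferred from the type alone.

```shell
# drift_priority: map a Terraform resource type to a priority from the table.
# The type patterns are illustrative assumptions; adjust them to your estate.
drift_priority() {
  case "$1" in
    aws_security_group*|aws_iam_*)           echo P0 ;;  # security-critical
    aws_db_instance|aws_rds_cluster|aws_lb*) echo P1 ;;  # business-critical
    *)                                       echo P2 ;;  # general configuration drift
  esac
}
```

For example, drift_priority aws_iam_policy prints P0, while drift_priority aws_instance falls through to P2.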

For effective team organization, assign specific drift-related roles:

  • Infrastructure guardians: Review and approve all infrastructure changes
  • Drift detectors: Run regular scans and triage findings
  • Remediation specialists: Fix drift with minimal disruption

Automating drift management at scale

For large AWS environments, manual detection becomes impractical. Implementing automated drift detection in CI/CD pipelines ensures consistent monitoring:

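# Example GitHub Actions workflow for drift detection
name: Terraform Drift Detection
on:
  schedule:
    - cron: '0 8 * * *'  # Daily at 08:00 UTC
jobs:
  detect_drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_wrapper: false  # Call the terraform binary directly so its exit code is passed through unchanged
      - name: Terraform Init
        run: terraform init -input=false
      - name: Check Drift
        run: |
          # Steps run under bash -e, so capture the exit code rather than
          # letting exit code 2 abort the step before the check below
          exit_code=0
          terraform plan -detailed-exitcode -input=false || exit_code=$?
          if [ "$exit_code" -eq 2 ]; then
            echo "Drift detected!"
            # Send notification to Slack/email
          elif [ "$exit_code" -ne 0 ]; then
            # Exit code 1 is a real error: fail the job
            exit "$exit_code"
          fi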

For multi-account AWS environments, consider these scaling strategies:

  • Account segmentation with dedicated Terraform workspaces
  • Centralized drift reporting aggregating findings across accounts
  • Automated remediation pipelines for low-risk drift
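Centralized reporting can start as a loop over per-account root modules that collects one status line each. This is a minimal sketch, assuming a POSIX shell, one already-initialized Terraform directory per account, and credentials supplied by the environment; scan_roots is a hypothetical helper name.

```shell
# scan_roots: run a drift check in each Terraform root directory given as an
# argument and print "<directory><TAB><status>". Returns 1 if any is not clean.
scan_roots() {
  rc=0
  for dir in "$@"; do
    code=0
    # Subshell so the cd does not leak; -lock=false avoids blocking real applies
    (cd "$dir" && terraform plan -detailed-exitcode -input=false -lock=false >/dev/null 2>&1) || code=$?
    case "$code" in
      0) status=clean ;;
      2) status=drift; rc=1 ;;
      *) status=error; rc=1 ;;
    esac
    printf '%s\t%s\n' "$dir" "$status"
  done
  return "$rc"
}
```

The tab-separated output is easy to append to a shared report or post to a chat channel, and the return value lets a scheduler flag any run that found drift or errors.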

Tools that excel in multi-account scenarios include:

  • Terraform Cloud with its workspaces and cross-account detection
  • AWS Config with custom rules for Terraform compliance
  • Atlantis for pull request automation and drift detection

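AWS Organizations integration enables policy-based prevention. A service control policy like the one below denies direct modification of resources tagged ManagedBy=Terraform. The ArnNotLike condition exempts the role Terraform itself runs as, since otherwise the Deny would block Terraform's own applies too; the terraform-* role name pattern is a placeholder for your own naming scheme:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Deny",
      "Action": [
        "ec2:ModifyInstanceAttribute",
        "rds:ModifyDBInstance"
      ],
      "Resource": "*",
      "Condition": {
        "StringEquals": {"aws:ResourceTag/ManagedBy": "Terraform"},
        "ArnNotLike": {"aws:PrincipalArn": "arn:aws:iam::*:role/terraform-*"}
      }
    }
  ]
}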

Common drift scenarios and targeted solutions

Understanding typical drift patterns helps develop targeted prevention strategies:

  1. Console cowboys: When team members make emergency changes via the AWS console
    • Solution: Implement read-only console access with break-glass procedures
    • Recovery: Regular terraform import operations with automated detection
  2. AWS automated modifications: When AWS modifies resources automatically
    • Example: Auto Scaling Group instance replacements
    • Solution: Use lifecycle blocks to ignore changes:

lifecycle {
  ignore_changes = [instance_type]
}

  3. Partial applies: When terraform apply operations fail midway
    • Solution: Use -target carefully and implement state locking
    • Recovery: terraform apply with -refresh-only to synchronize state
  4. External integrations: When other systems modify AWS resources
    • Solution: Tag resources with ownership information and establish integration contracts
    • Detection: Custom filtering in drift reports for expected changes
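The custom filtering suggested for external integrations can be as simple as an allowlist of resource addresses that are expected to change. This is a minimal sketch, assuming a POSIX shell and a drift report with one resource address per line; filter_expected and the report format are hypothetical.

```shell
# filter_expected: read resource addresses on stdin (one per line) and drop
# those listed in the allowlist file given as $1.
filter_expected() {
  # -F fixed strings, -x whole-line match, -v invert, -f read patterns from file.
  # grep exits 1 when nothing is left, which is not an error here.
  grep -v -x -F -f "$1" || [ "$?" -eq 1 ]
}
```

Whatever remains after filtering is drift nobody has agreed to, which is the part worth alerting on.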

Conclusion

Effective Terraform drift management combines technical discipline with organizational guardrails. By implementing regular detection through both native Terraform commands and specialized tools like driftctl, you can stay ahead of infrastructure divergence. Successful teams pair these technical approaches with strong governance policies, clear prioritization frameworks, and automation to enable scaling. Remember that preventing drift is always more efficient than remediating it – invest in workflows and team training to minimize its occurrence.