Guide to Terraform Pull Request Workflows

This piece examines how modern enterprises implement Terraform pull request automation across multiple teams, with examples from real-world deployments.

The Current State of Cross-Team Terraform Automation

Organizations implementing cross-team Terraform PR automation report 40-70% reductions in infrastructure provisioning time. Deutsche Bank's platform team now enables hundreds of development teams to provision compliant infrastructure within minutes, while Spotify successfully migrated 1,200+ microservices using automated Terraform workflows.

The shift from manual infrastructure management to automated workflows has become essential for organizations managing infrastructure at scale. Teams face a critical decision: choosing between open-source solutions requiring significant maintenance effort or commercial platforms offering managed services with advanced features.

Platform Comparison

The Terraform automation landscape offers diverse solutions, each with distinct advantages:

| Platform | Pricing | Multi-IaC Support | Self-Hosted | Key Strengths |
| --- | --- | --- | --- | --- |
| Atlantis | $0 (hosting only) | Terraform only | Yes | Open source, full control |
| Terraform Cloud | $70/user/month | Terraform only | Enterprise only | HashiCorp native integration |
| Spacelift | Custom pricing | Yes (TF, Pulumi, CF, K8s) | Yes | Flexible, multi-tool support |
| Scalr | Usage-based pricing | Terraform, OpenTofu | Yes | OPA native, cost-focused |
| env0 | Custom pricing | Yes (multiple tools) | No | Template marketplace |

For organizations prioritizing Terraform-specific workflows with strong governance needs, platforms like Scalr provide focused solutions. Scalr's native Open Policy Agent integration and hierarchical workspace structure particularly suit enterprises requiring consistent policy enforcement across teams. The platform's emphasis on cost management through built-in FinOps features addresses a common pain point in multi-team environments.

Implementing PR Workflows Across VCS Platforms

GitHub Actions Implementation

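GitHub Actions is a common choice for teams that already host their code on GitHub. The workflow below authenticates to AWS through OIDC, runs format, validation, and security checks for each environment, and posts the plan output to the pull request: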

name: Terraform PR Automation
on:
  pull_request:
    paths:
      - 'terraform/**'
      - '.github/workflows/terraform.yml'

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  terraform-check:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
    
    steps:
      - uses: actions/checkout@v4
      
      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions
          aws-region: us-east-1
      
      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
          terraform_wrapper: false
      
      - name: Terraform Format Check
        run: terraform fmt -check -recursive
        
      - name: Terraform Init
        working-directory: terraform/${{ matrix.environment }}
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=${{ matrix.environment }}/terraform.tfstate" \
            -backend-config="dynamodb_table=${{ secrets.TF_LOCK_TABLE }}"
      
      - name: Terraform Validate
        working-directory: terraform/${{ matrix.environment }}
        run: terraform validate
      
      - name: Run Security Scan
        uses: aquasecurity/tfsec-pr-commenter-action@v1
        with:
          working_directory: terraform/${{ matrix.environment }}
          github_token: ${{ github.token }}
      
      - name: Terraform Plan
        id: plan
        working-directory: terraform/${{ matrix.environment }}
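        run: |
          # Disable errexit so the exit code can be recorded before the step fails
          set +e
          terraform plan -out=tfplan -no-color 2>&1 | tee plan_output.txt
          # PIPESTATUS[0] is terraform's exit code; $? would report tee's
          exitcode=${PIPESTATUS[0]}
          echo "exitcode=$exitcode" >> "$GITHUB_OUTPUT"
          exit "$exitcode"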
          
      - name: Post Plan to PR
        uses: actions/github-script@v7
        if: always()
        with:
          script: |
            const fs = require('fs');
            const planOutput = fs.readFileSync('terraform/${{ matrix.environment }}/plan_output.txt', 'utf8');
            const truncated = planOutput.length > 65000 ? 
              planOutput.substring(0, 65000) + "\n\n... Output truncated ..." : 
              planOutput;
            
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `### Terraform Plan - ${{ matrix.environment }}
              
<details>
<summary>Click to expand</summary>

\`\`\`
${truncated}
\`\`\`

</details>`
            });

GitLab CI Implementation

GitLab can store Terraform state itself through its HTTP backend, which removes the need for a separate state bucket and lock table:

stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: ${CI_PROJECT_DIR}/terraform
  TF_STATE_NAME: ${CI_ENVIRONMENT_NAME}

.terraform-base:
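  image:
    name: hashicorp/terraform:1.6
    # The image's default entrypoint is the terraform binary; clear it so GitLab can run shell commands
    entrypoint: [""]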
  before_script:
    - cd ${TF_ROOT}/${CI_ENVIRONMENT_NAME}
    - terraform init
      -backend-config="address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}"
      -backend-config="lock_address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}/lock"
      -backend-config="unlock_address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}/lock"
      -backend-config="username=gitlab-ci-token"
      -backend-config="password=${CI_JOB_TOKEN}"
      -backend-config="lock_method=POST"
      -backend-config="unlock_method=DELETE"

validate:
  extends: .terraform-base
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform validate
  rules:
    - if: $CI_MERGE_REQUEST_ID

plan:dev:
  extends: .terraform-base
  stage: plan
  environment:
    name: development
  script:
    - terraform plan -out=tfplan
    - terraform show -json tfplan > plan.json
  artifacts:
    paths:
      - ${TF_ROOT}/${CI_ENVIRONMENT_NAME}/tfplan
      - ${TF_ROOT}/${CI_ENVIRONMENT_NAME}/plan.json
    reports:
      terraform: ${TF_ROOT}/${CI_ENVIRONMENT_NAME}/plan.json
  rules:
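    - if: $CI_MERGE_REQUEST_ID
    # Also plan on main so apply:dev has a tfplan artifact to consume
    - if: $CI_COMMIT_BRANCH == "main"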

apply:dev:
  extends: .terraform-base
  stage: apply
  environment:
    name: development
  script:
    - terraform apply tfplan
  dependencies:
    - plan:dev
  rules:
    # A single rule: on main, wait for a manual trigger before applying
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual

Multi-Team Governance Patterns

Successful multi-team implementations follow consistent patterns. Here's an OPA policy example that enforces common governance requirements:

package terraform.scalr

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Deny resources without required tags
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type != "random_id"
    required_tags := {"Environment", "Team", "CostCenter", "Owner"}
    # Collect the tag keys (not the tag values) present on the resource
    missing_tags := required_tags - {tag | resource.values.tags[tag]}
    count(missing_tags) > 0
    msg := sprintf("Resource %s is missing required tags: %v", 
        [resource.address, missing_tags])
}

# Deny expensive instance types without approval
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_instance"
    expensive_types := {
        "m5.24xlarge", "m5.metal", "c5.24xlarge", 
        "r5.24xlarge", "x1e.32xlarge"
    }
    resource.values.instance_type in expensive_types
    not input.scalr.approve_expensive_resources
    msg := sprintf("Instance type %s requires approval. Resource: %s", 
        [resource.values.instance_type, resource.address])
}

# Enforce naming conventions
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    not regex.match("^[a-z]+(-[a-z]+)*$", resource.name)
    msg := sprintf("Resource %s does not follow naming convention (lowercase-hyphenated)", 
        [resource.address])
}

# Cost control - deny if monthly cost exceeds threshold
deny contains msg if {
    cost_estimate := to_number(input.scalr.cost_estimate.proposed_monthly_cost)
    threshold := 5000
    cost_estimate > threshold
    not input.scalr.approve_high_cost
    # %v formats both integers and floats; %.2f misrenders integer values such as 5000
    msg := sprintf("Estimated monthly cost $%v exceeds threshold of $%v", 
        [cost_estimate, threshold])
}

Platforms with native OPA support like Scalr simplify policy deployment across teams. Organizations can maintain centralized policy libraries while allowing teams to extend with specific requirements.

Managing Dependencies and State

Cross-team dependencies require careful architectural decisions. Here's a pattern using data sources to avoid tight coupling:

# Platform team publishes core infrastructure
# modules/platform/networking/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# Tag resources for discovery
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr
  
  tags = {
    Name        = "platform-vpc-${var.environment}"
    Environment = var.environment
    Team        = "platform"
    Purpose     = "shared-infrastructure"
  }
}

# Application team discovers resources
# applications/web-app/data.tf
data "aws_vpc" "platform" {
  tags = {
    Team        = "platform"
    Environment = var.environment
    Purpose     = "shared-infrastructure"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.platform.id]
  }
  
  tags = {
    Tier = "private"
  }
}

# Use discovered resources
resource "aws_instance" "app" {
  subnet_id = data.aws_subnets.private.ids[0]
  # ... other configuration
}
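
The `Tier = "private"` filter above only matches if the platform team tags its subnets accordingly. A minimal sketch of that tagging on the platform side, assuming it lives next to the `aws_vpc.main` resource shown earlier; the variables `private_subnet_cidrs` and `availability_zones` are hypothetical inputs, not part of the original configuration:

```hcl
# Hypothetical inputs; both lists are assumed to have the same length.
variable "private_subnet_cidrs" {
  type = list(string)
}

variable "availability_zones" {
  type = list(string)
}

# Private subnets carry the Tier tag that application teams filter on.
resource "aws_subnet" "private" {
  count             = length(var.private_subnet_cidrs)
  vpc_id            = aws_vpc.main.id
  cidr_block        = var.private_subnet_cidrs[count.index]
  availability_zone = var.availability_zones[count.index]

  tags = {
    Name        = "platform-private-${var.environment}-${count.index}"
    Environment = var.environment
    Team        = "platform"
    Tier        = "private"
  }
}
```

Treating these tags as a published contract keeps the coupling explicit: the platform team can restructure its modules freely as long as the tag keys and values stay stable.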

For unavoidable state dependencies, implement read-only patterns:

# Platform team exposes minimal outputs
# terraform/platform/outputs.tf
output "vpc_config" {
  value = {
    vpc_id             = aws_vpc.main.id
    private_subnet_ids = aws_subnet.private[*].id
  }
  description = "Core VPC configuration for application teams"
}

# Application team consumes via remote state
# terraform/apps/web/main.tf
data "terraform_remote_state" "platform" {
  backend = "s3"
  config = {
    bucket = "company-terraform-state"
    key    = "platform/${var.environment}/terraform.tfstate"
    region = "us-east-1"
    
    # Read-only access via assume role
    role_arn = "arn:aws:iam::${var.platform_account_id}:role/terraform-state-reader"
  }
}

locals {
  vpc_id = data.terraform_remote_state.platform.outputs.vpc_config.vpc_id
}
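
The read-only guarantee comes from the permissions attached to the `terraform-state-reader` role, not from the data source itself. A minimal sketch of such a policy, assuming the bucket name and key prefix from the example above; the resource labels are illustrative and the attachment to the role is omitted:

```hcl
# Read-only permissions for the state reader role (hypothetical labels).
data "aws_iam_policy_document" "state_reader" {
  # Allow the backend to list the bucket.
  statement {
    actions   = ["s3:ListBucket"]
    resources = ["arn:aws:s3:::company-terraform-state"]
  }

  # Allow reading platform state objects; no Put or Delete actions are granted.
  statement {
    actions   = ["s3:GetObject"]
    resources = ["arn:aws:s3:::company-terraform-state/platform/*"]
  }
}

resource "aws_iam_policy" "state_reader" {
  name   = "terraform-state-reader"
  policy = data.aws_iam_policy_document.state_reader.json
}
```

If the state objects are encrypted with a customer-managed KMS key, the role would also need `kms:Decrypt` on that key.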

Security Considerations

Security must be built into every layer of the automation. Here's a comprehensive security configuration:

# backend.tf - Encrypted state with access logging
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "workspaces/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
    dynamodb_table = "terraform-state-lock"
    
    # Access logging is not a backend setting; enable S3 server access
    # logging on the state bucket itself (for example with aws_s3_bucket_logging).
  }
}

# provider.tf - Dynamic credentials with assume role
provider "aws" {
  region = var.aws_region
  
  assume_role {
    role_arn     = "arn:aws:iam::${var.target_account_id}:role/TerraformExecutionRole"
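    # Keep the session name static: timestamp() output contains colons,
    # which are not valid in a role session name
    session_name = "terraform-${var.environment}"
    
    # Limit session duration
    duration = "1h"
    
    # Supply an external ID for production. An external ID is a shared
    # identifier checked by the role's trust policy, not an MFA token.
    external_id = var.environment == "prod" ? var.external_id : null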
  }
  
  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
      # Avoid timestamp() here: it changes on every run and forces a
      # tag update on every resource in every plan.
    }
  }
}
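
Access logging, versioning, and public access controls for the state bucket are bucket-level settings. A minimal sketch, assuming the buckets are managed in a separate bootstrap configuration; the bucket names match the backend example, while the resource labels `state` and `logs` are hypothetical:

```hcl
resource "aws_s3_bucket" "state" {
  bucket = "company-terraform-state"
}

resource "aws_s3_bucket" "logs" {
  bucket = "company-terraform-logs"
}

# Keep prior state versions so a bad write can be rolled back.
resource "aws_s3_bucket_versioning" "state" {
  bucket = aws_s3_bucket.state.id

  versioning_configuration {
    status = "Enabled"
  }
}

# Block every form of public access to the state bucket.
resource "aws_s3_bucket_public_access_block" "state" {
  bucket                  = aws_s3_bucket.state.id
  block_public_acls       = true
  block_public_policy     = true
  ignore_public_acls      = true
  restrict_public_buckets = true
}

# Write S3 server access logs for the state bucket to the log bucket.
resource "aws_s3_bucket_logging" "state" {
  bucket        = aws_s3_bucket.state.id
  target_bucket = aws_s3_bucket.logs.id
  target_prefix = "state-access/"
}
```

The log bucket also needs a bucket policy that lets the S3 logging service write to it, which is left out here for brevity.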

Common Problems and Solutions

Organizations consistently encounter similar challenges. Here are field-tested solutions:

State Lock Conflicts

Implement queuing mechanisms to prevent concurrent modifications:

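#!/bin/bash
# terraform-wrapper.sh - Serializes Terraform runs on a single host.
# The backend's own state lock (see terraform's -lock-timeout flag) still
# protects the state; this wrapper only queues local invocations.

LOCK_FILE="/tmp/terraform-${WORKSPACE:?WORKSPACE must be set}.lock"
TIMEOUT=300

acquire_lock() {
    local count=0
    # noclobber makes the redirect fail if the file already exists, so the
    # existence check and the creation happen in one atomic step.
    until ( set -o noclobber; echo $$ > "$LOCK_FILE" ) 2>/dev/null; do
        if [ "$count" -ge "$TIMEOUT" ]; then
            echo "ERROR: Timeout waiting for lock release" >&2
            exit 1
        fi
        echo "Waiting for existing Terraform operation to complete..."
        sleep 5
        ((count+=5))
    done
}

release_lock() {
    rm -f "$LOCK_FILE"
}

acquire_lock
# Register the trap only after the lock is held, so a timed-out run
# never deletes a lock that belongs to another process.
trap release_lock EXIT
terraform "$@"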

Large State Files

Split monolithic configurations into separate root modules, each with its own state file:

# Avoid: Single state file with 500+ resources
# Better: Split by service boundary

# terraform/networking/main.tf
module "vpc" {
  source = "../../modules/vpc"
  # VPC-specific configuration
}

# terraform/compute/main.tf
module "eks_cluster" {
  source = "../../modules/eks"
  vpc_id = data.aws_vpc.main.id
  # EKS-specific configuration
}

# terraform/data/main.tf  
module "rds_cluster" {
  source = "../../modules/rds"
  vpc_id = data.aws_vpc.main.id
  # RDS-specific configuration
}
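
The compute and data roots above reference `data.aws_vpc.main`, which each root has to declare for itself. A minimal sketch of that lookup, assuming the VPC carries the `Name` tag format used in the platform example earlier; the file path is hypothetical:

```hcl
# terraform/compute/data.tf (hypothetical file; repeat in terraform/data)
variable "environment" {
  type        = string
  description = "Deployment environment, for example dev or prod"
}

# Look up the VPC created by the networking root by tag, so this
# state never reads the networking state file directly.
data "aws_vpc" "main" {
  tags = {
    Name = "platform-vpc-${var.environment}"
  }
}
```

Because the lookup happens at plan time, each root can be planned and applied on its own schedule once the networking root has been applied.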

Building Your Implementation Roadmap

Successful implementations follow a phased approach:

Phase 1: Foundation (Weeks 1-4)

  • Implement remote state with encryption
  • Set up basic CI/CD integration
  • Create workspace naming conventions
  • Deploy initial policy framework

Phase 2: Automation (Weeks 5-8)

  • Enable PR automation workflows
  • Implement approval processes
  • Add security scanning
  • Create module templates

Phase 3: Scale (Weeks 9-12)

  • Onboard additional teams
  • Implement advanced policies
  • Add cost management controls
  • Enable cross-team dependencies

Phase 4: Optimization (Ongoing)

  • Performance tuning
  • Advanced automation features
  • Integration with enterprise tools
  • Continuous improvement

For organizations evaluating platforms, consider these key differentiators:

  • Open source (Atlantis): Maximum flexibility, requires significant operational investment
  • HashiCorp-native (Terraform Cloud): Deepest Terraform integration, limited to single tool
  • Multi-IaC platforms (Spacelift, env0): Flexibility for diverse environments
  • Governance-focused (Scalr): Native OPA integration, hierarchical management, built-in FinOps

The choice ultimately depends on your organization's specific needs, existing toolchain, and governance requirements. Platforms emphasizing native policy integration and cost management often provide faster time-to-value for enterprises prioritizing governance and financial control.

Summary

Cross-team Terraform automation has evolved from experimental practice to enterprise necessity. Success requires choosing the right platform, implementing robust governance, and following proven patterns from organizations that have scaled successfully. Whether using open-source Atlantis or commercial platforms, the key lies in starting simple, automating incrementally, and maintaining focus on security and governance throughout the journey.