Guide to Terraform Pull Request Workflows
This piece examines how modern enterprises implement Terraform pull request automation across multiple teams, with examples from real-world deployments.
The Current State of Cross-Team Terraform Automation
Organizations implementing cross-team Terraform PR automation report 40-70% reductions in infrastructure provisioning time. Deutsche Bank's platform team now enables hundreds of development teams to provision compliant infrastructure within minutes, while Spotify successfully migrated 1,200+ microservices using automated Terraform workflows.
The shift from manual infrastructure management to automated workflows has become essential for organizations managing infrastructure at scale. Teams face a critical decision: choosing between open-source solutions requiring significant maintenance effort or commercial platforms offering managed services with advanced features.
Platform Comparison
The Terraform automation landscape offers diverse solutions, each with distinct advantages:
| Platform | Pricing | Multi-IaC Support | Self-Hosted | Key Strengths |
|---|---|---|---|---|
| Atlantis | $0 (hosting costs only) | Terraform only | Yes | Open source, full control |
| Terraform Cloud | $70/user/month | Terraform only | Enterprise only | HashiCorp-native integration |
| Spacelift | Custom pricing | Yes (Terraform, Pulumi, CloudFormation, Kubernetes) | Yes | Flexible, multi-tool support |
| Scalr | Usage-based pricing | Terraform, OpenTofu | Yes | Native OPA, cost-focused |
| env0 | Custom pricing | Yes (multiple tools) | No | Template marketplace |
For organizations prioritizing Terraform-specific workflows with strong governance needs, platforms like Scalr provide focused solutions. Scalr's native Open Policy Agent integration and hierarchical workspace structure particularly suit enterprises requiring consistent policy enforcement across teams. The platform's emphasis on cost management through built-in FinOps features addresses a common pain point in multi-team environments.
Implementing PR Workflows Across VCS Platforms {#implementing-workflows}
GitHub Actions Implementation
GitHub Actions has become the de facto standard for many teams. Here's a production-ready workflow incorporating best practices:
name: Terraform PR Automation

on:
  pull_request:
    paths:
      - 'terraform/**'
      - '.github/workflows/terraform.yml'

permissions:
  id-token: write
  contents: read
  pull-requests: write

jobs:
  terraform-check:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        environment: [dev, staging, prod]
    steps:
      - uses: actions/checkout@v4

      - name: Configure AWS Credentials
        uses: aws-actions/configure-aws-credentials@v4
        with:
          role-to-assume: arn:aws:iam::${{ secrets.AWS_ACCOUNT_ID }}:role/github-actions
          aws-region: us-east-1

      - name: Setup Terraform
        uses: hashicorp/setup-terraform@v3
        with:
          terraform_version: 1.6.0
          terraform_wrapper: false

      - name: Terraform Format Check
        run: terraform fmt -check -recursive

      - name: Terraform Init
        working-directory: terraform/${{ matrix.environment }}
        run: |
          terraform init \
            -backend-config="bucket=${{ secrets.TF_STATE_BUCKET }}" \
            -backend-config="key=${{ matrix.environment }}/terraform.tfstate" \
            -backend-config="dynamodb_table=${{ secrets.TF_LOCK_TABLE }}"

      - name: Terraform Validate
        working-directory: terraform/${{ matrix.environment }}
        run: terraform validate

      - name: Run Security Scan
        uses: aquasecurity/tfsec-pr-commenter-action@v1
        with:
          working_directory: terraform/${{ matrix.environment }}
          github_token: ${{ github.token }}
      - name: Terraform Plan
        id: plan
        working-directory: terraform/${{ matrix.environment }}
        run: |
          # Capture the plan's own exit code (not tee's) and still fail the step on errors
          set +e
          terraform plan -out=tfplan -no-color 2>&1 | tee plan_output.txt
          plan_exit=${PIPESTATUS[0]}
          echo "exitcode=${plan_exit}" >> "$GITHUB_OUTPUT"
          exit ${plan_exit}
      - name: Post Plan to PR
        uses: actions/github-script@v7
        if: always()
        with:
          script: |
            const fs = require('fs');
            const planPath = 'terraform/${{ matrix.environment }}/plan_output.txt';
            // Skip commenting if an earlier step failed before producing plan output
            if (!fs.existsSync(planPath)) {
              return;
            }
            const planOutput = fs.readFileSync(planPath, 'utf8');
            // Keep the comment under GitHub's ~65k character body limit
            const truncated = planOutput.length > 65000
              ? planOutput.substring(0, 65000) + "\n\n... Output truncated ..."
              : planOutput;
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: `### Terraform Plan - ${{ matrix.environment }}
            <details>
            <summary>Click to expand</summary>

            \`\`\`
            ${truncated}
            \`\`\`
            </details>`
            });
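The Configure AWS Credentials step depends on an IAM role that trusts GitHub's OIDC provider, which is what the `id-token: write` permission enables. The sketch below shows roughly what that role looks like, assuming the OIDC provider is already registered in the account; `example-org/infrastructure` in the `sub` condition is a placeholder repository.

# Hypothetical role assumed by the workflow via OIDC (names are placeholders)
data "aws_iam_openid_connect_provider" "github" {
  url = "https://token.actions.githubusercontent.com"
}

resource "aws_iam_role" "github_actions" {
  name = "github-actions"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect    = "Allow"
      Principal = { Federated = data.aws_iam_openid_connect_provider.github.arn }
      Action    = "sts:AssumeRoleWithWebIdentity"
      Condition = {
        StringEquals = {
          "token.actions.githubusercontent.com:aud" = "sts.amazonaws.com"
        }
        StringLike = {
          # Restrict which repositories (and optionally branches) may assume the role
          "token.actions.githubusercontent.com:sub" = "repo:example-org/infrastructure:*"
        }
      }
    }]
  })
}

Keeping the `sub` condition narrow prevents unrelated workflows in the organization from assuming the execution role.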
GitLab CI Implementation
GitLab's native Terraform integration simplifies state management:
stages:
  - validate
  - plan
  - apply

variables:
  TF_ROOT: ${CI_PROJECT_DIR}/terraform
  TF_STATE_NAME: ${CI_ENVIRONMENT_NAME}

.terraform-base:
  image: hashicorp/terraform:1.6
  before_script:
    - cd ${TF_ROOT}/${CI_ENVIRONMENT_NAME}
    - terraform init
      -backend-config="address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}"
      -backend-config="lock_address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}/lock"
      -backend-config="unlock_address=${CI_API_V4_URL}/projects/${CI_PROJECT_ID}/terraform/state/${TF_STATE_NAME}/lock"
      -backend-config="username=gitlab-ci-token"
      -backend-config="password=${CI_JOB_TOKEN}"
      -backend-config="lock_method=POST"
      -backend-config="unlock_method=DELETE"

validate:
  extends: .terraform-base
  stage: validate
  script:
    - terraform fmt -check -recursive
    - terraform validate
  rules:
    - if: $CI_MERGE_REQUEST_ID

plan:dev:
  extends: .terraform-base
  stage: plan
  environment:
    name: development
  script:
    - terraform plan -out=tfplan
    - terraform show -json tfplan > plan.json
  artifacts:
    paths:
      - ${TF_ROOT}/${CI_ENVIRONMENT_NAME}/tfplan
      - ${TF_ROOT}/${CI_ENVIRONMENT_NAME}/plan.json
    reports:
      terraform: ${TF_ROOT}/${CI_ENVIRONMENT_NAME}/plan.json
  # Plan on merge requests and on main so the apply stage has a fresh plan artifact
  rules:
    - if: $CI_MERGE_REQUEST_ID
    - if: $CI_COMMIT_BRANCH == "main"

apply:dev:
  extends: .terraform-base
  stage: apply
  environment:
    name: development
  script:
    - terraform apply tfplan
  dependencies:
    - plan:dev
  # Apply only from main, and only after manual approval
  rules:
    - if: $CI_COMMIT_BRANCH == "main"
      when: manual
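The `-backend-config` flags above point Terraform at GitLab's managed state, which is exposed through Terraform's standard `http` backend. Each environment directory therefore only needs an empty backend declaration; a minimal sketch:

# Assumed backend declaration; all connection details are injected by the CI job at init time
terraform {
  backend "http" {}
}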
Multi-Team Governance Patterns
Successful multi-team implementations follow consistent patterns. Here's an OPA policy example that enforces common governance requirements:
package terraform.scalr

import future.keywords.contains
import future.keywords.if
import future.keywords.in

# Deny resources without required tags
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type != "random_id"
    required_tags := {"Environment", "Team", "CostCenter", "Owner"}
    # Compare against the tag keys present on the resource
    missing_tags := required_tags - {key | resource.values.tags[key]}
    count(missing_tags) > 0
    msg := sprintf("Resource %s is missing required tags: %v",
        [resource.address, missing_tags])
}

# Deny expensive instance types without approval
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    resource.type == "aws_instance"
    expensive_types := {
        "m5.24xlarge", "m5.metal", "c5.24xlarge",
        "r5.24xlarge", "x1e.32xlarge"
    }
    resource.values.instance_type in expensive_types
    not input.scalr.approve_expensive_resources
    msg := sprintf("Instance type %s requires approval. Resource: %s",
        [resource.values.instance_type, resource.address])
}

# Enforce naming conventions
deny contains msg if {
    resource := input.planned_values.root_module.resources[_]
    not regex.match("^[a-z]+(-[a-z]+)*$", resource.name)
    msg := sprintf("Resource %s does not follow naming convention (lowercase-hyphenated)",
        [resource.address])
}

# Cost control - deny if monthly cost exceeds threshold
deny contains msg if {
    cost_estimate := to_number(input.scalr.cost_estimate.proposed_monthly_cost)
    threshold := 5000
    cost_estimate > threshold
    not input.scalr.approve_high_cost
    msg := sprintf("Estimated monthly cost $%.2f exceeds threshold of $%.2f",
        [cost_estimate, threshold])
}
Platforms with native OPA support like Scalr simplify policy deployment across teams. Organizations can maintain centralized policy libraries while allowing teams to extend with specific requirements.
Managing Dependencies and State
Cross-team dependencies require careful architectural decisions. Here's a pattern using data sources to avoid tight coupling:
# Platform team publishes core infrastructure
# modules/platform/networking/outputs.tf
output "vpc_id" {
  value = aws_vpc.main.id
}

output "private_subnet_ids" {
  value = aws_subnet.private[*].id
}

# Tag resources for discovery
resource "aws_vpc" "main" {
  cidr_block = var.vpc_cidr

  tags = {
    Name        = "platform-vpc-${var.environment}"
    Environment = var.environment
    Team        = "platform"
    Purpose     = "shared-infrastructure"
  }
}

# Application team discovers resources
# applications/web-app/data.tf
data "aws_vpc" "platform" {
  tags = {
    Team        = "platform"
    Environment = var.environment
    Purpose     = "shared-infrastructure"
  }
}

data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.platform.id]
  }

  tags = {
    Tier = "private"
  }
}

# Use discovered resources
resource "aws_instance" "app" {
  subnet_id = data.aws_subnets.private.ids[0]
  # ... other configuration
}
For unavoidable state dependencies, implement read-only patterns:
# Platform team exposes minimal outputs
# terraform/platform/outputs.tf
output "vpc_config" {
  value = {
    vpc_id             = aws_vpc.main.id
    private_subnet_ids = aws_subnet.private[*].id
  }
  description = "Core VPC configuration for application teams"
}

# Application team consumes via remote state
# terraform/apps/web/main.tf
data "terraform_remote_state" "platform" {
  backend = "s3"

  config = {
    bucket = "company-terraform-state"
    key    = "platform/${var.environment}/terraform.tfstate"
    region = "us-east-1"
    # Read-only access via assume role
    role_arn = "arn:aws:iam::${var.platform_account_id}:role/terraform-state-reader"
  }
}

locals {
  vpc_id = data.terraform_remote_state.platform.outputs.vpc_config.vpc_id
}
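The terraform-state-reader role referenced in the backend config should be limited to read access on the platform team's state prefix. A minimal sketch of such a policy, assuming the role itself is defined elsewhere in the platform account:

# Hypothetical read-only policy for consumers of the platform state
resource "aws_iam_role_policy" "state_reader" {
  name = "terraform-state-read-only"
  role = aws_iam_role.terraform_state_reader.id # role assumed to exist

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Effect = "Allow"
      Action = ["s3:GetObject", "s3:ListBucket"]
      Resource = [
        "arn:aws:s3:::company-terraform-state",
        "arn:aws:s3:::company-terraform-state/platform/*"
      ]
    }]
  })
}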
Security Considerations
Security must be built into every layer of the automation. Here's a comprehensive security configuration:
# backend.tf - Encrypted, locked state (access logging is configured on the bucket, below)
terraform {
  backend "s3" {
    bucket         = "company-terraform-state"
    key            = "workspaces/prod/terraform.tfstate"
    region         = "us-east-1"
    encrypt        = true
    kms_key_id     = "arn:aws:kms:us-east-1:123456789012:key/12345678-1234-1234-1234-123456789012"
    dynamodb_table = "terraform-state-lock"
  }
}
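The S3 backend does not accept an access-logging argument; logging belongs on the state bucket itself. A minimal sketch using the bucket names from this example, assuming the log bucket already exists:

# Server access logs for the state bucket
resource "aws_s3_bucket_logging" "state" {
  bucket        = "company-terraform-state"
  target_bucket = "company-terraform-logs"
  target_prefix = "state-access/"
}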
# provider.tf - Dynamic credentials with assume role
provider "aws" {
  region = var.aws_region

  assume_role {
    role_arn     = "arn:aws:iam::${var.target_account_id}:role/TerraformExecutionRole"
    session_name = "terraform-${var.environment}"

    # Limit session duration
    duration_seconds = 3600

    # Require an external ID for production role assumption
    external_id = var.environment == "prod" ? var.external_id : null
  }

  default_tags {
    tags = {
      ManagedBy   = "Terraform"
      Environment = var.environment
    }
  }
}
Common Problems and Solutions
Organizations consistently encounter similar challenges. Here are field-tested solutions:
State Lock Conflicts
Implement queuing mechanisms to prevent concurrent modifications:
#!/bin/bash
# terraform-wrapper.sh - Prevents concurrent runs for the same workspace
LOCK_FILE="/tmp/terraform-${WORKSPACE}.lock"
TIMEOUT=300

acquire_lock() {
    local count=0
    while [ -f "$LOCK_FILE" ]; do
        if [ $count -gt $TIMEOUT ]; then
            echo "ERROR: Timeout waiting for lock release"
            exit 1
        fi
        echo "Waiting for existing Terraform operation to complete..."
        sleep 5
        ((count+=5))
    done
    echo $$ > "$LOCK_FILE"
}

release_lock() {
    rm -f "$LOCK_FILE"
}

trap release_lock EXIT
acquire_lock
terraform "$@"
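In CI the wrapper is typically invoked with the workspace name exported, for example `WORKSPACE=dev ./terraform-wrapper.sh plan`. For simpler setups, Terraform's built-in `-lock-timeout` flag (for example `terraform plan -lock-timeout=5m`) provides similar queuing by waiting on the backend's own state lock instead of a local lock file.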
Large State Files
Split monolithic configurations into focused modules:
# Avoid: a single state file with 500+ resources
# Better: split by service boundary

# terraform/networking/main.tf
module "vpc" {
  source = "../../modules/vpc"
  # VPC-specific configuration
}

# terraform/compute/main.tf
module "eks_cluster" {
  source = "../../modules/eks"
  vpc_id = data.aws_vpc.main.id
  # EKS-specific configuration
}

# terraform/data/main.tf
module "rds_cluster" {
  source = "../../modules/rds"
  vpc_id = data.aws_vpc.main.id
  # RDS-specific configuration
}
Building Your Implementation Roadmap
Successful implementations follow a phased approach:
Phase 1: Foundation (Weeks 1-4)
- Implement remote state with encryption
- Set up basic CI/CD integration
- Create workspace naming conventions
- Deploy initial policy framework
Phase 2: Automation (Weeks 5-8)
- Enable PR automation workflows
- Implement approval processes
- Add security scanning
- Create module templates
Phase 3: Scale (Weeks 9-12)
- Onboard additional teams
- Implement advanced policies
- Add cost management controls
- Enable cross-team dependencies
Phase 4: Optimization (Ongoing)
- Performance tuning
- Advanced automation features
- Integration with enterprise tools
- Continuous improvement
For organizations evaluating platforms, consider these key differentiators:
- Open source (Atlantis): Maximum flexibility, requires significant operational investment
- HashiCorp-native (Terraform Cloud): Deepest Terraform integration, limited to single tool
- Multi-IaC platforms (Spacelift, env0): Flexibility for diverse environments
- Governance-focused (Scalr): Native OPA integration, hierarchical management, built-in FinOps
The choice ultimately depends on your organization's specific needs, existing toolchain, and governance requirements. Platforms emphasizing native policy integration and cost management often provide faster time-to-value for enterprises prioritizing governance and financial control.
Summary
Cross-team Terraform automation has evolved from experimental practice to enterprise necessity. Success requires choosing the right platform, implementing robust governance, and following proven patterns from organizations that have scaled successfully. Whether using open-source Atlantis or commercial platforms, the key lies in starting simple, automating incrementally, and maintaining focus on security and governance throughout the journey.