Terraform Data Sources
Terraform data sources let your configurations read external information, making infrastructure management more dynamic and resilient. Declared with the `data` keyword, they retrieve details from cloud APIs, other Terraform states, local files, or HTTP endpoints.
The "Read-Only" Principle & Key Benefits
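Data sources are strictly read-only: they fetch information without creating, modifying, or deleting external objects. Because a lookup that matches nothing typically fails during planning, missing data surfaces before any change is applied. Key benefits include: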
- Dynamic Configurations: Adapt to changing information (e.g., latest AMI IDs) without hardcoding.
- Modularity: Modules become more self-sufficient by discovering environmental data.
- External Data Integration: Standardized way to use data from various systems.
- Error Prevention: Validate that external data exists during `terraform plan`.
Data Sources vs. Managed Resources
`resource` blocks define infrastructure that Terraform manages (create, read, update, delete), while `data` blocks provide read-only information used to configure those resources. An object should be managed by a `resource` block or referenced by a `data` block, not both in the same configuration. The `terraform_data` resource is an exception in name only: despite the name, it is a managed resource that stores arbitrary values in the state, and it does not query external information.
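As a rough illustration of the distinction, the sketch below places a managed resource, a data source, and a `terraform_data` resource side by side. The bucket names and the stored value are hypothetical, and the AWS provider is assumed to be configured elsewhere.

```hcl
# Managed resource: Terraform creates, updates, and deletes this bucket.
resource "aws_s3_bucket" "logs" {
  bucket = "example-logs-bucket" # hypothetical bucket name
}

# Data source: Terraform only reads a bucket that is managed elsewhere.
data "aws_s3_bucket" "shared_assets" {
  bucket = "example-shared-assets" # hypothetical; must already exist
}

# terraform_data: a built-in managed resource (Terraform 1.4 and later)
# that stores a value in state. It does not query anything external.
resource "terraform_data" "release" {
  input = "v1.2.3" # hypothetical value
}

output "release_value" {
  value = terraform_data.release.output
}
```

Note that the two buckets are different objects: the same bucket should not appear as both a `resource` and a `data` block in one configuration.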
How Data Sources Function: Syntax & Lifecycle
data "<PROVIDER_NAME>_<DATA_SOURCE_TYPE>" "<LOCAL_NAME>" {
  # Configuration arguments (filters/identifiers)
  [argument_name = expression]
  ...
}
- `<PROVIDER_NAME>_<DATA_SOURCE_TYPE>`: Specifies the data source type (e.g., `aws_ami`).
- `<LOCAL_NAME>`: A local reference name (e.g., `latest_ubuntu` in `data.aws_ami.latest_ubuntu.id`).
- Configuration Block (`{...}`): Arguments for the provider to fetch specific data.
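Providers use these arguments to query external systems. Data sources are typically read during the planning phase of `terraform plan`; when an argument depends on a value that is not known until apply, the read is deferred to the apply phase. Dependencies are inferred from references, but `depends_on` allows explicit ordering.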
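The difference between inferred and explicit ordering can be sketched as follows. The tag values, the generated file, and the `local-exec` command are hypothetical; the AWS and `local` providers are assumed to be configured.

```hcl
data "aws_vpc" "main" {
  tags = { Name = "main" } # hypothetical tag
}

# Implicit dependency: referencing data.aws_vpc.main.id tells Terraform
# to read the VPC before listing its subnets.
data "aws_subnets" "private" {
  filter {
    name   = "vpc-id"
    values = [data.aws_vpc.main.id]
  }
  tags = { Tier = "private" } # hypothetical tag
}

# A step with a side effect that no attribute reference captures.
resource "terraform_data" "write_file" {
  provisioner "local-exec" {
    command = "echo generated > ${path.module}/generated.txt"
  }
}

# Explicit dependency: nothing in this block references the resource above,
# so depends_on is needed to read the file only after it has been written.
data "local_file" "generated" {
  filename   = "${path.module}/generated.txt"
  depends_on = [terraform_data.write_file]
}
```

When `depends_on` points at a managed resource with pending changes, Terraform defers the data source read until apply, so values derived from it show as unknown in the plan.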
Practical Applications and Examples
1. Fetching Existing Infrastructure Details
Example: Get existing AWS VPC details
data "aws_vpc" "selected_vpc" {
  id = var.target_vpc_id # Assumes var.target_vpc_id is defined
}
resource "aws_subnet" "new_subnet" {
  vpc_id = data.aws_vpc.selected_vpc.id
  # ...
}
2. Dynamic Configuration Inputs
Example: Use the latest Amazon Linux 2 AMI
data "aws_ami" "latest_amazon_linux" {
  most_recent = true
  owners      = ["amazon"]
  # Arguments in a block are separated by newlines; HCL does not accept semicolons
  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }
}
resource "aws_instance" "app_server" {
  ami = data.aws_ami.latest_amazon_linux.id
  # ...
}
3. Cross-Configuration Data Sharing
Use `terraform_remote_state` to access outputs from another state.
Example: `terraform_remote_state` (HCP Terraform)
data "terraform_remote_state" "network" {
  backend = "remote"
  config = {
    organization = "my-org"
    workspaces = {
      name = "prod-network"
    }
  }
}
# Use data.terraform_remote_state.network.outputs.some_output
4. Local-Only Data Sources
Example: `http` (fetch public IP)
data "http" "my_public_ip" { url = "https://api.ipify.org?format=json" }
# Use jsondecode(data.http.my_public_ip.response_body).ip
Example: `local_file` (read SSH key)
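data "local_file" "ssh_key" {
  # Terraform does not expand "~" on its own; pathexpand() resolves it to the home directory
  filename = pathexpand("~/.ssh/id_ed25519.pub")
}
# Use data.local_file.ssh_key.content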
Quick Reference Table: Common Data Sources
Data Source Type | Provider(s) | Common Use Case |
---|---|---|
`aws_ami` | AWS | Find latest/specific AMI ID. |
`aws_vpc` | AWS | Get existing VPC details. |
`terraform_remote_state` | (Terraform Core) | Access outputs from another Terraform state. |
`http` | `http` | Fetch data from an HTTP endpoint. |
`local_file` | `local` | Read content from a local file. |
Arguments, Attributes, and Filtering
Data sources take arguments that define the query and export attributes that hold the results. The meta-arguments `provider`, `depends_on`, `count`, and `for_each` are common. In a `data` block, `lifecycle` is used mainly for `precondition` and `postcondition` checks. Provider-specific arguments act as filters (e.g., `filter` blocks in AWS, which are usually ANDed across blocks and ORed across multiple values within one filter).
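A minimal sketch of the filter semantics and of `for_each` on a data source, assuming a configured AWS provider. The variable `vpc_id` and the zone names are hypothetical.

```hcl
variable "vpc_id" {
  type = string
}

data "aws_subnets" "selected" {
  # Filter 1: the subnet must belong to this VPC...
  filter {
    name   = "vpc-id"
    values = [var.vpc_id]
  }

  # Filter 2: ...AND be in either of these zones
  # (multiple values within one filter are ORed).
  filter {
    name   = "availability-zone"
    values = ["us-east-1a", "us-east-1b"] # hypothetical zones
  }
}

# for_each: one aws_subnet lookup per matching subnet ID.
data "aws_subnet" "detail" {
  for_each = toset(data.aws_subnets.selected.ids)
  id       = each.value
}

# Exported attributes are read like any other reference.
output "subnet_cidrs" {
  value = { for id, s in data.aws_subnet.detail : id => s.cidr_block }
}
```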
Error Handling and Validation
- `ignore_errors`: Rare and provider-specific (it is not a core Terraform meta-argument); use it cautiously, as it can mask issues.
- Custom Conditions (`precondition`/`postcondition`): Preferred method. `precondition` validates inputs before reading; `postcondition` validates returned data, halting with a custom error if a condition fails.
Example: `postcondition` for `aws_ami`
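data "aws_ami" "validated_app_ami" {
  # ... arguments ...
  lifecycle {
    postcondition {
      # lookup() with a default avoids an index error when the tag is absent,
      # so the custom error message below is what the user sees
      condition     = lookup(self.tags, "Validated", "") == "true"
      error_message = "AMI must have 'Validated:true' tag."
    }
  }
}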
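For completeness, here is a sketch that pairs a `precondition` on an input with a `postcondition` on the result. The variable `ami_owner` and the allowed owner list are hypothetical, and the AWS provider is assumed to be configured.

```hcl
variable "ami_owner" {
  type    = string
  default = "amazon"
}

data "aws_ami" "checked" {
  most_recent = true
  owners      = [var.ami_owner]

  filter {
    name   = "name"
    values = ["amzn2-ami-hvm-*-x86_64-gp2"]
  }

  lifecycle {
    # Checked before the read; it cannot use self, only other values.
    precondition {
      condition     = contains(["amazon", "self"], var.ami_owner)
      error_message = "The ami_owner variable must be \"amazon\" or \"self\"."
    }

    # Checked after the read; self refers to the data source's result.
    postcondition {
      condition     = self.architecture == "x86_64"
      error_message = "The selected AMI must use the x86_64 architecture."
    }
  }
}
```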
Security Considerations
- `terraform_remote_state`: Requires read access to the entire state snapshot, which can contain sensitive data beyond the outputs being consumed.
- `tfe_outputs` (HCP Terraform/Enterprise): More secure, retrieves only defined outputs.
- Best Practices: Avoid exposing sensitive data in outputs; secure underlying systems; manage credentials securely (env vars, IAM roles, Vault); encrypt state files.
Advanced Topics and Best Practices
- Dynamic Blocks: Construct repeatable nested blocks (like `filter`) programmatically.
- Performance: Most reads are API calls, which adds to plan times. Minimize lookups, use specific filters, and centralize common lookups. Data sources refresh on each plan.
- Scaling in Enterprises: Managing data sources at scale involves ensuring consistency, security, performance, and governance. Platforms like Scalr can help by offering centralized variable/output management, RBAC, policy enforcement (OPA), and optimized execution environments, enhancing an organization's ability to use data sources safely and efficiently.
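The dynamic block and centralized lookup points above can be sketched together as follows. The variable `ami_filters` and its defaults are hypothetical, and the AWS provider is assumed to be configured.

```hcl
variable "ami_filters" {
  type = map(list(string))
  default = {
    "name"                = ["amzn2-ami-hvm-*-x86_64-gp2"]
    "virtualization-type" = ["hvm"]
  }
}

data "aws_ami" "filtered" {
  most_recent = true
  owners      = ["amazon"]

  # One filter block is generated per map entry.
  dynamic "filter" {
    for_each = var.ami_filters
    content {
      name   = filter.key
      values = filter.value
    }
  }
}

# Centralize the lookup: resources reference one local value
# instead of each declaring its own data source.
locals {
  app_ami_id = data.aws_ami.filtered.id
}
```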
Conclusion and Recommendations
Terraform data sources are vital for dynamic IaC.
- Choose Appropriately: Understand data source types and filtering.
- Secure Sharing: Prefer `tfe_outputs` over `terraform_remote_state` in HCP Terraform/Enterprise.
- Validate Robustly: Use `precondition` and `postcondition`.
- Manage Performance: Minimize lookups and monitor API usage.
- Consult Documentation: Stay updated on provider specifics.
- Consider Management Platforms for Scale: Evaluate IaC platforms for enterprise-wide consistency, security, and governance.
Following these recommendations helps leverage data sources for sophisticated, secure, and maintainable infrastructure automation.