Managing Drift - Part 2

In Part 1 of this series, we explored infrastructure drift: what it is, why it happens, and why native Terraform and OpenTofu commands such as plan and refresh are not enough on their own to detect it. These commands are fundamental, but managing drift effectively at scale demands a more automated approach with better visibility.

This is where Infrastructure as Code (IaC) management platforms like Scalr step in. Scalr offers an integrated solution designed to provide comprehensive drift detection and, crucially, a user-controlled framework for remediation, with explicit support for both Terraform and OpenTofu environments.

Scalr's Philosophy: Detection with Deliberate Action

Scalr's approach to drift detection is built into its platform, aiming to provide continuous visibility without sacrificing operational control. As a founding member of the OpenTofu initiative, Scalr ensures its features cater seamlessly to this growing open-source ecosystem.

How Scalr Spots the Differences: Detection Methodology

Scalr employs a flexible detection strategy:

  1. Git as a Source of Truth: It can compare the live environment against the code committed in your Git repository, in effect running the equivalent of terraform plan or tofu plan with refresh enabled (-refresh=true, which is the default). This is the classic IaC desired state.
  2. Last Known Applied State: Scalr can also identify discrepancies by comparing the actual infrastructure state against the "last known applied state" within Scalr. This is particularly valuable for catching drift that might have occurred if changes were successfully deployed via Scalr but not immediately committed back to Git, or if you need to verify against the last configuration that was confirmed operational through the platform.

This dual-source comparison casts a wider net for catching deviations. Importantly, drift detection runs themselves do not count against your billable run quota. Only the actions you take to remediate drift are billable.
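To make the Git-based comparison concrete, here is a minimal Python sketch of a plan-based check that a team could run outside any platform. It is not Scalr's implementation, and the function names classify_plan_exit and check_drift are illustrative. It relies on the -detailed-exitcode flag of terraform plan and tofu plan, which exits with 0 when nothing would change, 1 on error, and 2 when the plan contains changes.

```python
import subprocess

# Exit codes documented for `plan -detailed-exitcode` in Terraform and OpenTofu.
EXIT_NO_CHANGES = 0
EXIT_ERROR = 1
EXIT_CHANGES_PRESENT = 2


def classify_plan_exit(code: int) -> str:
    """Translate a `plan -detailed-exitcode` exit status into a verdict."""
    if code == EXIT_NO_CHANGES:
        return "in_sync"
    if code == EXIT_CHANGES_PRESENT:
        return "drift_detected"
    # Exit code 1 (or anything unexpected) means the plan itself failed.
    return "error"


def check_drift(workdir: str, binary: str = "tofu") -> str:
    """Run a plan in `workdir` (an initialized configuration) and classify it.

    `binary` can be "tofu" or "terraform"; both accept these flags.
    """
    result = subprocess.run(
        [binary, "plan", "-detailed-exitcode", "-input=false", "-no-color"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    return classify_plan_exit(result.returncode)
```

Note that exit code 2 only says the plan is non-empty. It fires for unapplied code changes as well as for out-of-band drift, which is part of why a second reference point such as the last applied state is useful.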

Automated Vigilance: Scheduling and Workflow Integration

Manual, ad-hoc drift checks are prone to being forgotten. Scalr automates this process:

  • Scheduled Checks: You can configure drift detection to run automatically at set intervals (e.g., daily, weekly) at the "Environment" level. This schedule then applies to all workspaces within that environment.
  • Intelligent Execution: A scheduled scan runs on a workspace only if three conditions hold: the workspace has an active state, it has not been deployed within a configurable recent period, and remote execution is enabled. This prevents redundant checks on actively changing environments.
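The eligibility conditions above can be read as a simple predicate. The following Python sketch is purely illustrative: the Workspace class, its field names, and the quiet_period parameter are assumptions made for this example and do not reflect Scalr's API or data model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Workspace:
    # Hypothetical model; field names are illustrative, not Scalr's API.
    name: str
    has_active_state: bool
    remote_execution: bool
    last_deployed_at: Optional[datetime]  # None if no deployment is recorded


def is_eligible_for_scan(
    ws: Workspace, now: datetime, quiet_period: timedelta
) -> bool:
    """Return True if a scheduled drift scan should run on this workspace."""
    # Conditions 1 and 3: an active state and remote execution are required.
    if not ws.has_active_state or not ws.remote_execution:
        return False
    # Condition 2: skip workspaces deployed within the quiet period.
    if ws.last_deployed_at is None:
        return True
    return now - ws.last_deployed_at >= quiet_period
```

With a 24-hour quiet period, a workspace deployed six hours ago would be skipped, while one last deployed two days ago would be scanned.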

Bringing Drift into the Light: Reporting, Alerts, and Dashboards

Detection is only half the battle; visibility is key. Scalr provides multiple ways to understand and track drift:

  • Dedicated "Drift Detection" Tab: All runs that identify discrepancies are clearly listed here, providing a centralized place to review and manage detected changes.
  • Notifications: Get alerted when drift is found. Scalr integrates with Slack, with MS Teams support planned, ensuring the right teams are notified promptly.
  • Custom Drift Dashboards: Build dashboards to get a high-level overview of drift status across all workspaces in your organization, helping to spot trends and prioritize action.
  • Reports: Generate drift reports at the account or environment level for broader analysis and communication.
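For teams that route their own drift alerts, a notification of this kind can be as small as a text payload posted to a Slack incoming webhook. The Python sketch below is not Scalr's integration: build_drift_message and post_to_slack are hypothetical helpers, and the webhook URL is one you would create in your own Slack workspace. Slack incoming webhooks accept a JSON body with a "text" field.

```python
import json
import urllib.request
from typing import Dict, List


def build_drift_message(workspace: str, drifted: List[str]) -> Dict[str, str]:
    """Build a Slack incoming-webhook payload listing drifted resources."""
    lines = [f"Drift detected in workspace '{workspace}':"]
    lines += [f"- {address}" for address in drifted]
    return {"text": "\n".join(lines)}


def post_to_slack(webhook_url: str, message: Dict[str, str]) -> int:
    """POST the payload to the given webhook URL and return the HTTP status."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(message).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status
```

Keeping message construction separate from delivery makes the payload easy to test without touching the network.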

The Scalr Difference: User-Controlled Remediation

This is a cornerstone of Scalr's drift management philosophy. Unlike some tools that offer fully automated "fixes," Scalr requires explicit user intervention to address detected drift. This deliberate approach ensures that no changes are made to your infrastructure without review and consent, aligning with best practices for operational safety and change management.

When Scalr detects drift, it presents you with three clear options:

  1. Ignore: Acknowledge the drift but take no action within Scalr. This is suitable if the drift is intentional, expected, or will be handled manually outside Scalr.
  2. Sync State (Refresh-Only Run): This updates Scalr's stored Terraform/OpenTofu state file to match the actual (drifted) infrastructure. It's akin to running terraform apply -refresh-only or tofu apply -refresh-only. This is a billable run.
  3. Revert Infrastructure (Plan & Apply Run): If the drift is undesired, Scalr will generate a plan to revert the infrastructure to its previously defined state (as per your code or last applied state). Upon your approval, Scalr applies this plan. This is also a billable run.

Conceptual Interaction for Remediation:

Imagine Scalr's UI presenting something like this after detecting a security group change:

Scalr Drift Detection Report:
Workspace: 'production-web-servers'
Drift Detected:
  ~ aws_security_group.app_sg:
    ingress:
      - (known) cidr_blocks: ["10.1.0.0/16"], from_port: 443, to_port: 443, protocol: "tcp"
      + (drifted) cidr_blocks: ["0.0.0.0/0"], from_port: 443, to_port: 443, protocol: "tcp" # Unintended wide-open access

Available Actions:
1. Ignore Drift: [Acknowledge and do nothing in Scalr]
2. Sync State: [Run 'tofu apply -refresh-only' to update state file with 0.0.0.0/0]
3. Revert Infrastructure: [Plan and apply changes to revert to 10.1.0.0/16]

Choose action (1-3):

(This is a conceptual representation. Actual interaction occurs through Scalr's web UI.)
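The three options correspond roughly to the following CLI invocations. The Python sketch below illustrates that mapping under stated assumptions and does not describe Scalr's internals: DriftAction, commands_for, is_billable, and the plan file name revert.tfplan are all hypothetical, while the commands themselves are standard Terraform/OpenTofu usage.

```python
from enum import Enum
from typing import List


class DriftAction(Enum):
    IGNORE = "ignore"
    SYNC_STATE = "sync_state"
    REVERT = "revert"


def commands_for(action: DriftAction, binary: str = "tofu") -> List[List[str]]:
    """Return the CLI commands that approximate each remediation choice."""
    if action is DriftAction.IGNORE:
        # Acknowledge only; nothing is executed.
        return []
    if action is DriftAction.SYNC_STATE:
        # Accept the drifted reality by updating state, leaving resources alone.
        return [[binary, "apply", "-refresh-only"]]
    # REVERT: plan against the code, then apply the saved plan after review.
    return [
        [binary, "plan", "-out=revert.tfplan"],
        [binary, "apply", "revert.tfplan"],
    ]


def is_billable(action: DriftAction) -> bool:
    """Per the article, only Sync State and Revert consume billable runs."""
    return action is not DriftAction.IGNORE
```

Saving the plan to a file and applying that exact file mirrors the review-then-approve step: what gets applied is what was reviewed.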

This user-centric control is particularly appealing to organizations with stringent change management policies or those operating in highly regulated industries.

Seamless OpenTofu Support

Scalr supports OpenTofu as a first-class option. All of its drift detection and remediation features apply equally to OpenTofu workspaces, so teams can adopt or migrate to OpenTofu without giving up any of the capabilities described above.

Strengths of Scalr's Approach:

  • Integrated Platform: Drift detection is native, not an add-on.
  • Scheduled Automation: Ensures regular, consistent monitoring.
  • Clear Visibility: Dedicated UI, dashboards, and alerts keep teams informed.
  • User-Controlled Remediation: Prioritizes safety and deliberate action.
  • Explicit OpenTofu Support: Future-proofs your IaC strategy.
  • Flexible Detection Sources: Checks against Git and last applied state.

Potential Considerations:

  • No Fully Automated Fixes: This is a deliberate design choice and a strength for many teams, but organizations seeking completely hands-off remediation will find that it does not meet their needs.
  • Billable Remediation: Corrective actions (Sync State, Revert) are billable runs, a factor for cost planning in high-drift environments.
  • Unmanaged Resources: While Scalr excels at detecting drift in managed resources, its capabilities for detecting resources created entirely outside of IaC (true "shadow IT") are less prominently documented than those of some specialized tools.

Scalr offers a compelling solution for organizations that want to take drift seriously, providing the tools for robust detection and the framework for safe, controlled remediation. Its approach empowers teams to maintain infrastructure integrity without relinquishing critical oversight.

In the final part of our series, we'll broaden our view to the wider ecosystem of drift detection tools, comparing Scalr's approach to various alternatives, and offer guidance on choosing the right strategy for your organization.