Troubleshooting “Error acquiring the state lock: ConditionalCheckFailedException” in Terraform

In today’s blog post, we are going to cover how to troubleshoot “Error acquiring the state lock: ConditionalCheckFailedException” in Terraform. Terraform uses a state file to map real-world resources to your configuration, keep track of metadata, and improve performance for large infrastructures. When multiple users run Terraform against the same infrastructure, each run must acquire a lock before it attempts to update the state file. This keeps your state file and infrastructure consistent across every execution of terraform plan and terraform apply.

What is the “Error acquiring the state lock: ConditionalCheckFailedException”?

At its core, this error indicates that Terraform attempted to obtain a lock on your state file but failed, usually because another process already holds that lock. As stated earlier, Terraform uses state locking to prevent concurrent operations from modifying the same state file at the same time. This mechanism is critical for data consistency and for preventing accidental corruption of your infrastructure state, especially in collaborative environments.

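The ConditionalCheckFailedException specifically points to a failure in a conditional write operation. It is the error Amazon DynamoDB returns when the condition attached to a write is not met, so you typically see it with the S3 backend when a DynamoDB table is used for state locking. Terraform tries to write a lock item on the condition that one does not already exist, and the write is rejected because a lock item is already there.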
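
To make the idea of a conditional write concrete, here is a rough local analogy in shell. It is not how Terraform or DynamoDB implement locking; it only shows a “create only if it does not already exist” write failing on the second attempt. The function names and the lock file are hypothetical.

```shell
#!/bin/sh
# Local analogy only: the real conditional write happens in the backend's
# lock store, not in a local file.

# Create the lock file only if it does not already exist.
# $1 = lock file path, $2 = owner label written into the file.
# Returns 0 when the lock was acquired, non-zero when it was already held.
acquire_lock() {
  # noclobber makes ">" refuse to overwrite an existing file; the subshell
  # keeps the option from leaking into the caller.
  ( set -o noclobber; printf '%s\n' "$2" > "$1" ) 2>/dev/null
}

# Remove the lock file so the next caller can acquire it.
release_lock() {
  rm -f -- "$1"
}

# Example (not executed here):
#   acquire_lock /tmp/demo.lock run-a   # succeeds
#   acquire_lock /tmp/demo.lock run-b   # fails: the lock already exists
```

If the first caller crashes before calling release_lock, every later acquire_lock fails in the same way, which is the orphaned-lock situation behind this error.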

Common Causes of the Terraform State Lock Error

The “Error acquiring the state lock: ConditionalCheckFailedException” typically arises from situations where a Terraform process was unexpectedly interrupted or terminated. Here are the primary causes:

  • Abrupt Process Termination: The most frequent cause is when a terraform plan or terraform apply process fails or is terminated prematurely. This could be due to:
    • Network Interruption: Your internet connection drops during a Terraform operation.
    • Manual Termination: You or your CI/CD pipeline forcibly stop a running Terraform command (e.g., Ctrl+C).
    • System Crashes: The machine running Terraform experiences an unexpected shutdown or crash.
    In these scenarios, Terraform might not get a chance to release the state lock it acquired, leaving it in a locked state. Terraform “believes” that the previous process is still actively working on the infrastructure, thus blocking subsequent operations to prevent conflicts.
  • Concurrent Operations (Less Common for this Specific Error): While state locking aims to prevent concurrent operations, if two processes try to acquire the lock at almost the exact same time, one might fail with this error if the locking mechanism’s conditional check fails for the second attempt. However, the primary cause remains an orphaned lock from a failed process.

Solutions to Resolve the “terraform ConditionalCheckFailedException”

When faced with this error, the first step is to assess if any Terraform process is still running that might be holding the lock. If not, you can safely proceed with unlocking the state.
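
The error output usually includes a “Lock Info” block that names who took the lock and when, which helps you judge whether the lock is stale. Below is a small sketch for pulling individual fields out of a saved copy of that output. It assumes the usual layout of one “Field: value” pair per line (ID, Path, Operation, Who, Version, Created); the function name and the file you save the output to are hypothetical.

```shell
#!/bin/sh
# Print one field (for example ID, Who, or Created) from the "Lock Info"
# block of a saved Terraform error message.
# $1 = field name, $2 = file containing the captured error output.
lock_info_field() {
  awk -v field="$1" '
    {
      # Look for the "Field:" token anywhere on the line, so lines that
      # start with a decorative margin character still match.
      for (i = 1; i <= NF; i++) {
        if ($i == (field ":")) {
          out = ""
          sep = ""
          for (j = i + 1; j <= NF; j++) {
            out = out sep $j
            sep = " "
          }
          print out
          exit
        }
      }
    }
  ' "$2"
}

# Example (not executed here), assuming the error was saved to lock-error.txt:
#   lock_info_field Who lock-error.txt
#   lock_info_field Created lock-error.txt
```

If the “Who” value is a CI runner that no longer exists, or “Created” is hours old while no run is in progress, the lock is most likely orphaned.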

1. Force Unlock the Terraform State

If you are certain no other Terraform process is actively modifying your infrastructure, the terraform force-unlock command can be used to unlock your state file.

  • How it works: This command forces the release of a stale or orphaned lock on your Terraform state.
  • Usage: You will need the ID of the lock, which is a UUID-style string rather than a plain number. This ID is usually provided in the error message itself when you encounter the ConditionalCheckFailedException.
terraform force-unlock <LOCK_ID>

Example: If your error message includes Lock ID: 12345678-abcd-efgh-ijkl-9876543210ab, you would run:

terraform force-unlock 12345678-abcd-efgh-ijkl-9876543210ab
  • Important Considerations:
    • Verify no active process: Before using force-unlock, always double-check that no other terraform command is currently running against the same state. Forcing an unlock while another process is active can lead to state corruption. If unsure, it is generally safer to wait for a few minutes and try again.
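    • Permissions: Ensure the user or service account executing terraform force-unlock has the necessary permissions to delete the lock in your backend (e.g., dynamodb:DeleteItem on the lock table for an S3 backend that locks through DynamoDB, s3:DeleteObject if the lock is stored as an object in the bucket, or “Storage Object Admin” for GCP).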
    • Backup: Take a backup of your state file before you force the unlock, so you can restore it if something goes wrong.
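
Putting the backup and the unlock together, a minimal sketch could look like the following. It uses terraform state pull to save a copy of the remote state and then runs terraform force-unlock, which still asks for confirmation. The function name, the default backup file name, and the TERRAFORM_BIN override are assumptions made for this example.

```shell
#!/bin/sh
# Back up the current state, then release a stale lock.
# $1 = lock ID from the error message, $2 = optional backup file name.
# TERRAFORM_BIN is a hypothetical override; it defaults to the terraform CLI.
backup_and_unlock() {
  lock_id="$1"
  backup_file="${2:-terraform.tfstate.backup.$(date +%Y%m%d%H%M%S)}"
  tf="${TERRAFORM_BIN:-terraform}"

  # "terraform state pull" prints the current state to stdout.
  # Stop here if the backup fails, so we never unlock without a copy.
  "$tf" state pull > "$backup_file" || return 1

  # Terraform prompts for confirmation before it removes the lock.
  "$tf" force-unlock "$lock_id"
}

# Example (not executed here):
#   backup_and_unlock 12345678-abcd-efgh-ijkl-9876543210ab
```

Run it from the directory that holds the Terraform configuration for the locked state, after you have confirmed that no other run is active.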

2. Identify and Terminate Conflicting Processes

If you suspect a process might still be running or if force-unlock does not immediately resolve the issue (perhaps due to a persistent process), you can try to identify and terminate it manually.

For Linux/macOS:

ps aux | grep terraform

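This command lists all running processes whose command line contains “terraform” (the grep command itself may show up in the output too). Look for any active terraform plan or terraform apply processes. Once identified, terminate the process using its Process ID (<process_id>). Try a plain kill first, which sends SIGTERM and gives Terraform a chance to shut down and release the lock:

kill <process_id>

Use kill -9 only as a last resort. A process killed this way cannot release its lock, so you will still need terraform force-unlock afterwards. Add sudo only if the process belongs to another user:

sudo kill -9 <process_id>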

For Windows: Use Task Manager or PowerShell commands like Get-Process and Stop-Process.
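
On Linux/macOS, the “ask politely first, force only if needed” approach can be sketched as a small helper. It assumes the target process reacts to SIGTERM by shutting down; the function name and the default 30-second wait are choices made for this example.

```shell
#!/bin/sh
# Ask a process to stop with SIGTERM, wait up to a timeout, and only then
# fall back to SIGKILL.
# $1 = process ID, $2 = seconds to wait before forcing (default 30).
# Returns 0 if the process exited on its own, 1 if it had to be killed.
stop_gracefully() {
  pid="$1"
  remaining="${2:-30}"

  # If the signal cannot be delivered, the process is already gone
  # (or belongs to another user).
  kill -TERM "$pid" 2>/dev/null || return 0

  # kill -0 sends no signal; it only checks whether the process exists.
  while kill -0 "$pid" 2>/dev/null; do
    if [ "$remaining" -le 0 ]; then
      # Last resort: a killed process cannot clean up its lock.
      kill -KILL "$pid" 2>/dev/null
      return 1
    fi
    sleep 1
    remaining=$((remaining - 1))
  done
  return 0
}

# Example (not executed here):
#   stop_gracefully <process_id> 60
```

If the helper returns 1, expect the lock to still be in place and follow up with terraform force-unlock.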

3. Manually Break Leases for Remote State Files (e.g., Azure Storage)

If you are using remote state backends like Azure Storage Accounts, the locking mechanism might involve leases on blob storage. In such cases, if the terraform force-unlock command does not work, you might need to manually break the lease.

  • Azure Storage Accounts: You can typically do this through the Azure portal by navigating to the storage account, then to the blob container where your state file is stored. Find the state file blob, and there should be an option to “Break Lease” or similar.
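
If you prefer the command line over the portal, the Azure CLI has a lease break command for blobs. The wrapper below is a sketch: the function name and the AZ_BIN override are hypothetical, the account, container, and blob names are placeholders you supply, and it assumes you are signed in with az login and have data-plane access to the blob.

```shell
#!/bin/sh
# Break the lease on the blob that holds the Terraform state.
# $1 = storage account name, $2 = container name, $3 = state blob name.
# AZ_BIN is a hypothetical override; it defaults to the az CLI.
break_state_lease() {
  az_cli="${AZ_BIN:-az}"
  "$az_cli" storage blob lease break \
    --account-name "$1" \
    --container-name "$2" \
    --blob-name "$3" \
    --auth-mode login
}

# Example (not executed here), with placeholder names:
#   break_state_lease <storage_account> <container> <state_blob>
```

As with force-unlock, only break the lease once you are sure no Terraform run is still using the state.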

4. Review Backend Configuration and Permissions

Ensure your Terraform backend configuration is correct and that the credentials used have the necessary permissions for state locking and unlocking.

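  • AWS S3 Backend: The IAM role or user needs s3:ListBucket on the bucket, plus s3:GetObject and s3:PutObject on the state file. If locking uses a DynamoDB table, it also needs dynamodb:DescribeTable, dynamodb:GetItem, dynamodb:PutItem, and crucially dynamodb:DeleteItem on that table, because releasing a lock deletes the lock item. If the lock is stored as a lock file in the bucket instead, s3:DeleteObject is required to remove it.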
  • Azure Backend: The service principal or user needs appropriate permissions on the Storage Account and Blob Container (e.g., Storage Blob Data Contributor role).
  • GCP Backend: The service account requires roles like “Storage Object Admin” on the bucket.

A state lock error that shows up when Terraform runs under the wrong AWS profile, for example, usually comes down to misconfigured credentials or insufficient IAM permissions.
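
For the S3 backend with a DynamoDB lock table, the permissions can be sketched as an IAM policy document. The helper below only prints the JSON so you can review it or adapt it; the function name is hypothetical, and the bucket, state key, and table ARN are placeholders you pass in.

```shell
#!/bin/sh
# Print an IAM policy sketch for an S3 backend that locks through DynamoDB.
# $1 = bucket name, $2 = state file key, $3 = ARN of the DynamoDB lock table.
print_state_policy() {
  bucket="$1"
  key="$2"
  table_arn="$3"
  cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${bucket}"
    },
    {
      "Effect": "Allow",
      "Action": ["s3:GetObject", "s3:PutObject"],
      "Resource": "arn:aws:s3:::${bucket}/${key}"
    },
    {
      "Effect": "Allow",
      "Action": [
        "dynamodb:DescribeTable",
        "dynamodb:GetItem",
        "dynamodb:PutItem",
        "dynamodb:DeleteItem"
      ],
      "Resource": "${table_arn}"
    }
  ]
}
EOF
}

# Example (not executed here), with placeholder values:
#   print_state_policy <bucket> <path/to/terraform.tfstate> <lock_table_arn>
```

If dynamodb:DeleteItem is missing, Terraform can take the lock but cannot release it, which produces stale locks even when every run finishes normally.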

Preventing Future State Lock Issues

While solutions exist, preventing the “Error acquiring the state lock: ConditionalCheckFailedException” is always better:

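  • Graceful Exits: Allow Terraform commands to complete naturally. Avoid Ctrl+C unless absolutely necessary. If you must interrupt a run, press it once and wait for Terraform to shut down gracefully, because a second Ctrl+C forces an immediate exit and can leave the lock behind.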
  • Robust CI/CD Pipelines: Design your CI/CD pipelines to handle Terraform runs gracefully, ensuring that processes complete or fail in a way that releases locks. Implement timeouts and proper error handling.
  • Network Stability: Ensure a stable network connection when running Terraform operations, especially for critical deployments.
  • Monitor Terraform Runs: Keep an eye on your Terraform processes, particularly long-running apply operations.
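
For pipelines, one small safeguard is Terraform’s -lock-timeout option, which makes a run keep retrying the lock for a while instead of failing on the first attempt when another run is just finishing. The wrapper below is a sketch; the function name and the TERRAFORM_BIN and LOCK_TIMEOUT variables are assumptions made for this example.

```shell
#!/bin/sh
# Run "terraform apply" non-interactively and wait for a busy lock
# instead of failing immediately.
# LOCK_TIMEOUT is a hypothetical setting (default 5m); TERRAFORM_BIN is a
# hypothetical override that defaults to the terraform CLI.
# Any extra arguments are passed through to terraform apply.
tf_apply_with_lock_timeout() {
  tf="${TERRAFORM_BIN:-terraform}"
  lock_timeout="${LOCK_TIMEOUT:-5m}"
  "$tf" apply -input=false -lock-timeout="$lock_timeout" "$@"
}

# Example (not executed here):
#   LOCK_TIMEOUT=10m tf_apply_with_lock_timeout -auto-approve
```

This does not clear an orphaned lock, but it avoids spurious failures when two pipeline runs overlap by a short time.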

Conclusion

The “Error acquiring the state lock: ConditionalCheckFailedException” can block your Terraform workflow, but it is a solvable issue. The error is caused by an orphaned or still-active state lock, and you can resolve it by running terraform force-unlock, identifying and terminating conflicting processes, and ensuring correct permissions. Always confirm that no processes are active before forcing an unlock to prevent state corruption, and take a backup of your state file in case something goes wrong.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge on Azure, Kubernetes & Jenkins.
In my free time, I write blogs on ckdbtech.com
