Terraform vs. CloudFormation: The Definitive Guide
If, like me, you have scoured the internet for help choosing between CloudFormation and Terraform as your next infrastructure-as-code (IaC) tool without finding a definitive answer, I shared your pain for a long while. Now that I have significant experience with both tools, I can make an informed recommendation about which one to use.
For your IaC project on AWS, choose CloudFormation, because:
- CloudFormation makes a distinction between code (i.e., templates) and instantiations of the code (i.e., stacks). In Terraform, there is no such distinction. More on this in the next section.
- Terraform doesn't handle basic dependency management very well. More on that in a later section.
Differentiating Between Code and Instantiations
One difference between CloudFormation and Terraform is how code and instantiations relate to each other within each service.
CloudFormation has the concept of a stack, which is the instantiation of a template. The same template can be instantiated ad infinitum by a given client in a given account, across accounts, or by different clients.
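To make the template/stack distinction concrete, here is a minimal Python sketch that builds AWS CLI commands deploying one template as several independent stacks. The template file name and the stack names are hypothetical, and the sketch only prints the commands rather than running them.

```python
from typing import List


def deploy_command(template_file: str, stack_name: str) -> List[str]:
    """Build an AWS CLI command that instantiates one stack from a template."""
    return [
        "aws", "cloudformation", "deploy",
        "--template-file", template_file,
        "--stack-name", stack_name,
    ]


# Hypothetical names: one template file, three independent stacks.
TEMPLATE = "web-server.yaml"
STACKS = ["web-dev", "web-staging", "web-prod"]

commands = [deploy_command(TEMPLATE, name) for name in STACKS]

for command in commands:
    print(" ".join(command))
```

Only the stack name changes between the three commands; the template itself is never copied.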
Terraform has no such concept and requires a one-to-one relationship between code and its instantiation. It would be similar to duplicating the source code of a web server for each server you want to run, or duplicating the code every time you need to run an application instead of running the compiled version.
This point is quite trivial in the case of a simple setup, but it quickly becomes a major pain point for medium- to large-scale operations. In Terraform, every time you need to spin up a new stack from existing code, you need to duplicate the code. And copy/pasting script files is a very easy way to sabotage yourself and corrupt resources that you didn’t intend to touch.
Terraform actually doesn’t have a concept of stacks like CloudFormation, which clearly shows that Terraform has been built from the ground up to have a one-to-one match between code and the resources it manages. This was later partially rectified by the concept of environments (which have since been renamed “workspaces”), but the way to use those makes it incredibly easy to deploy to an unwanted environment. This is because you have to run terraform workspace select prior to deploying, and forgetting this step will deploy to the previously selected workspace, which might or might not be the one you want.
In practice, it is true that this problem is mitigated by the use of Terraform modules, but even in the best case, you would require a significant amount of boilerplate code. In fact, this problem was so acute that people needed to create a wrapper tool around Terraform to address this problem: Terragrunt.
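Returning to the workspace pitfall described above, one possible safeguard is to check which workspace is selected before deploying. The Python sketch below is an illustration, not something the article prescribes: the function names are hypothetical, and it assumes the terraform executable is available on the PATH.

```python
import subprocess
from typing import Callable, List

Runner = Callable[[List[str]], str]


def run_cli(args: List[str]) -> str:
    """Run a command and return its standard output; raises if it fails."""
    result = subprocess.run(args, check=True, capture_output=True, text=True)
    return result.stdout


def require_workspace(expected: str, run: Runner = run_cli) -> None:
    """Raise unless the currently selected workspace is the expected one."""
    # `terraform workspace show` prints the name of the selected workspace.
    selected = run(["terraform", "workspace", "show"]).strip()
    if selected != expected:
        raise RuntimeError(
            f"workspace {selected!r} is selected, but {expected!r} was expected"
        )
```

A deployment script could call require_workspace("staging") immediately before running terraform apply, so that a forgotten terraform workspace select stops the run instead of deploying to the wrong place. The run parameter exists only so the check can be exercised without Terraform installed.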
State Management and Permissions
Another important difference between CloudFormation and Terraform is how they each manage state and permissions.
CloudFormation manages stack states for you and doesn’t give you any options. But CloudFormation stack states have been solid in my experience. Also, CloudFormation allows less privileged users to manage stacks without having all the necessary permissions required by the stack itself. This is because CloudFormation can get the permissions from a service role attached to the stack rather than the permissions from the user running the stack operation.
Terraform requires you to configure a back end to manage state. The default is a local file, which is totally unsatisfactory, given:
- The robustness of your state file is entirely linked to the robustness of the machine it is stored on.
- A local file pretty much makes teamwork impossible.
So you need a robust and shared state, which on AWS is usually achieved by using an S3 bucket to store the state file, accompanied by a DynamoDB table to handle concurrency.
This means you need to create an S3 bucket and DynamoDB table manually for each stack you want to instantiate, and also manage permissions manually for these two objects to restrict less privileged users from having access to data they should not have access to. If you have just a couple of stacks, that won’t be too much of a problem, but if you have 20 stacks to manage, that does become very cumbersome.
By the way, when using Terraform workspaces, it is not possible to have one DynamoDB table per workspace. This means that if you want an IAM user with minimal permissions to perform deployments, that user will be able to fiddle with the locks of all workspaces because DynamoDB permissions are not fine-grained to the item level.
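As a rough illustration of the per-stack setup described above, the following Python sketch builds the terraform init command for each stack's S3 back end. The naming scheme (one bucket and one lock table named after the stack) and the stack names are assumptions made for the example, and the bucket and table are assumed to exist already.

```python
from typing import List


def init_command(stack: str, region: str) -> List[str]:
    """Build `terraform init` arguments for one stack's S3 back end."""
    # Hypothetical naming scheme: one bucket and one lock table per stack.
    settings = {
        "bucket": f"{stack}-tfstate",
        "key": "terraform.tfstate",
        "region": region,
        "dynamodb_table": f"{stack}-tflock",
    }
    # Each setting is passed as -backend-config=KEY=VALUE; the Terraform code
    # is assumed to declare an otherwise empty `backend "s3" {}` block.
    return ["terraform", "init"] + [
        f"-backend-config={key}={value}" for key, value in settings.items()
    ]


for stack in ["billing", "search"]:
    print(" ".join(init_command(stack, "us-east-1")))
```

Every additional stack adds another bucket, another table, and another set of permissions to keep track of, which is the overhead the text refers to.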
Renaming resources can be a bit tricky in both CloudFormation and Terraform. If you change the logical ID (i.e., the name) of a resource, both will consider that the old resource must be destroyed and a new one created. So it’s generally a bad idea to change the logical ID of resources in either tool, especially for nested stacks in CloudFormation.
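The renaming gotcha can be pictured with a toy Python model in which a resource's identity is nothing more than its logical ID. This is a simplification for illustration, not how either tool is implemented, and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple


def plan_changes(old: Dict[str, str], new: Dict[str, str]) -> Tuple[List[str], List[str]]:
    """Return (IDs to destroy, IDs to create) when identity is the logical ID."""
    to_destroy = sorted(set(old) - set(new))
    to_create = sorted(set(new) - set(old))
    return to_destroy, to_create


# Same bucket definition; only the logical ID changes from "Logs" to "LogBucket".
old_template = {"Logs": "AWS::S3::Bucket"}
new_template = {"LogBucket": "AWS::S3::Bucket"}

print(plan_changes(old_template, new_template))  # (['Logs'], ['LogBucket'])
```

Nothing about the bucket itself changed, yet the model plans a destroy followed by a create, because the only link between the old and new definitions was the name.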
As mentioned in the introduction, Terraform doesn’t handle basic dependency management very well. Unfortunately, the Terraform developers aren’t giving the long-standing issue much attention, despite the apparent lack of workarounds.
Given that proper dependency management is absolutely critical to an IaC tool, such issues in Terraform call its suitability into question as soon as business-critical operations are involved, such as deploying to a production environment. CloudFormation gives a much more professional feel, and AWS is always very attentive to making sure that it offers production-grade tools to its clients. In all the years I have been using CloudFormation, I’ve never come across a problem with dependency management.
CloudFormation allows a stack to export some of its output variables, which can then be reused by other stacks. To be honest, this functionality is limited, as you won’t be able to instantiate more than one stack from the same template per region. This is because you can’t export two variables with the same name in a given region of an account, and exported variables don’t have namespaces.
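The export limitation can be modeled with a small Python sketch. The registry class is a toy stand-in for the export names of one account in one region, not CloudFormation's implementation, and the stack and export names are hypothetical.

```python
from typing import Dict


class ExportRegistry:
    """Toy model of the export names of one account in one region."""

    def __init__(self) -> None:
        self._owners: Dict[str, str] = {}

    def register(self, stack_name: str, export_name: str) -> None:
        """Record an export, refusing a name already taken by another stack."""
        owner = self._owners.get(export_name)
        if owner is not None and owner != stack_name:
            raise ValueError(
                f"export {export_name!r} is already exported by stack {owner!r}"
            )
        self._owners[export_name] = stack_name


registry = ExportRegistry()
registry.register("network-a", "VpcId")
try:
    # A second stack from the same template exports the same fixed name.
    registry.register("network-b", "VpcId")
    collided = False
except ValueError:
    collided = True

print(collided)  # True

# Export names derived from the stack name no longer collide.
registry.register("network-a", "network-a-VpcId")
registry.register("network-b", "network-b-VpcId")
```

Deriving the export name from the stack name avoids the collision in this model, but every importing stack then has to be told which prefix to use, so the lack of real namespaces is still felt.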
Terraform offers no such facilities, so you are left with less desirable options. Terraform allows you to import the state of another stack, but that gives you access to all the information in that stack, including the many secrets that are stored in the state. Alternatively, a stack can export some variables in the form of a JSON file stored in an S3 bucket, but again, this option is more cumbersome: You have to decide which S3 bucket to use and give it the appropriate permissions, and write all the plumbing code yourself on both the writer and reader sides.
One advantage of Terraform is that it has data sources. Terraform can thus query resources not managed by Terraform. However, in practice, this has little relevance when you want to write a generic template because you then won’t assume anything about the target account. The equivalent in CloudFormation is to add more template parameters, which thus involves repetition and potential for errors; however, in my experience, this has never been a problem.
Back to the issue of Terraform’s dependency management: another example is the following error, which you can get when you try to update the settings for a load balancer:
Error: Error deleting Target Group: ResourceInUse: Target group 'arn:aws:elasticloadbalancing:us-east-1:723207552760:targetgroup/strategy-api-default-us-east-1/14a4277881e84797' is currently in use by a listener or a rule status code: 400, request id: 833d8475-f702-4e01-aa3a-d6fa0a141905
The expected behavior would be that Terraform detects that the target group is a dependency of some other resource that is not being deleted, and consequently, it should not try to delete it—but neither should it throw an error.
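The check described above can be sketched in Python as a lookup over a dependency graph. This is a toy model of the expected behavior under assumed resource names, not a description of how Terraform or CloudFormation work internally.

```python
from typing import Dict, List, Set


def blocked_deletions(
    depends_on: Dict[str, Set[str]], to_delete: Set[str]
) -> Dict[str, List[str]]:
    """Map each resource slated for deletion to the kept resources still using it."""
    blocked: Dict[str, List[str]] = {}
    for resource, dependencies in depends_on.items():
        if resource in to_delete:
            continue  # a resource that is itself being deleted blocks nothing
        for dependency in sorted(dependencies & to_delete):
            blocked.setdefault(dependency, []).append(resource)
    return blocked


# Hypothetical resources: the listener forwards traffic to the target group.
graph = {"listener": {"target_group"}, "target_group": set()}

print(blocked_deletions(graph, {"target_group"}))  # {'target_group': ['listener']}
```

A tool that ran this kind of check before acting would see that the target group is still used by a listener that is being kept, and could skip or reorder the deletion rather than fail with ResourceInUse.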
Although Terraform is a command-line tool, it is very clear that it expects a human to run it, as it is very much interactive. It is possible to run it in batch mode (i.e., from a script), but this requires some additional command-line arguments. The fact that Terraform has been developed to be run by humans by default is quite puzzling, given that an IaC tool’s purpose is automation.
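For reference, a non-interactive run might look like the following Python sketch, which only builds the command lines. The plan file name is a hypothetical choice; the flags shown are the ones that suppress prompts.

```python
from typing import List


def automation_commands(plan_file: str = "tfplan") -> List[List[str]]:
    """Commands for a plan-then-apply run that never prompts for input."""
    return [
        ["terraform", "init", "-input=false"],
        ["terraform", "plan", "-input=false", f"-out={plan_file}"],
        # Applying a saved plan file does not ask for confirmation.
        ["terraform", "apply", "-input=false", plan_file],
    ]


for command in automation_commands():
    print(" ".join(command))
```

Every step needs -input=false, and the apply step needs either a saved plan or an explicit approval flag, which is the extra ceremony the paragraph above complains about.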
Terraform is difficult to debug. Error messages are often very basic and don’t allow you to understand what is going wrong, in which case you will have to run Terraform with TF_LOG=debug, which produces a huge amount of output to trawl through. Complicating this, Terraform sometimes makes API calls to AWS that fail, but the failure is not a problem with Terraform. In contrast, CloudFormation provides reasonably clear error messages with enough details to allow you to understand where the problem is.
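Because the debug output is so large, it can help to send it to a file instead of the terminal. The Python sketch below prepares such an environment; the helper name and the log file name are hypothetical, while TF_LOG and TF_LOG_PATH are the environment variables Terraform reads.

```python
import os
from typing import Dict


def with_debug_logging(env: Dict[str, str], log_path: str) -> Dict[str, str]:
    """Return a copy of env that turns on Terraform debug logging to a file."""
    debug_env = dict(env)  # copy, so the caller's environment is left untouched
    debug_env["TF_LOG"] = "DEBUG"
    debug_env["TF_LOG_PATH"] = log_path
    return debug_env


env = with_debug_logging(dict(os.environ), "terraform-debug.log")
print(env["TF_LOG"], env["TF_LOG_PATH"])
```

The resulting dictionary can be passed as the env argument of subprocess.run when invoking Terraform, after which the log file can be searched for the failing API call.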
An example Terraform error message:
Error: error reading S3 bucket Public Access Block: NoSuchBucket: The specified bucket does not exist status code: 404, request id: 19AAE641F0B4AC7F, host id: rZkgloKqxP2/a2F6BYrrkcJthba/FQM/DaZnj8EQq/5FactUctdREq8L3Xb6DgJmyKcpImipv4s=
The above error message looks clear, but it actually doesn’t reflect the underlying problem (which in this case was a permissions issue).
This error message also shows how Terraform can sometimes paint itself into a corner. For example, if you create an S3 bucket and an aws_s3_bucket_public_access_block resource on that bucket, and if for some reason you make some changes in the Terraform code that destroy that bucket (e.g., in the “change implies delete and create” gotcha described above), Terraform will get stuck trying to load the aws_s3_bucket_public_access_block but continually failing with the above error. The correct behavior from Terraform would be to replace or delete the aws_s3_bucket_public_access_block as appropriate.
Lastly, you can’t use the CloudFormation helper scripts with Terraform. This might be an annoyance, especially if you’re hoping to use cfn-signal, which tells CloudFormation that an EC2 instance has finished initializing itself and is ready to serve requests.
Syntax, Community, and Rolling Back
Syntax-wise, Terraform does have a good advantage compared to CloudFormation: it supports loops. But in my own experience, this feature can turn out to be a bit dangerous. Typically, a loop would be used to create a number of identical resources; however, when you want to update the stack with a different count, you might need to link the old and the new resources (for example, using zipmap() to combine values from two arrays which now happen to be of different sizes because one array has the size of the old loop size and the other has the size of the new loop size). It is true that such a problem can happen without loops, but without loops, the problem would be much more evident to the person writing the script. The use of loops in such a case obfuscates the problem.
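The size mismatch can be reproduced with a rough Python analogue of zipmap(). The strict length check mirrors Terraform's documented requirement that both lists have the same length; everything else, including the instance names and addresses, is a hypothetical model rather than Terraform's own code.

```python
from typing import Dict, List


def zipmap(keys: List[str], values: List[str]) -> Dict[str, str]:
    """Pair keys with values, refusing lists of different lengths."""
    if len(keys) != len(values):
        raise ValueError(
            f"{len(keys)} keys but {len(values)} values: lists must match in length"
        )
    return dict(zip(keys, values))


# The loop count was raised from 2 to 3, but one list still has the old size.
new_names = ["web-0", "web-1", "web-2"]
old_addresses = ["10.0.0.10", "10.0.0.11"]

try:
    zipmap(new_names, old_addresses)
except ValueError as error:
    print(error)  # 3 keys but 2 values: lists must match in length
```

Written out resource by resource, the missing third address would be obvious; inside a loop, the mismatch only surfaces when the count changes.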
Whether Terraform’s syntax or CloudFormation’s syntax is better is mostly a question of preferences. CloudFormation initially supported only JSON, but JSON templates are very hard to read. Fortunately, CloudFormation also supports YAML, which is much easier to read and allows comments. CloudFormation’s syntax tends to be quite verbose, though.
Terraform’s syntax uses HCL, which is a kind of JSON derivative and is quite idiosyncratic. Terraform offers more functions than CloudFormation, and they are usually easier to make sense of. So it could be argued that Terraform does have a slight advantage on this point.
Another advantage of Terraform is its readily available set of community-maintained modules, and this does simplify writing templates. One issue might be that such modules might not be secure enough to comply with an organization’s requirements. So for organizations requiring a high level of security, reviewing these modules (as well as further versions as they come) might be a necessity.
Generally speaking, Terraform modules are much more flexible than CloudFormation nested stacks. A CloudFormation nested stack tends to hide everything below it. From the nesting stack, an update operation would show that the nested stack will be updated but doesn’t show in detail what is going to happen inside the nested stack.
A final point, which could be contentious actually, is that CloudFormation attempts to roll back failed deployments. This is quite an interesting feature but can unfortunately take a very long time (for example, it could take up to three hours for CloudFormation to decide that a deployment to Elastic Container Service has failed). In contrast, in the case of failure, Terraform just stops wherever it was. Whether a rollback feature is a good thing or not is debatable, but I have come to appreciate the fact that a stack is maintained in a working state as much as possible when a longer wait happens to be an acceptable tradeoff.
In Defense of Terraform vs. CloudFormation
Terraform does have advantages over CloudFormation. The most important one, in my opinion, is that when applying an update, Terraform shows you all the changes you are about to make, including drilling down into all the modules it is using. By contrast, CloudFormation, when using nested stacks, only shows you that the nested stack needs updating, but doesn’t provide a way to drill down into the details. This can be frustrating, as this type of information is quite important to know before hitting the “go” button.
Both CloudFormation and Terraform support extensions. In CloudFormation, it is possible to manage so-called “custom resources” by using an AWS Lambda function of your own creation as a back end. For Terraform, extensions are much easier to write and form part of the code. So there is an advantage for Terraform in this case.
Terraform can handle many cloud vendors. This puts Terraform in a position of being able to unify a given deployment among multiple cloud platforms. For example, say you have a single workload spread between AWS and Google Cloud Platform (GCP). Normally, the AWS part of the workload would be deployed using CloudFormation, and the GCP part using GCP’s Cloud Deployment Manager. With Terraform, you could instead use a single script to deploy and manage both stacks in their respective cloud platforms. In this way, you only have to deploy one stack instead of two.
Non-arguments for Terraform vs. CloudFormation
There are quite a few non-arguments that continue to circulate around the internet. The biggest one thrown around is that because Terraform is multi-cloud, you can use one tool to deploy all your projects, no matter in what cloud platform they are done. Technically, this is true, but it’s not the big advantage it may appear to be, especially when managing typical single-cloud projects. The reality is that there is an almost one-to-one correspondence between resources declared in (for example) CloudFormation and the same resources declared in a Terraform script. Since you have to know the details of cloud-specific resources either way, the difference comes down to syntax, which is hardly the biggest pain point in managing deployments.
Some argue that by using Terraform, one can avoid vendor lock-in. This argument doesn’t hold in the sense that by using Terraform, you are locked in by HashiCorp (the creator of Terraform), just in the same way that by using CloudFormation, you are locked in by AWS, and so on for other cloud platforms.
The fact that Terraform modules are easier to use is to me of lesser importance. First of all, I believe that AWS deliberately wants to avoid hosting a single repository for community-based CloudFormation templates because of the perceived responsibility for user-made security holes and breaches of various compliance programs.
At a more personal level, I fully understand the benefits of using libraries in the case of software development, as those libraries can easily run into the tens of thousands of lines of code. In the case of IaC, however, the size of the code is usually much less, and such modules are usually a few dozen lines long. Using copy/paste is actually not that bad an idea in the sense that it avoids problems with maintaining compatibility and delegating your security to unknown people.
Using copy/paste is frowned upon by many developers and DevOps engineers, and there are good reasons behind this. However, my point of view is that using copy/paste for snippets of code allows you to easily tailor it to your needs, and there is no need to make a library out of it and spend a lot of time to make it generic. The pain of maintaining those snippets of code is usually very low, unless your code becomes duplicated in, say, a dozen or more templates. In such a case, appropriating the code and using it as nested stacks makes sense, and the benefits of not repeating yourself are probably greater than the annoyance of not being able to see what’s going to be updated inside the nested stack when you perform an update operation.
CloudFormation vs. Terraform Conclusion
With CloudFormation, AWS wants to provide its customers with a rock-solid tool which will work as intended at all times. Terraform’s team does too, of course—but it seems that a crucial aspect of their tooling, dependency management, is unfortunately not a priority.
Terraform might have a place in your project, especially if you have a multi-cloud architecture, in which case Terraform scripts are one way to unify the management of resources across the various cloud vendors you are using. But you could still avoid Terraform’s downsides in this case by only using Terraform to manage stacks already implemented using their respective cloud-specific IaC tools.
My overall feeling about Terraform vs. CloudFormation is that CloudFormation, although imperfect, is more professional and reliable, and I would definitely recommend it for any project that isn’t specifically multi-cloud.
Understanding the basics
What is CloudFormation?
CloudFormation is the official infrastructure-as-code (IaC) software for Amazon Web Services. CloudFormation automates and orchestrates the creation, update, and deletion of any AWS resources. Also, CloudFormation allows fine-grained permissions and can roll back failed deployments.
What is the use of CloudFormation?
CloudFormation can be used to automate and orchestrate the creation, update, and deletion of AWS resources, based on scripts. This allows documenting the infrastructure, versioning and storage of the infrastructure as a set of text files, reproducibility across accounts and environments, and continuous deployment.
What are the main components of CloudFormation?
The main components of CloudFormation are: CloudFormation templates (declarative scripts defining what the infrastructure should be), stacks (instantiations of CloudFormation templates that can be nested), and StackSets (which allow you to manage CloudFormation stacks across accounts and regions).
What is the difference between Terraform and CloudFormation?
Terraform and CloudFormation are both infrastructure-as-code (IaC) tools. CloudFormation is developed by AWS and only manages AWS resources. Terraform is developed by HashiCorp and can manage resources across a wide range of cloud vendors.
Why is CloudFormation better than Terraform?
CloudFormation is better than Terraform for production workloads that are limited to AWS. The main reason is that in certain circumstances, Terraform doesn’t handle dependencies properly, and this rules it out as production-ready infrastructure-as-code (IaC) software.
What is Terraform used for?
Terraform is an infrastructure-as-code (IaC) tool, and as such is used to automate the creation and management of cloud resources. A distinctive feature of Terraform is that it supports a wide range of cloud platforms.
Who created Terraform?
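Terraform is infrastructure-as-code (IaC) software created by HashiCorp. It is written in Go and was first released in 2014. Terraform was released as free and open-source software; since August 2023, HashiCorp has distributed new versions under the Business Source License.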
Is Terraform a DevOps tool?
Yes, Terraform is a DevOps tool. It is infrastructure-as-code (IaC) software, and such software is a critical part of strategies to automate deployments.
About the author
Fabrice is an AWS-certified cloud architect & developer with 20+ years of experience with the likes of Topps, Cisco, Samsung, and Alcatel.