How to Build an Infrastructure with Terraform?

Infrastructure as Code (IaC) is a software development approach that enables infrastructure resource deployment and life cycle management. These resources can be virtual machines (VMs) or managed services (App Service for compute, Azure SQL Database for data, or Azure Service Bus for a messaging service).
This method is needed to get the most out of the Cloud, its flexibility, and its many uses:
- Manage all environments, from development to production
- Provide temporary environments that last only as long as they are used
- Scale your infrastructure, which is no longer set in stone when a project is set up, but changes over time, whether in terms of size or services
Terraform is now an essential IaC tool because of its declarative approach, idempotency management, and its providers, which enable it to work with a wide range of cloud providers and services.
In this post, we’ll look at how software engineering principles can and should be used to get the most out of Terraform.
The Craft Applied to Terraform
Terraform is an Infrastructure as Code tool, so it makes sense to apply the code management best practices we already know.
As stated in the first Craft Month post, this culture should be applied at all levels, including the cloud and the infrastructure.
Readability
Readability is a key element of the Craft method. In an IaC approach, it is important to make sure that:
- The infrastructure can be scaled quickly
- You do not create a bottleneck where only one person can understand and maintain a Terraform template
A Standardized Template
Using a standard template gives you a common, shared, and understood structure. It makes it easy to learn and allows for quick bootstrapping.
Here’s how the classic Terraform template is set up:
- main.tf contains the resources, data sources, locals, and module calls.
- variables.tf is where all the variables are declared. These variables must be documented (type and description).
- versions.tf contains information about the Terraform versions and the various providers used.
- providers.tf contains the exact configuration for each provider. This is specific to the provider type and will depend on the provider you use (for azurerm, for example, you will find the features to be enabled or the enabling of automatic provider registration).
In addition to this standard structure, other files will be added for complex infrastructures so we don’t end up with a main.tf file with 3000 lines, which is hard to read and maintain. We could focus on structuring by service type (compute, data, etc.).
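As a minimal sketch of this layout, the three supporting files might look like the following. The version constraints, the location variable, and its default value are illustrative assumptions, not values taken from a real project.

```hcl
# versions.tf: pins Terraform itself and the providers the template uses.
terraform {
  required_version = ">= 1.5.0" # assumed minimum version

  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 3.0" # assumed constraint
    }
  }
}

# providers.tf: provider-specific configuration.
provider "azurerm" {
  # The features block is mandatory for azurerm, even when left empty.
  features {}
}

# variables.tf: every variable declares a type and a description.
variable "location" {
  type        = string
  description = "Azure region where the resources are deployed."
  default     = "westeurope" # hypothetical default
}
```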
Resource Naming
This does not mean the name of the deployed resources, but rather their internal name within the Terraform template. For example, an Azure Resource Group can be declared as follows:
resource "azurerm_resource_group" "rg" { … }
The Resource Group’s internal name is rg here, and this is how it can be referenced in the declaration of the resources it contains. From the start of the project, it is important to use names that make it clear how the resources will be used.
Say the infrastructure grows, and there are now four resource groups. These are named rg_1, rg_2, rg_3, and rg_5. Why do we need them? Where is rg_4? Why is it missing? Can I use this name for the Resource Group I need to create? These are all questions to be avoided.
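By contrast, a short sketch with internal names that describe their purpose could look like this. The names data and raw_files, and all deployed resource names, are hypothetical.

```hcl
# The internal name "data" states what this resource group is for.
resource "azurerm_resource_group" "data" {
  name     = "rg-myapp-data" # hypothetical deployed name
  location = "westeurope"
}

# The reference azurerm_resource_group.data reads naturally where it is used.
resource "azurerm_storage_account" "raw_files" {
  name                     = "stmyapprawfiles" # hypothetical deployed name
  resource_group_name      = azurerm_resource_group.data.name
  location                 = azurerm_resource_group.data.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```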
Robustness
Code robustness is another important part of Craft culture. This ensures that the infrastructure will work as expected when deployed. It is achieved by implementing a testing strategy.
Infra as Code shares the same principle as application development: the later a bug is found, the more expensive it is to fix.
This testing should not replace hardening and monitoring of the cloud environments. Instead, they should be used alongside each other. Testing will enable you to find and fix configuration problems as soon as possible.
Infrastructure Testing
Infrastructure testing aims to confirm the following:
- The technical completion of a deployment. It can find errors in the template, such as dependencies that must be made explicit or a timer that needs to be set between two resources.
- Variabilization and interpolation. Are the resource names correct? Have the for_each or count instructions been implemented correctly?
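The constructs these checks target might look like the sketch below: a for_each loop with interpolated names, an explicit dependency, and a timer between two resources. The variable, the resource names, and the 30 second delay are assumptions for illustration. The time_sleep resource comes from the hashicorp/time provider, which would need to be declared alongside azurerm.

```hcl
variable "resource_groups" {
  type        = map(string)
  description = "Purpose of each resource group mapped to its Azure region."
  default = {
    compute = "westeurope"
    data    = "westeurope"
  }
}

# One resource group per map entry; the name is interpolated from the key.
resource "azurerm_resource_group" "by_purpose" {
  for_each = var.resource_groups
  name     = "rg-myapp-${each.key}"
  location = each.value
}

# Timer: waits after the resource groups are created.
resource "time_sleep" "wait_after_resource_groups" {
  depends_on      = [azurerm_resource_group.by_purpose]
  create_duration = "30s" # assumed delay
}

# Explicit dependency: the storage account is created only after the timer ends.
resource "azurerm_storage_account" "raw_files" {
  depends_on = [time_sleep.wait_after_resource_groups]

  name                     = "stmyapprawfiles" # hypothetical deployed name
  resource_group_name      = azurerm_resource_group.by_purpose["data"].name
  location                 = azurerm_resource_group.by_purpose["data"].location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}
```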
These tests will require a dedicated environment in which an infrastructure can be repeatedly deployed, tested, and deleted.
Various tools can be used to implement this:
- Pester: a general testing framework in PowerShell. The workflow will then be deploy (terraform apply) -> test (Invoke-Pester) -> delete (terraform destroy).
- Terratest or Kitchen and its Terraform plugin, which handles the infrastructure creation and deletion.
Ideally, these tests should be run with each Terraform template increment. Depending on the resources, deployment can take a long time (for example, an instance of Azure API Management or an isolated App Service). In this case, it may be better to run these tests in a nightly build, or at another appropriate frequency.
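As an alternative to the tools listed above, and one the approach described here does not rely on, Terraform 1.6 and later ships a native test framework driven by the terraform test command. The sketch below is a hypothetical test file. It assumes the configuration under test declares an environment variable and a resource group with the internal name data, neither of which comes from this post.

```hcl
# tests/naming.tftest.hcl (hypothetical file name)

# Values injected into the configuration under test.
variables {
  environment = "test"
}

# Checks name interpolation at plan time, without deploying anything.
run "resource_group_name_is_interpolated" {
  command = plan

  assert {
    condition     = azurerm_resource_group.data.name == "rg-myapp-test-data"
    error_message = "The resource group name does not follow the expected pattern."
  }
}
```

Switching command to apply would deploy the resources for the duration of the run and destroy them afterwards, which matches the deploy, test, delete cycle at the cost of a longer execution.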
Static Code Analysis
Before deploying any resources, you can use static analysis tools to look for possible configuration or syntax problems.
Terraform has two native commands that do this:
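- terraform fmt rewrites the template files in the canonical format (with the -check option, it only reports the files that deviate from it)
- terraform validate checks that the configuration is syntactically valid and internally consistent
Other tools, like Checkov or Terrascan, can be used to check the configuration and confirm, for example, that: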
- The tags are applied to the resources correctly
- The Internet exposure is configured to comply with the security rules
- The use of certain third parties is prohibited on cost grounds
Maintainability
A Terraform project must follow a set of software engineering practices to be maintainable throughout its life cycle (a project does not end after the initial deployment).
Use a Source Code Manager
This seems like an obvious point when it comes to application development. However, it is not always the case for Terraform projects. It depends on the maturity of the organization.
It is essential to use a source code manager and clearly define its framework for use (branches, flow type used, etc.).
These processes must be documented and regularly reviewed.
Use a Remote Backend
For Terraform to work, it needs a state file: tfstate. Without getting into details about its content, this file is just as important as the template itself. A deployment cannot happen without it.
This file contains sensitive elements, so it should not be committed to a source code manager.
Terraform supports several backend types for storing it, including:
- A local backend. The state file is stored locally alongside the template. This type of backend should only be used for highly specific and extremely limited use cases (validating a feature or testing locally). It is not suitable for industrialized scenarios.
- An azurerm backend (a logical choice for Azure Cloud projects). This type of backend is relevant because it both secures the file, through the role-based access control (RBAC) model on the underlying storage account, and facilitates collaborative working (the blob storage can be used by each team member or a Core Service when working with pipelines).
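A minimal sketch of an azurerm backend declaration is shown below. The resource group, storage account, container, and key names are hypothetical, and the storage account is assumed to exist before terraform init is run.

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "rg-tfstate"         # hypothetical
    storage_account_name = "sttfstatemyapp"     # hypothetical
    container_name       = "tfstate"            # hypothetical
    key                  = "myapp.prod.tfstate" # blob name of the state file

    # Authenticate to the blob storage with Azure AD, so access is governed
    # by RBAC role assignments rather than by a storage account key.
    use_azuread_auth = true
  }
}
```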
Pipelines
Continuous integration has many benefits. All of the above tests should be systematically integrated and run as often and as soon as possible.
Automated deployment, all the way through to release, is just as important. A key factor in the success of a Terraform project is being able to deploy in a predictable way from a known deployment agent using one or more dedicated identities.
The Pros and Cons of Craft
Adopting a Craft culture for Terraform projects is helpful, but it comes with the same risks as using a Craft approach for an application project.
Premature Optimization
Premature optimization is when supposedly optimized code is implemented before the actual requirement has been confirmed.
This can lead to the misuse of modules in a Terraform project. Modules are an excellent way to apply the Don’t Repeat Yourself (DRY) principle. Implementing them prematurely can be counterproductive. They can be hard to maintain and scale.
Accidental Complexity
Accidental complexity in a Terraform project is caused by an implementation that includes features or use cases that are not necessary. This makes it hard to maintain or change a template.
An example of this would be trying to manage multiple regions for geo-redundancy from the start when the requested infrastructure is single region. This will cause loops and unnecessary interpolation, making the code hard to understand and maintain.
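To illustrate, here is the same hypothetical single-region requirement written two ways. The two resources are alternatives and are not meant to be deployed together. All names and the westeurope region are assumptions for the example.

```hcl
# Over-engineered: a region map and a loop for an infrastructure
# that only needs one region today.
variable "regions" {
  type        = map(string)
  description = "Region key mapped to an Azure location."
  default     = { primary = "westeurope" }
}

resource "azurerm_resource_group" "app_multi_region" {
  for_each = var.regions
  name     = "rg-myapp-${each.key}"
  location = each.value
}

# Simpler: states the current requirement directly.
resource "azurerm_resource_group" "app" {
  name     = "rg-myapp"
  location = "westeurope"
}
```

With the first form, every dependent resource must also index into the loop (for example azurerm_resource_group.app_multi_region["primary"]), which is where the extra interpolation comes from.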
To Craft or Not?
In conclusion, yes, craft your infrastructures! The return on investment of a Craft approach in Terraform projects (or projects using other IaC tools) is substantial.
In fact, it is mandatory to reap the full benefits of the cloud.
The success of this approach will also depend on how well the organization as a whole can move toward this model in a logical and step-by-step way by following these principles:
- Create a roadmap
- Take baby steps that add value quickly and visibly
- Experiment with different tools, methods, frameworks, etc.
- Learn from both successful and failed projects
- Document, distribute, and industrialize
Would you like to know more about Craftsmanship? See all the posts in our Craft Month series:
- Is the Craft Still Relevant?
- How to Choose the Best Software Architecture with Architectural Drivers?
- Craft and PowerShell: Why Software Engineering Practices Need to Be Applied to Infrastructure
- PySpark Unit Test Best Practices
- Telemetry: Ensuring Code That Works
- How to Boost Your Apps’ Performance with Asyncio: A Practical Guide for Python Developers