Terraform best practices for Azure

Terraform best practices to help you create more maintainable and secure Terraform configurations.

Terraform is a powerful tool, so it's best to treat your Terraform configuration files with the same rigor as your application codebase.

I've led several DevOps initiatives, helping deploy and configure projects for production environments. Before Terraform, I worked extensively with Azure Resource Manager (ARM) templates, but over the last year I've worked solely with Terraform for infrastructure management.

These are some Terraform best practices that I've discovered and applied over the last year. Some are Azure-specific and others apply more generally.

Terraform folder structure

To keep our configuration files organized, we can use a consistent folder structure.

The key points of the folder structure are:

  • main.tf file - entry point and coordinator of all our infrastructure logic.
  • variables.tf file - holds all the variables we'll need for all our infrastructure.
  • variables folder - holds the independent variable files that apply to a specific deployment, e.g. dev.tfvars would be used when deploying our development environment. The type and number of variable files will correspond to the environments you deploy. A typical set of files would be dev.tfvars, uat.tfvars and prod.tfvars for a three-stage environment.
  • modules folder - holds the custom Terraform modules you create for reusable infrastructure configuration.
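
As a rough sketch, the layout described above could look like this. The two module folder names are placeholders, and the environment files assume the three-stage setup mentioned in the list:

```text
.
├── main.tf
├── variables.tf
├── variables/
│   ├── dev.tfvars
│   ├── uat.tfvars
│   └── prod.tfvars
└── modules/
    ├── <module-a>/
    └── <module-b>/
```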
Terraform folder structure

Storing Terraform state in cloud storage

Terraform uses a backend to store state and run operations. The state keeps track of the resources that Terraform manages, and operations issue the CRUD (create, read, update, delete) commands against those resources.

When you're working in a team it's important that everyone can access a shared state and that the state can be locked and kept consistent. In the case of Azure, you'd use Azure Blob Storage to store your state.

We can define the backend that Terraform uses in the terraform configuration block:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "=2.26"
    }
  }
  # Configure Azure Blob Storage as the state backend. The block is left
  # empty on purpose: its settings are supplied at init time.
  backend "azurerm" {
  }
}
Defining Azure Blob Storage as our Terraform backend provider

We don't define the actual Azure Blob Storage settings inside the main.tf file, as this would force every environment to share the same state blob. Instead, we can define a dev.backend.tfvars file with our configuration:

resource_group_name  = "<resource group containing the Azure Storage account>"
storage_account_name = "<Azure Storage account name>"
container_name       = "<Azure Blob Storage container name>"
key                  = "<name of the state file blob inside the container>"
Defining the Azure Blob Storage settings for our Terraform configuration

We can then apply this as our backend when we initialize Terraform using the command terraform init -backend-config="./variables/dev.backend.tfvars".
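
As an illustration, a filled-in dev.backend.tfvars might look like the sketch below. Every value is a hypothetical placeholder rather than a name from a real subscription; only the four setting names are fixed, and they match the file above.

```hcl
# variables/dev.backend.tfvars (all values are hypothetical examples)
resource_group_name  = "rg-terraform-state"
storage_account_name = "examplestatedev001"
container_name       = "tfstate"
key                  = "dev.terraform.tfstate"
```

A uat.backend.tfvars or prod.backend.tfvars file would follow the same shape with a different key (or a different storage account), so that each environment keeps its own state.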

Terraform variables for parameterizing configuration files

We can parameterize our configuration files by using variables. This allows us to easily reuse the configuration across multiple environments by simply changing the variables file we reference. For example, to deploy to our dev environment we pass dev.tfvars to the Terraform apply command: terraform apply -var-file="./variables/dev.tfvars".
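
A minimal sketch of what such an environment file could contain. The variable names environment, location and app_service_sku are hypothetical, and each would need a matching variable declaration in variables.tf:

```hcl
# variables/dev.tfvars (hypothetical variable names and values)
environment     = "dev"
location        = "westeurope"
app_service_sku = "B1"
```

A prod.tfvars file would set the same keys to production values, so switching environments only means switching the -var-file argument.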

In our variables.tf file we define each Terraform variable that we will pass in from our environment-specific variable files. We can set the following properties:

# Main section
## Sub section
variable "main_section__sub_section__resource_name" {
  # documentation for this variable
  description = "This will be set as the name of the resource"

  # type constraint to prevent invalid variable values, e.g. string, number, bool
  type = string

  # default value, making this variable optional
  default = "sample"

  # validation for the variable value: the condition must be true for valid input
  validation {
    condition     = length(var.main_section__sub_section__resource_name) <= 10
    error_message = "The name cannot be longer than 10 characters."
  }

  # hide the variable value in plan/apply output (requires Terraform 0.14 or later)
  sensitive = true
}

The three key aspects of the above are:

  • Developing a consistent naming standard for your variables. This way you'll be able to easily reference variables and properly structure them.
  • Writing a clear description so that other developers can easily collaborate.
  • Specifying a type so that basic validation of the supplied value happens automatically.

The validation property is more advanced but especially useful when you have things like naming restrictions for Azure resources. For example, storage account names must be between 3 and 24 characters long and may contain only lowercase letters and numbers, and you can enforce that constraint with a validation.
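
A minimal sketch of such a validation, assuming a hypothetical variable called storage_account_name. It encodes the Azure rule that storage account names are 3 to 24 characters of lowercase letters and numbers:

```hcl
# Hypothetical variable; the name is an assumption for illustration.
variable "storage_account_name" {
  description = "Name of the Azure storage account."
  type        = string

  validation {
    # regex() raises an error when there is no match;
    # can() converts that error into false.
    condition     = can(regex("^[a-z0-9]{3,24}$", var.storage_account_name))
    error_message = "The storage account name must be 3 to 24 characters long and contain only lowercase letters and numbers."
  }
}
```

With this in place, an invalid name is rejected at plan time instead of failing halfway through a deployment.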

Terraform modules for reusable code

Terraform modules allow you to systematically group and reuse resource configurations. Every Terraform configuration has at least one module, the root module, which can then call any number of child modules.

Defining a custom Terraform module

By creating these custom child modules you keep the root module clean and allow each child module to focus on a single responsibility. A simple example would be an architecture where you need to deploy an Azure App Service: you can package up this logic into a separate module. It's your choice how granular you want to make your child modules. I would recommend a module per resource, and then composing those into subsystem modules so you have a clean configuration structure.
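
The sketch below shows one way the App Service example could be packaged, assuming the azurerm 2.x provider pinned earlier. The module path, input names and example values are all hypothetical, and the comments mark which file each part would live in:

```hcl
# modules/app_service/variables.tf (hypothetical module inputs)
variable "name" { type = string }
variable "resource_group_name" { type = string }
variable "location" { type = string }

# modules/app_service/main.tf
resource "azurerm_app_service_plan" "this" {
  name                = "${var.name}-plan"
  location            = var.location
  resource_group_name = var.resource_group_name

  sku {
    tier = "Standard"
    size = "S1"
  }
}

resource "azurerm_app_service" "this" {
  name                = var.name
  location            = var.location
  resource_group_name = var.resource_group_name
  app_service_plan_id = azurerm_app_service_plan.this.id
}

# modules/app_service/outputs.tf
output "default_site_hostname" {
  value = azurerm_app_service.this.default_site_hostname
}

# main.tf (root module) calling the child module
module "app_service" {
  source              = "./modules/app_service"
  name                = "example-app-dev"
  resource_group_name = "example-rg"
  location            = "westeurope"
}
```

The root module only passes inputs and reads outputs, so the details of the App Service plan stay hidden inside the child module.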

Least privilege access for the service principal associated with Terraform

One of the advantages of Infrastructure as Code is that it limits the scope of production permissions you need to give developers, as the infrastructure tool now has responsibility for creating and configuring the infrastructure.

When we assign a service principal to Terraform for our deployment, we need to limit its permissions so that we limit the damage it can cause if it is misused. A simple way to do this is to specify that the service principal can only make changes within a specific resource group, which ensures it cannot accidentally affect any other existing systems. We can then further refine the permissions by specifying which Azure resources it is allowed to deploy and in which regions those deployments can be made. All of this limits the "blast radius" of misconfigured Terraform files or leaked service principal credentials.
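
As a sketch of the resource group scoping, the role assignment below grants Contributor on a single resource group. It assumes the configuration is applied by an administrator with higher privileges (not by the restricted service principal itself) and that an azurerm provider block is already configured. Both variable names are hypothetical.

```hcl
# Hypothetical inputs: the resource group Terraform may manage and the
# object ID of the service principal that Terraform runs as.
variable "deployment_resource_group_name" {
  type = string
}

variable "terraform_sp_object_id" {
  type = string
}

# Look up the existing resource group to get its ID.
data "azurerm_resource_group" "deployment" {
  name = var.deployment_resource_group_name
}

# Grant Contributor on this one resource group only, not the subscription.
resource "azurerm_role_assignment" "terraform_sp" {
  scope                = data.azurerm_resource_group.deployment.id
  role_definition_name = "Contributor"
  principal_id         = var.terraform_sp_object_id
}
```

Restricting resource types and regions is typically handled separately, for example with Azure Policy, and is not shown here.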

Javaad Patel

FullStack Developer

I'm passionate about building great SaaS platform experiences. Currently learning and writing about cloud architectures, distributed systems and DevOps.