Please Test Your IaC Code
I'd begin with a story.
Recently, we migrated parts of our cloud workloads from Google Cloud Platform to another provider.
This meant we had to re-provision compute instances (VMs) that matches the original Google Compute Engine (GCE) resources' allocated CPU and memory.
In the Python script used, there is a simple dictionary translating the GCE machine type to its equivalent CPU and memory. As an example:
mapping = {
"n1-standard-2": {
"cpu": 2,
"memGB": 7.5
},
...
}
The subsequent code to provision the VMs did this:
...
req = mapping[machine_type]
cpu = int(req["cpu"])
memory = int(req["memGB"])
provision_vm(cpu, memory)
...
Have you noticed something amiss here?
Let's take a minute or so.
You see, the VM ended up under-provisioned (7 GB).
int(7.5) == 7
Thankfully, we caught this issue early within our own test environment.
Ultimately, scripts, Kubernetes manifests, Terraform code are still software code. Here is a gentle reminder that we can and should test or validate the soundness of our IaC code.