terraform: move caddy state to persistent
Move the caddy state disk to persistent. The binary-cache VM stores Let's
Encrypt certificates and data on the caddy state disk. This state disk
needs to be stored in 'persistent' data, otherwise there will be issues
with certificate authority rate limits when development environments are
deployed and subsequently destroyed.

Signed-off-by: Henri Rosten <[email protected]>
henrirosten committed Feb 9, 2024
1 parent 43ebf4a commit 211229f
Showing 6 changed files with 141 additions and 19 deletions.
39 changes: 35 additions & 4 deletions terraform/README.md
@@ -54,8 +54,9 @@ terraform
│   ├── binary-cache-sigkey
│   ├── binary-cache-storage
│   ├── builder-ssh-key
│   └── workspace-specific
├── playground
│   └── terraform-playground.sh
├── state-storage
│   └── tfstate-storage.tf
├── modules
@@ -68,7 +69,7 @@ terraform
```
- The `terraform` directory contains the root terraform deployment files with the VM configurations `binary-cache.tf`, `builder.tf`, and `jenkins-controller.tf` matching the components described in [README-azure.md](./README-azure.md) in its [components section](./README-azure.md#components).
- The `terraform/azarm` directory contains the terraform configuration for Azure `aarch64` builder which is used from ghaf github-actions [build.yml workflow](https://github.com/tiiuae/ghaf/blob/e81ccfb41d75eda0488b6b4325aeccb8385ce960/.github/workflows/build.yml#L151) to build `aarch64` targets for authorized PRs pre-merge. `azarm` is disconnected from the root terraform module: it's a separate configuration with its own state.
-- The `terraform/persistent` directory contains the terraform configuration for parts of the infrastructure that are shared between the ghaf-infra development instances. An example of such persistent ghaf-infra resource is the binary cache storage as well as the binary cache signing key. There may be many 'persistent' infrastructure instances - currently `dev` and `prod` deployments have their own instances of the persistent resources. Section [Multiple Environments with Terraform Workspaces](./README.md#multiple-environments-with-terraform-workspaces) discusses this topic with more details.
+- The `terraform/persistent` directory contains the terraform configuration for the parts of the infrastructure that are considered persistent: resources defined under `terraform/persistent` are not removed even if the ghaf-infra instance is otherwise removed. Examples of such persistent ghaf-infra resources are the binary cache storage and the binary cache signing key. There may be many 'persistent' infrastructure instances - currently the `dev` and `prod` deployments have their own instances of the persistent resources. Section [Multiple Environments with Terraform Workspaces](./README.md#multiple-environments-with-terraform-workspaces) discusses this topic in more detail.
- The `terraform/playground` directory contains tooling to facilitate the use of terraform workspaces for setting up distinct copies of the ghaf-infra infrastructure, i.e. 'playground' `dev` environments. It also includes an [example test infrastructure](./playground/test-infra.tf) that deploys a minimal infrastructure with just one nix VM, highlighting the use of `terraform/modules` to build and upload the nix image on Azure.
- The `terraform/state-storage` directory contains the terraform configuration for the ghaf-infra remote backend state storage using Azure storage blob. See section [Initializing Azure State and Persistent Data](./README.md#initializing-azure-state-and-persistent-data) for more details.
- The `terraform/modules` directory contains terraform modules used from the ghaf-infra VM configurations to build, upload, and spin up Azure nix images.
@@ -95,7 +96,7 @@ In addition to the shared terraform state, some of the infrastructure resources
To support infrastructure development in isolated environments, this project uses [terraform workspaces](https://developer.hashicorp.com/terraform/cli/workspaces).
The main reasons for using terraform workspaces include:
- Different workspaces allow deploying different instances of ghaf-infra. Each instance has completely separate state data, making it possible to deploy `dev`, `prod`, or even private development instances of ghaf-infra. This makes it possible to develop and test infrastructure changes in a private development environment before proposing changes to shared (e.g. `dev` or `prod`) environments. The configuration codebase is the same for all environments, with the differentiation options defined in [`main.tf`](./main.tf#L69).
-- Parts of the ghaf-infra infrastructure are persistent and shared between different environments. As an example, private `dev` environments share the binary cache storage. This arrangement makes it possible to treat, for instance, `dev` and private ghaf-infra instances dispensable: ghaf-infra instances can be temporary and short-lived as it's easy to spin-up new environments without losing any valuable data. The persistent data is configured outside the root ghaf-infra terraform deployment in the `terraform/persistent` directory. There may be many 'persistent' infrastructure instances - currently `dev` and `prod` deployments have their own instances of the persistent resources. This means that `dev` and `prod` instances of ghaf-infra do **not** share any persistent data. As an example, `dev` and `prod` deployments of ghaf-infra have a separate binary cache storage. The binding to persistent resources from ghaf-infra is done in the [`main.tf`](./main.tf#L166) based on the terraform workspace name and resource location.
+- Parts of the ghaf-infra infrastructure are persistent and shared between different environments. As an example, private `dev` environments share the binary cache storage. This arrangement makes it possible to treat, for instance, `dev` and private ghaf-infra instances as dispensable: ghaf-infra instances can be temporary and short-lived, as it's easy to spin up new environments without losing any valuable data. The persistent data is configured outside the root ghaf-infra terraform deployment, in the `terraform/persistent` directory. There may be many 'persistent' infrastructure instances - currently the `dev` and `prod` deployments have their own instances of the persistent resources. This means that `dev` and `prod` instances of ghaf-infra do **not** share any persistent data; for instance, the `dev` and `prod` deployments of ghaf-infra have separate binary cache storages. The binding to persistent resources from ghaf-infra is done in [`main.tf`](./main.tf#L166) based on the terraform workspace name and resource location. Persistent data initialization is done automatically by the `terraform-init.sh` script.
- Currently, the following resources are defined as 'persistent', meaning `dev` and `prod` instances do not share them:
    - Binary cache storage: [`binary-cache-storage.tf`](./persistent/binary-cache-storage/binary-cache-storage.tf)
    - Binary cache signing key: [`binary-cache-sigkey.tf`](./persistent/binary-cache-sigkey/binary-cache-sigkey.tf)
@@ -177,4 +178,34 @@ Example fix:
$ terraform import azurerm_virtual_machine_extension.deploy_ubuntu_builder /subscriptions/<SUBID>/resourceGroups/rg-name-here/providers/Microsoft.Compute/virtualMachines/azarm/extensions/azarm-vmext

# Ref: https://stackoverflow.com/questions/61418168/terraform-resource-with-the-id-already-exists
```

#### Error: creating/updating Image
```bash
$ terraform apply
...
│ Error: creating/updating Image (Subscription: "<SUBID>"
│ Resource Group Name: "ghaf-infra-dev"
│ Image Name: "<NAME>"): performing CreateOrUpdate: unexpected status 400 with error: InvalidParameter: The source blob https://<INSTANCE>.blob.core.windows.net/ghaf-infra-vm-images/<IMAGE>.vhd is not accessible.
│ with module.builder_image.azurerm_image.default,
│ on modules/azurerm-nix-vm-image/main.tf line 22, in resource "azurerm_image" "default":
│ 22: resource "azurerm_image" "default" {
```
If you get an error similar to the one shown above, try running `terraform apply` again.
It's unclear why this error occasionally occurs; the issue should be analyzed in more detail.

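Until the root cause is understood, a transient failure like this can be worked around with a small retry wrapper (a sketch; `retry` is a hypothetical helper, not part of ghaf-infra):

```shell
# Re-run a command up to N times, sleeping briefly between attempts.
retry() {
    attempts="$1"; shift
    i=1
    while [ "$i" -le "$attempts" ]; do
        # Stop as soon as the command succeeds
        "$@" && return 0
        echo "attempt $i/$attempts failed, retrying..." >&2
        i=$((i + 1))
        sleep 1
    done
    return 1
}

# Illustrative usage against the flaky step:
#   retry 3 terraform apply -auto-approve
retry 3 true && echo "succeeded"
```

Note that this only papers over the problem; a step that fails consistently will still fail after the final attempt.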
#### Error: Disk
```bash
$ terraform apply
...
│ Error: Disk (Subscription: "<SUBID>"
│ Resource Group Name: "ghaf-infra-persistent-eun"
│ Disk Name: "binary-cache-vm-caddy-state-dev") was not found
│ with data.azurerm_managed_disk.binary_cache_caddy_state,
│ on main.tf line 207, in data "azurerm_managed_disk" "binary_cache_caddy_state":
│ 207: data "azurerm_managed_disk" "binary_cache_caddy_state" {
```
The above error (or a similar one) is likely caused by missing initialization of some `persistent` resources.
Fix the persistent initialization by running `terraform-init.sh`, then run `terraform apply` again.
16 changes: 3 additions & 13 deletions terraform/binary-cache.tf
@@ -50,12 +50,12 @@ module "binary_cache_vm" {

# Attach disk to the VM
data_disks = [{
-    name            = azurerm_managed_disk.binary_cache_caddy_state.name
-    managed_disk_id = azurerm_managed_disk.binary_cache_caddy_state.id
+    name            = data.azurerm_managed_disk.binary_cache_caddy_state.name
+    managed_disk_id = data.azurerm_managed_disk.binary_cache_caddy_state.id
     lun             = "10"
     create_option   = "Attach"
     caching         = "None"
-    disk_size_gb    = azurerm_managed_disk.binary_cache_caddy_state.disk_size_gb
+    disk_size_gb    = data.azurerm_managed_disk.binary_cache_caddy_state.disk_size_gb
}]
}

@@ -96,13 +96,3 @@ resource "azurerm_role_assignment" "binary_cache_access_storage" {
role_definition_name = "Storage Blob Data Reader"
principal_id = module.binary_cache_vm.virtual_machine_identity_principal_id
}

-# Create a data disk
-resource "azurerm_managed_disk" "binary_cache_caddy_state" {
-  name                 = "binary-cache-vm-caddy-state"
-  resource_group_name  = azurerm_resource_group.infra.name
-  location             = azurerm_resource_group.infra.location
-  storage_account_type = "Standard_LRS"
-  create_option        = "Empty"
-  disk_size_gb         = 1
-}
6 changes: 6 additions & 0 deletions terraform/main.tf
@@ -204,4 +204,10 @@ data "azurerm_key_vault_secret" "binary_cache_signing_key" {
provider = azurerm
}

+data "azurerm_managed_disk" "binary_cache_caddy_state" {
+  name                = "binary-cache-vm-caddy-state-${local.ws}"
+  resource_group_name = "ghaf-infra-persistent-${local.shortloc}"
+}


################################################################################
76 changes: 76 additions & 0 deletions terraform/persistent/workspace-specific/main.tf
@@ -0,0 +1,76 @@
# SPDX-FileCopyrightText: 2024 Technology Innovation Institute (TII)
#
# SPDX-License-Identifier: Apache-2.0

provider "azurerm" {
features {}
}

terraform {
required_providers {
azurerm = {
source = "hashicorp/azurerm"
}
}
}

################################################################################

terraform {
# Backend for storing terraform state (see ../../state-storage)
backend "azurerm" {
resource_group_name = "ghaf-infra-state"
storage_account_name = "ghafinfratfstatestorage"
container_name = "ghaf-infra-tfstate-container"
key = "ghaf-infra-persistent.tfstate"
}
}

################################################################################

# Variables
variable "location" {
type = string
default = "northeurope"
description = "Azure region into which the resources will be deployed"
}
variable "persistent_resource_group" {
type = string
default = "ghaf-infra-persistent-eun"
description = "Parent resource group name"
}

locals {
# Raise an error if workspace is 'default',
# this is a workaround to missing asserts in terraform:
assert_workspace_not_default = regex(
(terraform.workspace == "default") ?
"((Force invalid regex pattern)\n\nERROR: workspace 'default' is not allowed" : "", "")

# Sanitize workspace name:
ws = substr(replace(lower(terraform.workspace), "/[^a-z0-9]/", ""), 0, 16)
}
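The workspace-name sanitization above can be mirrored in plain shell, which is handy for predicting the resource names a given workspace will produce (a sketch; `sanitize_ws` is a hypothetical helper, not part of the repo):

```shell
# Shell equivalent of the HCL expression:
#   substr(replace(lower(terraform.workspace), "/[^a-z0-9]/", ""), 0, 16)
# i.e. lowercase, drop non-alphanumerics, keep the first 16 characters.
sanitize_ws() {
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | tr -cd 'a-z0-9' | cut -c1-16
}

sanitize_ws "My-Workspace_42"; echo    # myworkspace42
```

With the sanitized name, the disk created below for workspace `My-Workspace_42` would be named `binary-cache-vm-caddy-state-myworkspace42`.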

# Data source to access persistent resource group (see ../main.tf)
data "azurerm_resource_group" "persistent" {
name = var.persistent_resource_group
}

# Current signed-in user
data "azurerm_client_config" "current" {}


################################################################################

# Resources

resource "azurerm_managed_disk" "binary_cache_caddy_state" {
name = "binary-cache-vm-caddy-state-${local.ws}"
resource_group_name = data.azurerm_resource_group.persistent.name
location = data.azurerm_resource_group.persistent.location
storage_account_type = "Standard_LRS"
create_option = "Empty"
disk_size_gb = 1
}

################################################################################
5 changes: 4 additions & 1 deletion terraform/playground/terraform-playground.sh
@@ -133,6 +133,9 @@ main () {
fi
}

-main "$@"
+# Do not execute main() if this script is being sourced
+if [ "${0}" = "${BASH_SOURCE[0]}" ]; then
+    main "$@"
+fi

################################################################################
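The source-guard added to `terraform-playground.sh` can be demonstrated in isolation (a sketch using a throwaway file path):

```shell
# Write a small script that uses the same guard pattern
cat > /tmp/guard-demo.sh <<'EOF'
main () { echo "running main"; }
# Execute main only when the script is run directly, not when sourced
if [ "${0}" = "${BASH_SOURCE[0]}" ]; then
    main "$@"
fi
EOF

bash /tmp/guard-demo.sh                 # direct execution: main runs
bash -c 'source /tmp/guard-demo.sh'     # sourcing: main is defined but not executed
```

This is what lets `terraform-init.sh` source the playground script to reuse its functions without triggering its `main`.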
18 changes: 17 additions & 1 deletion terraform/terraform-init.sh
@@ -58,7 +58,7 @@ import_bincache_sigkey () {
}

init_persistent () {
-    echo "[+] Initializing persistent data"
+    echo "[+] Initializing persistent"
# See: ./persistent
pushd "$MYDIR/persistent" >/dev/null
terraform init > /dev/null
@@ -69,6 +69,22 @@ init_persistent () {
echo "[+] Applying possible changes"
terraform apply -auto-approve >/dev/null
popd >/dev/null

+    # Assigns $WORKSPACE variable
+    # shellcheck source=/dev/null
+    source "$MYDIR/playground/terraform-playground.sh" &>/dev/null
+    generate_azure_private_workspace_name
+
+    echo "[+] Initializing workspace-specific persistent"
+    # See: ./persistent/workspace-specific
+    pushd "$MYDIR/persistent/workspace-specific" >/dev/null
+    terraform init > /dev/null
+    echo "[+] Applying possible changes"
+    for ws in "dev" "prod" "$WORKSPACE"; do
+        terraform workspace select "$ws" &>/dev/null || terraform workspace new "$ws"
+        terraform apply -auto-approve >/dev/null
+    done
+    popd >/dev/null
}
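The `terraform workspace select ... || terraform workspace new ...` idiom in the loop above is plain shell short-circuiting: the create step only runs when select fails. It can be sketched with hypothetical stand-in functions (not part of ghaf-infra):

```shell
# Stand-ins for 'terraform workspace select' and 'terraform workspace new'
ws_select() { [ -e "/tmp/ws-$1" ]; }                      # succeeds only if the workspace exists
ws_new()    { touch "/tmp/ws-$1" && echo "created $1"; }  # creates it and reports

rm -f /tmp/ws-demo
ws_select demo || ws_new demo    # first run: select fails, so the workspace is created
ws_select demo || ws_new demo    # second run: select succeeds, ws_new is never called
```

This makes the init script idempotent: re-running it selects existing workspaces instead of failing on duplicates.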

init_terraform () {
