Merge azure-images to main #74
Merged: henrirosten merged 88 commits into tiiuae:main from henrirosten:merge-azure-images-to-main on Feb 9, 2024
This is used by the nix_build.sh script used to build images with terraform. Signed-off-by: Florian Klink <[email protected]>
This introduces a terraform module that can be used to nix-build and upload VM images to Azure. nix-build.sh originates from https://cs.tvl.fyi/depot/-/blob/ops/terraform/deploy-nixos/nixos-eval.sh, which is why it inherits its copyright from there. Signed-off-by: Florian Klink <[email protected]>
This groups together some common resources to create a VM. We might introduce more flexibility at a later point. Signed-off-by: Florian Klink <[email protected]>
We can just include azure-config.nix from nixpkgs. It pulls in azure-common.nix, which contains all necessary kernel config / udev rules. It also defines a `config.system.azureImage` attribute, which builds a vhd that we can import into Azure using the `azurerm-nix-vm-image` terraform module. These images can be referred to from source_image_id in Terraform (using azurerm-linux-vm, for example), allowing us to boot the desired machine config out of the box, without having to do a two-staged deploy. Signed-off-by: Florian Klink <[email protected]>
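A minimal sketch of what such a host configuration could look like, assuming the module is reachable as `virtualisation/azure-config.nix` under `modulesPath` in the pinned nixpkgs. The hostname is a placeholder and not a value from this PR:

```nix
# configuration.nix for a hypothetical Azure host
{ modulesPath, ... }:
{
  imports = [
    # Assumed location inside nixpkgs; brings in the Azure-specific
    # kernel config and udev rules described in the commit message.
    "${modulesPath}/virtualisation/azure-config.nix"
  ];

  # Placeholder value, not taken from this PR.
  networking.hostName = "example-azure-vm";
}
```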
This allows injecting custom userdata to the VM at instance creation time, which we can use to provision some config (like SSH pubkey config) that's not part of the NixOS image. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
azure-common.nix already sets services.openssh.settings.{PermitRootLogin,ClientAliveInterval}, so we need to decide what wins. To keep the intended behaviour, we want to mkForce PermitRootLogin to "no" (azure-common.nix sets "prohibit-password"), and set the ClientAliveInterval with mkDefault - bumping that timeout probably makes sense for azure, and we don't want the setting in this file to take priority. Signed-off-by: Florian Klink <[email protected]>
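The priority handling described above can be sketched as follows. The interval value is an assumption chosen for illustration; only the use of `mkForce` and `mkDefault` comes from the commit message:

```nix
{ lib, ... }:
{
  services.openssh.settings = {
    # mkForce wins over the "prohibit-password" value set by azure-common.nix.
    PermitRootLogin = lib.mkForce "no";

    # mkDefault loses against any plain definition, so the value from
    # azure-common.nix takes priority. 60 is an illustrative value.
    ClientAliveInterval = lib.mkDefault 60;
  };
}
```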
This file contains all ssh public keys used by real humans. It's parsed from Terraform to inject into instance metadata. Signed-off-by: Florian Klink <[email protected]>
This builds the jenkins-master Nix image, turns it into a bootable Azure image, and then boots an instance with the image. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
That way, the VM survives reboots - the non-networkd configuration seems to be quite brittle. Signed-off-by: Florian Klink <[email protected]>
Ideally, we'd keep systemd-resolved disabled too, but the way nixpkgs configures cloud-init prevents it from picking up DNS settings from elsewhere. Signed-off-by: Florian Klink <[email protected]>
Move the azure-specific config snippet into its own file, so we can import it from multiple configuration.nix files. azure-common.nix is already used for the existing machine configurations, and as we don't want to break these, it's using this transient name. Signed-off-by: Florian Klink <[email protected]>
This gives each VM a system-assigned identity, and exposes the principal ID as a module output, allowing us to grant it access to certain resources. Signed-off-by: Florian Klink <[email protected]>
This exposes a read-only HTTP webserver for the contents in the storage container. `rclone serve http` takes care of exposing the storage container over HTTP. We disallow listing (by only allowing access to certain paths), and expose it over HTTP(S) with auto-ssl via caddy. This will work with whatever domain we route to it, so it's not part of the configuration. Signed-off-by: Florian Klink <[email protected]>
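A rough sketch of the caddy side of this setup, under several assumptions: the domain, the upstream address of `rclone serve http`, and the list of allowed paths are all hypothetical and only illustrate the "allow certain paths, deny everything else" idea:

```nix
{
  services.caddy = {
    enable = true;
    # Hypothetical domain; the commit keeps the real domain out of the config.
    virtualHosts."cache.example.org".extraConfig = ''
      # Only these paths are forwarded, so listings are never reachable.
      # The path list is an assumption based on a typical Nix binary cache.
      @allowed path /nix-cache-info *.narinfo /nar/*
      handle @allowed {
        # Assumed local listen address of `rclone serve http`.
        reverse_proxy 127.0.0.1:8080
      }
      # Everything else is rejected.
      handle {
        respond 404
      }
    '';
  };
}
```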
This works around NixOS/nixpkgs#272532; we can revert this once NixOS/nixpkgs#272617 has landed here. Signed-off-by: Florian Klink <[email protected]>
We don't want to blindly issue certs for all domains, but make this configurable. This should be config coming from the environment, via cloud-init. Signed-off-by: Florian Klink <[email protected]>
Define this for each machine outside the VM, and describe everything in a single security group. Attaching multiple security groups caused confusing duplicate errors; this might be a bug in the Terraform Azure provider. Signed-off-by: Florian Klink <[email protected]>
This adds filesystem-related tools to the $PATH of cloud-init, so it can format disks with its disk_setup module (and the fs_setup config key). This will be used to format data volumes attached to VMs. Signed-off-by: Florian Klink <[email protected]>
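One way to extend the unit's $PATH in NixOS is sketched below. It assumes the relevant unit is `cloud-init.service`; the package selection is a guess at what disk_setup and fs_setup would need, not the list used in this PR:

```nix
{ pkgs, ... }:
{
  # Packages listed here are added to the PATH of the systemd unit.
  systemd.services.cloud-init.path = with pkgs; [
    e2fsprogs   # mkfs.ext4 (assumed filesystem type)
    util-linux  # sfdisk, lsblk, blkid
    parted      # partitioning helper
  ];
}
```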
We need to use cloud-init to format and mount data volumes in Azure; we can't use systemd for it. Due to hashicorp/terraform-provider-azurerm#6117, disks in Azure get attached late at boot, so any dev-disk-by-….device units created via systemd-fstab-generator might not exist yet at the time the graph for multi-user.target is created, causing systemd to fail to start downstream services due to a missing dependency. Once the volume is attached, the .device unit pops up via udev, and a manual restart of the services depending on data disks would then work, but it's messy. Letting cloud-init take care of data disk mounting (and formatting) is the right choice: that way systemd doesn't need to do any dependency tracking for it. Signed-off-by: Florian Klink <[email protected]>
This adds the ghafbinarycache storage account, and a binary-cache-v1 storage container inside of it. It's used to serve artifacts from (via the binary-cache VM), and Nix build artifacts are also uploaded to it. Signed-off-by: Florian Klink <[email protected]>
This deploys the VM defined at binary-cache. Attaching the data disks is still a bit messy (requires one reboot, or manual reverse proxy restart). Fixing this requires some more debugging. Signed-off-by: Florian Klink <[email protected]>
Signed-off-by: Florian Klink <[email protected]>
The service-binary-cache module is all the specific hosts need. Signed-off-by: Florian Klink <[email protected]>
Otherwise, cloud-init.service might still be running while we start up services expecting the mount to happen. Signed-off-by: Florian Klink <[email protected]>
Configure the domain and storage account name with cloud-init. This allows keeping the same NixOS image across multiple deployments of this image, each serving another bucket at another domain. Also, switch to listening on port 443 only; caddy can use the TLS-ALPN-01 challenge just fine. Signed-off-by: Florian Klink <[email protected]>
This should use tls-alpn-01 on port 443 just fine. Signed-off-by: Florian Klink <[email protected]>
Apparently canonical/cloud-init#4673 and more hacks are not needed; we can simply ramp up the timeout that systemd is willing to wait for the .device unit to appear. Signed-off-by: Florian Klink <[email protected]>
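A sketch of how such a timeout can be raised for a single mount, using the `x-systemd.device-timeout=` mount option. The device path, mount point, filesystem type, and timeout value are hypothetical:

```nix
{
  fileSystems."/mnt/data" = {
    # Hypothetical device path for a late-attached Azure data disk.
    device = "/dev/disk/by-lun/10";
    fsType = "ext4";
    options = [
      # Let systemd wait longer for the .device unit to appear
      # before giving up on the mount. 5min is an illustrative value.
      "x-systemd.device-timeout=5min"
    ];
  };
}
```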
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Fix cloud-config startup by adding a dependency to mnt-resource.mount Signed-off-by: Henri Rosten <[email protected]>
This reverts commit c083e6b.
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Move the binary cache signing key to its own resource group. This makes it possible to share the signing key between the private development environments. Signed-off-by: Henri Rosten <[email protected]>
Move ssh private key to azure-secrets resource group, similarly to how binary cache signing key was moved in the previous commit. Signed-off-by: Henri Rosten <[email protected]>
Remove functions `delete_keyvault` and `import_sigkey`, which are no longer needed after the two previous commits that moved the builder ssh private key and the binary cache signing key to their own resource group. These secrets now persist even after workspace destruction, so there's no need to generate or delete them separately outside terraform. Signed-off-by: Henri Rosten <[email protected]>
Do not automatically switch to default workspace after `destroy` command. Improve workspace name matching by not allowing partial matches. Signed-off-by: Henri Rosten <[email protected]>
Associate each builder VM with the correct network security group. Before this change, builders were bound to the binary cache's security group. Signed-off-by: Henri Rosten <[email protected]>
Run jenkins service only after cloud-config. This is an attempt to fix the occasional jenkins service startup failures. Signed-off-by: Henri Rosten <[email protected]>
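The ordering could be expressed roughly like this, assuming the cloud-init stage in question is exposed as a unit named `cloud-config.service`:

```nix
{
  systemd.services.jenkins = {
    # Start jenkins only once cloud-config has finished.
    # The unit name is an assumption based on the commit message.
    after = [ "cloud-config.service" ];
    # Pull the unit into the transaction without making its failure fatal.
    wants = [ "cloud-config.service" ];
  };
}
```

`after` only orders the units; `wants` is used here instead of `requires` so that a cloud-config failure does not prevent jenkins from starting at all.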
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
henrirosten force-pushed the merge-azure-images-to-main branch from 9ef5bf1 to 98b418e (February 7, 2024 05:48)
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
This reverts 3ba044e, moving the nixpkgs revision from 23.11 back to b0b2c5445c64191fd8d0b31f2b1a34e45a64547d, which is the same nixpkgs version that was already used in the main branch. Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
Move the azure nix host configurations to their own subdirectory to avoid confusion between the ficolo (e.g. 'binarycache') and azure ('binary-cache') nix configurations. Signed-off-by: Henri Rosten <[email protected]>
Signed-off-by: Henri Rosten <[email protected]>
henrirosten force-pushed the merge-azure-images-to-main branch from 98b418e to b799c59 (February 7, 2024 06:21)
Move caddy state disk to persistent. The binary-cache VM stores Let's Encrypt certificates and data on the caddy state disk. This state disk needs to be stored in 'persistent' data; otherwise, we will run into certificate authority rate limits when development environments are deployed and subsequently destroyed. Signed-off-by: Henri Rosten <[email protected]>
mnokka approved these changes on Feb 9, 2024
The azure-images branch worked in playground tests, go ahead.
Introduce changes from the azure-images branch to the main branch:
After this change, it's possible to spin up a ghaf-infra instance with terraform by following the instructions in README.md. We verified that ghaf x86 targets (both native and cross-compiled) can be built with an example dev-instance when manually triggered over ssh on the jenkins-controller VM.