Compute Canada provides HPC infrastructure and support to every academic research institution in Canada. Compute Canada uses CVMFS, a software distribution system developed at CERN, to make the Compute Canada research software stack available on its HPC clusters, and anywhere else with internet access. This enables replication of the Compute Canada experience outside of its physical infrastructure.
From these new possibilities emerged an open-source software project named Magic Castle, which aims to recreate the Compute Canada user experience in public clouds. Magic Castle uses the open-source software Terraform and the HashiCorp Configuration Language (HCL) to define the virtual machines, volumes, and networks required to replicate a virtual HPC infrastructure. The infrastructure definition is packaged as a Terraform module that users can customize as they require. After deployment, the user is provided with a complete HPC cluster software environment including a Slurm scheduler, a Globus Endpoint, JupyterHub, LDAP, DNS, and over 3000 research software applications compiled by experts with EasyBuild. Magic Castle is compatible with AWS, Microsoft Azure, Google Cloud, OpenStack, and OVH.
- Install Terraform (>= 0.13.4)
- Download the latest release of Magic Castle for the cloud provider you wish to use.
- Uncompress the release
- Follow the instructions
- For more details, refer to Magic Castle Documentation
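As a quick sanity check on the first step, the minimum version requirement can be verified programmatically. The following is a minimal sketch, not part of Magic Castle: the helper names `parse_version` and `meets_minimum` are hypothetical, and the sketch assumes the version string has the form printed by `terraform version` (for example `Terraform v0.13.4`).

```python
import re

# Minimum Terraform version stated in the setup steps above.
MINIMUM_TERRAFORM = (0, 13, 4)


def parse_version(text):
    """Extract the first MAJOR.MINOR.PATCH triple found in `text` as a tuple of ints."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text, minimum=MINIMUM_TERRAFORM):
    """Return True if the version found in `text` is at least `minimum`."""
    # Tuples compare element by element, so (0, 13, 4) < (1, 0, 0).
    return parse_version(text) >= minimum


if __name__ == "__main__":
    # Hypothetical input; in practice this would be the first line
    # printed by `terraform version`.
    print(meets_minimum("Terraform v0.13.4"))
```

Comparing integer tuples rather than raw strings avoids mistakes such as treating `0.9.0` as newer than `0.13.4`.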
Magic Castle integrates several components that come into play at different stages of cluster creation. The following list enumerates these steps in order, so that users can better grasp what happens when they create a cluster.
We will refer to the user of Magic Castle as the operator.
- After downloading the latest release for the cloud provider of choice and adapting the Terraform `main.tf` file, the operator launches `terraform apply` and accepts the proposed plan.
- Terraform fetches the template hieradata YAML file from the puppet-magic_castle repo indicated by `puppetenv_git`. The version of that file corresponds to the value of `puppetenv_rev`. Terraform reads this template and replaces its variable placeholders with values derived from those set in `main.tf`.
- Terraform communicates with the cloud provider REST API and requests the creation of the virtual machines.
- For each virtual machine creation request, Magic Castle provides a cloud-init file. This file initializes the base configuration of the virtual machine and installs the Puppet agent. The cloud-init file of the management node (`mgmt1`) also installs and configures a puppetmaster.
- The Puppet agents communicate with the puppetmaster to retrieve and apply their configuration based on their hostnames.
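Two of the steps above, filling in the hieradata template and selecting a configuration by hostname, can be illustrated with a short Python sketch. This is only an analogy under stated assumptions, not Magic Castle's implementation: the template keys, the placeholder names (`cluster_name`, `domain_name`, `nb_nodes`), the hostnames other than `mgmt1`, and the rule of stripping the trailing index from a hostname are all hypothetical. The actual template lives in the puppet-magic_castle repository and is rendered by Terraform itself.

```python
from string import Template

# Hypothetical hieradata template. The keys and placeholder names are
# invented for illustration and do not come from puppet-magic_castle.
HIERADATA_TEMPLATE = """\
example::cluster_name: "${cluster_name}"
example::domain_name: "${domain_name}"
example::nb_nodes: ${nb_nodes}
"""


def render_hieradata(template_text, values):
    """Replace every ${placeholder} in the template with its value.

    Raises KeyError if a placeholder has no corresponding value, which
    mirrors the idea that every variable must be defined by the operator.
    """
    return Template(template_text).substitute(values)


def role_for_hostname(hostname):
    """Derive a role name by stripping the trailing index: 'mgmt1' -> 'mgmt'.

    Assumed naming scheme for illustration only.
    """
    return hostname.rstrip("0123456789")


if __name__ == "__main__":
    # Stand-ins for values the operator would set in main.tf.
    values = {"cluster_name": "phoenix", "domain_name": "example.org", "nb_nodes": 2}
    print(render_hieradata(HIERADATA_TEMPLATE, values))
    for host in ("mgmt1", "login1", "node12"):
        print(host, "is configured as", role_for_hostname(host))
```

Using strict substitution means a forgotten variable fails loudly at render time rather than leaving an unexpanded placeholder in the generated configuration.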
- ACRC Cluster in the cloud [GCP, Oracle]
- AWS ParallelCluster [AWS]
- Elasticluster [AWS, GCP, OpenStack]
- Slurm on Google Platform [GCP]
- NVIDIA DeepOps [Ansible playbooks only]
- StackHPC Ansible Role OpenHPC [Ansible Role for OpenStack]
Refer to Magic Castle developer documentation.