forked from ComputeCanada/magic_castle
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathCHANGELOG
421 lines (345 loc) · 22.3 KB
/
CHANGELOG
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
# Changelog
All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/).
## [9.0] 2020-11-17
### Changed
- Upgrade to Terraform 0.13, with a new minimum requirement of 0.13.4
## [8.5] 2020-11-16
### Added
- [puppet] Added Duo MFA classes and on/off switches for management, login and compute nodes.
### Changed
- [puppet] Refactored logic identifying the presence of an NVIDIA GPU to consider memory instead of instance name.
- [puppet] Updated the version of most modules installed with Puppetfile
- [puppet] Fixed permissions of nvidia-modprobe when installing Arbutus' `nvidia-vgpu-tools` rpm.
## [8.4] 2020-11-10
### Changed
- [puppet] Updated puppet-jupyterhub version to v3.5.1
## [8.3] 2020-10-21
### Changed
- Fixed puppetenv_rev default value when creating Magic Castle release (Commit bf30e13036)
- [puppet] Bump puppet-jupyterhub version to v3.4.2
- [puppet] Fixed freeipa issue when an ip was already recorded in the DNS and a new instance was joining the realm with the same ip ([puppet-magic_castle issue #69](https://github.com/ComputeCanada/puppet-magic_castle/issues/69))
## [8.2] 2020-10-14
### Added
- [puppet] Added Cloudflare load balancer cvmfs_acl_regex (issue #64)
- [puppet] Added SELinux policy to allow fail2ban to ban using route (issue #65)
### Changed
- Fixed AWS, Azure, GCP and OVH examples that were incorrectly refering to openstack module
- [cloud-init] Bumped puppetserver to 6.13.0 and puppetagent to 6.18.0
- [puppet] Replaced homemade template of squid.conf by usage of puppet-squid module
- [puppet] Bumped puppet-jupyterhub to v3.4.1
- [puppet] Fixed slurmctld dependency to cluster registeration in slurmdbd
- [puppet] Fixed ipa_create_user password configuration
## [8.1] 2020-07-29
### Added
- Added ability to generate an ssh keypair to upload files with Terraform file provisioner.
- [puppet] Added options in hieradata to configure CVMFS repos
### Changed
- Activated VPC DNS support in AWS (issue #108)
- Fixed documentation in multiple sections (PR #92, #93, #97, #98, #101, #106)
- Fixed DNS section of AWS example (PR #108)
- [puppet] Replaced homemade template of squid.conf by usage of puppet-squid module
## [8.0] 2020-06-19
Following release of CentOS 8 2004, AWS now provides an official CentOS 8 image that
has been tested and is functional with Magic Castle 8.0.
### Added
- Added the login node ids as output of the main Magic Castle Terraform module.
- Added a trigger to DNS module deploy_certs based on login node ids. If there is a modification to one of the login node state,
the certificates will be uploaded to the corresponding login node, without having to taint the `deploy_certs` resource manually
(PR #88).
- Added try function around access to index 0 of resource array to limit errors when destroying resources.
- [puppet] Added a resource in `profile::base` to remove terraform `local-exec` leftover empty scripts in /tmp.
### Changed
- [puppet] Id of the accounts created in FreeIPA now start at UID_MAX defined `/etc/login.defs`. (commonly 60000 instead of 50000)
- [puppet] fail2ban configuration is now done with puppet-fail2ban module. The `sshd` jail is now named `ssh-route`.
- [cloud-init] Bumped puppetserver to 6.12.0 and puppetagent to 6.16.0.
- Puppet hieradata yaml files are now uploaded with Terraform file provisioner instead of being embedded in mgmt1 userdata.
This means a change to the number of users, the guest password, or the hieradata variable no longer trigger a rebuild of mgmt1
but only a reupload of YAML files (PR #89)
- [docs] Various fixes (Issues #87, #92, #93)
### Removed
- Hieradata has been removed from puppetmaster.yaml template.
## [7.3] 2020-06-04
This release introduces three main features:
- Add support for Slurm 20
- Add support for CentOS 8. Tested functional on GCP and OpenStack. AWS and Azure do not provide
an official CentOS 8 image with cloud-init support at the moment of this release.
- Add support for Compute Canada Arbutus Cloud NVIDIA VGPUs (flavor `vgpu-...`).
### Changed
- Improved main documentation.
- [AWS] Most resources if not all now have the name of the cluster as a prefix in their name
- [OpenStack] Simplified volume attachment count computation
- [puppet] Slurm plugin spank-cc-tmpfs_mounts is now installed from copr yumrepo
- [puppet] Fixed order of slurm packages install
- [puppet] Exec resource in charge of creating the slurm cluster in slurmdbd now returns 0 if the cluster already exists
- [puppet] `consul-template` class initialization is now entirely in hieradata file `common.yaml`.
- [puppet] CentOS 8 support: replaced notification of `nfs-idmap.service` by notification of `nfs-server.service`.
- [puppet] CentOS 8 support: replaced `pdsh` by `clustershell`
- [puppet] CentOS 8 support: rpc_nfs_args is now only defined if os is CentOS 7.
- [puppet] CentOS 8 support: `ipa_create_user.py` now use `/usr/libexec/platform-python` instead of `/usr/bin/env python`.
- [puppet] CentOS 8 support: Replaced Python 2 `unicode` calls in `ipa_create_user.py` by six's `text_type`
- [puppet] CentOS 8 support: Moved list of nvidia package names from class profile::gpu to hieradata. List now depends on CentOS version.
- [puppet] CentOS 8 support: Moved FreeIPA `regen_cert_cmd` value to hieradata. Command now depends on CentOS version.
- [puppet] Bumped puppet-jupyterhub version to 3.3.2
- [puppet] Update nvidia driver fact to make sure at most one version is in the output
- [puppet] Changed logic of `nvidia_grid_vgpu` fact to just check if the instance flavor includes `vgpu` in its name
- [puppet] CentOS 8 support: Moved default loaded CVMFS modules to hieradata. Module list now depends on CentOS version
- [puppet] CentOS 8 support: Fixed nfs clean rbind execstop warning
- [puppet] Replaced tcp_con_validator to check if slurmdbd is running by a wait_for ressource on slurmdbd.log regex
- [puppet] CentOS 8 support: Fixed package name in nvidia-driver-version fact.
- [cloud-init] Replaced `reboot -n` in `runcmd` by `power_state` with reboot now. This makes sure final stage of cloud-init is applied before reboot.
- [gcp] CentOS 8 support: rewrote `install_cloudinit.sh` to avoid network issue at boot and install cloud-init only for the time needed. (issue #85)
### Added
- [puppet] Added support for CentOS 8 when selecting Slurm yumrepo
- [puppet] Slurm 20 support: Added `slurm_version` variable to hieradata. It can be either 19 or 20.
- [puppet] Slurm 20 support: Added PlugStackConfig parameter to slurm.conf
- [puppet] Added slurm-perlapi package to `profile::base::slurm`
- [puppet] Added exec to initialize cvmfs default.local with consul-template.
- [puppet] Added a default node1 in slurm.conf when no slurmd has been registered yet in consul
- [puppet] Added a require on Epel yumrepo for package fail2ban-server
- [puppet] Added class profile::fail2ban::install
- [puppet] CentOS 8 support: Added dependency on puppet-epel to install epel yumrepo
- [puppet] CentOS 8 support: Enabled powertools repo
- [puppet] CentOS 8 support: Enabled idm:DL1 stream
- [puppet] CentOS 8 support: Added network-scripts package when os is CentOS 8
- [puppet] CentOS 8 support: Added munge_socket selinux policy to allow confined user to submit jobs
- [puppet] Added class `profile::gpu::install`
- [puppet] Added a requirement on epel yumrepo for singularity package.
- [puppet] Added a requirement for slurm exec `create_account` on slurm exec `add_cluster`
- [puppet] CentOS 8 support: added class `profile::mail::server`
- [puppet] Added a requirement on yumrepo epel to class `jupyterhub` in `profile::jupyterhub::hub`
- [puppet] Added support for Compute Canada Arbutus Cloud VGPUs
### Removed
- [puppet] Removed notify to `slurmctld` from slurm::accounting exec `add_cluster`.
## [7.2] 2020-05-20
### Changed
- Reverted type of image variable from string to any because Azure image input is a map.
## [7.1] 2020-05-20
### Changed
- [GCP] Fixed a typo in disk paths that prevented creation of project and scratch volume
- [GCP] Increased the root disk size in the example to 20GB. This is the new minimum for centos7 image.
- Bumped minimum requirements to 0.12.21 in all versions.tf files.
- [puppet] Bumped most module versions to latest in Puppetfile
- [puppet] Bumped consul and consul-template version to latest available
### Added
- Documentation on variables specific to the commercial cloud providers
- Documentation on hieradata
- Documentation on firewall_rules
- Description and types to all terraform variables
## [7.0] 2020-05-18
### Changed
- Established a distinction in variables between puppetmaster and mgmt1 - allowing puppetmaster role to be assigned to another instance.
- Bumped minimum requirement of terraform to 0.12.21 (issue #77)
- Numerous doc fixes
- Added a section on related projects in README.md
- [Azure] Updated Azure infrastructure.tf to use Azure provider 2.0.0 (issue #62)
- [cloud-init] Set puppet-agent and puppet-server version to 6.13 and 6.9
- [cloud-init] Renamed cloud-init YAML files to `puppetagent.yaml` and `puppetmaster.yaml`
- [OpenStack] Fixed volume size computation regression introduced in commit c09ea17d
- [puppet] Defined selinux context for /scratch as home_root_dir
- [puppet] Defined selinux context for /project as home_root_dir
- [puppet] Improved cuda facts to avoid issue when html index is incomplete
- [puppet] Updated package names in gpu module and facts
- [puppet] Generalized gpu module cuda repo link composition
- [puppet] Replaced package by ensure_packages for kernel-devel in gpu
- [puppet] Updated version of puppet-jupyterhub to v3.3.0
- [puppet] Improved FreeIPA client installation waiting conditions to limit failure
- [puppet] Disabled root jobs in slurm.conf]
- [puppet] Added nosuid to client nfs mount options
- [puppet] Activated root_squash for all nfs exports
- [puppet] Changed URL for the source of `cc-tmpfs_mount.so`
- [puppet] Updated derdanne/nfs version in Puppetfile
- [puppet] Made profile::base a requirement of profile::nfs::server
- [puppet] Defined servername param for apache in reverse_proxy
### Added
- [Azure] Added variable to allow usage of an existing resource group based on its name (issue #72)
- [cloud-init] Enabled puppet agent postrun command in cloud-init
- [puppet] mgmt1 volumes formating is now handled by `profile::nfs::server` class
- [puppet] Added logic to define, mount and format nfs shared volumes with lvm
- [puppet] Added README.md
- [puppet] Fixed regression introduced in 630a04
- [puppet] Added possibility to manage jail activation and ignore_ip with hierada
- [puppet] Added profile classes for JupyterHub: `profile::jupyterhub::node` and `profile::jupyterhub::hub`
- [puppet] Added variable to allow definition of lmod default modules with hieradata
- [puppet] Configured lmod default modules to start with gcc and openmpi
- [puppet] Added ability to receive last puppet run output by email through puppet postrun script
- [puppet] Added support for NVIDIA GRID vGPU
- [puppet] Added class `profile::base::azure` for logic specific to Azure
### Removed
- [cloud-init] Removed volumes formating, partitioning and mounting from mgmt cloud-init
- [puppet] Removed condition on gpu count in nvidia_driver_vers
- [puppet] Removed mkhomedir from FreeIPA client installation parameters
## [6.4] - 2020-03-11
### Changed
- [cloud-init] Hardcoded the version of puppet-agent (6.13.0) and puppetserver (6.9.1).
## [6.3] - 2020-03-09
### Added
- Added random_uuid to generate a random consul token
- [travis] Added init and validation of dns/gcloud module
- [cloud-init] Added bootstrap installation of consul-server in cloud-init
- [puppet] Added slurmd restart when node is missing from sinfo
- [puppet] Introduced class `profile::workshop::mgmt`. The class allow to unzip an archive in all guest accounts
- [puppet] Added profile::workshop::mgmt to mgmt in site.pp
- [puppet] Defined consul::service for slurmd, slurmctld slurmdbd, rsyslog, cvmfs client, and squid. This in conjunction
with consul-template, allow these services to be removed from the config files when the instance that was running the
service is halted. For example, if a compute node is shutdown or remove, it will no longer appear in `sinfo` output.
### Changed
- [cloud-init] Turned off puppet agent reporting in cloud-init
- [cloud-init] [puppet] Renamed user_hieradata as user_data
- [cloud-init] Volume formating and mounting is now conditional on the hostname being `mgmt1`
- [OpenStack] Fixed port_node resource name template
- [puppet] Updated puppet-jupyterhub version to v1.8.1
- [puppet] Consul and consul-template version are now defined in hieradata
- [puppet] Changed node_exported consul service name to node-exported to remove warning
### Removed
- [puppet] Removed unused key from terraform_data
- [puppet] Removed stage in mgmt site.pp
## [6.2] - 2020-03-01
### Added
- Added an error message in cloud-init dev avail while loops
- Added gcloud dns module to AWS, Azure and OVH examples.
- [puppet] Added a slurmd restart when node hostname is missing from sinfo output.
- [puppet] Added class profile::workshop::mgmt to deploy files to guest user homes
- [puppet] Added class profile::workshop::mgmt to mgmt1 in site.pp
### Changed
- [OpenStack] All resources, including instances, have now a name that starts with the cluster name.
This does not affect the instances' hostname
- [puppet] Update puppet-jupyterhub version to v1.7
## [6.1] - 2020-02-27
Fix travis release procedure. 6.0 release bundles contained the wrong module source in main.tf.
## [6.0] - 2020-02-26
Terraform >= 0.12.21 is now required. Usage of the function `subtract` requires at least 0.12.21.
### Added
- Added the optional key `prefix` to the `instance["node"]` map (issue #29)
- [cloud-init] Added removal of ifcfg file with no corresponding nic (issue #61)
- [puppet] Added optional prefix to node regex in site.pp
### Changed
- `instance["node"]` is now a list. This allows the spawning of compute nodes with various instance types (issue #29)
- release.sh is now the only script for creating a release on any platform.
- [Azure] Renamed azurerm_virtual_machine nodevm to node (issue #55)
- [AWS] Replaced aws volume device name by volume id (issue #60)
- [gcp] Renamed gcp var.project_name to var.project (issue #53)
- [puppet] Upgraded puppet-prometheus to 8.2.1
- [puppet] Remove Name=gpu from gres.conf template ([puppet-magic_castle issue #27](https://github.com/ComputeCanada/puppet-magic_castle/issues/27))
## [5.8] - 2020-02-24
### Added
- [Azure] Added `root_disk_size=30` in the example (issue #43)
- [Azure] Added ssh_keys to instances as it is mandatory (issue #44)
- [cloud-init] Added volume attachment verification loops in mgmt cloud-init (issue #54)
- [GCP] Added gcloud dns module to gcp example (issue #37)
- [GCP] Added prefix to the name of volumes and ipv4 (issue #49)
- [OpenStack] Added os_int_subnet variable. The variable is used to force to use a specific subnet with Openstack.
### Changed
- Changes to image variable are now ignored after cluster is built.
- Fixed release scripts to solve bug where multiple `version` variable were present (issue #38)
- [Azure] Updated the example to use the most recent OpenLogic CentOS 7 image (issue #42)
- [Azure] Resources names are now prefixed with the cluster name
- [Azure] Azure public_ip now outputs a list of all login ip addresses
- [cloud-init] Replaced timezone in cloud-init to UTC (issue #51)
- [GCP] Zone variable is now optional. The zone is randomly selected in the zones available for the region if left blank.
- [GCP] Instances internal DNS are now configured to use zonal DNS. The internal DNS hostname is not used, but the change reduced
the DHCP time lease from 24 to 1 hour. This helps when debugging DHCP issue
- [puppet] CC CVMFS repo is now configured from latest RPM repo (issue [puppet-magic_castle issue #19](https://github.com/ComputeCanada/puppet-magic_castle/pull/19))
- [puppet] Increased the squid maximum_object_size ([puppet-magic_castle issue #20](https://github.com/ComputeCanada/puppet-magic_castle/issues/20))
- [puppet] Updated globus rpm repo name
- [puppet] Updated fail2ban config to make it work with 0.10.x ([puppet-magic_castle issue #25](https://github.com/ComputeCanada/puppet-magic_castle/issues/25))
- [puppet] Replaced file by exec to create singularity symlink ([puppet-magic_castle issue #24](https://github.com/ComputeCanada/puppet-magic_castle/issues/24))
- [puppet] Replaced timezone in cloud-init to UTC ([puppet-magic_castle issue #26](https://github.com/ComputeCanada/puppet-magic_castle/issues/26))
- [puppet] Added service network and notify on NetworkManager purge (issue #50)
### Removed
- [GCP] Zone variable is no longer in the example as it is now optional.
## [5.7] - 2020-01-16
### Added
- [AWS] Added `skip_destroy = True` to EBS attachment resources to avoid stalling destroy command.
- [DNS] Added a `dtn` entry for the Globus endpoint.
- [DNS] Added an `ipa` entry that provides access to FreeIPA webpage.
- [puppet] Added a `profile::reverse_proxy` class that configure Apache vhost for JupyterHub, FreeIPA, Globus, etc.
- [puppet] Added service nvidia-persistenced to module gpu.pp.
- [puppet] Added `drain` to states that spawns an scontrol in slurm module.
- [main.tf] Added `hieradata` variable that allow the injection of custom values in puppet hieradata from Terraform.
### Changed
- [AWS] Changed mgmt and login instances from using `associate_public_ip_address` to using an AWS Elastic IP.
- [AWS] Updated example AMI and minimum instance type for mgmt.
- [AWS] Fixed module's syntax for Terraform 0.12.
- [AWS] Made `availability_zone` optional. If zone is not provided, it will be randomly selected amongst the zones available for the selected region.
- [AWS] Changed root disk type from `standard` to `gp2`.
- [AWS] Enabled ebs_optimized for all instances.
- [AWS] Changed SSH keyname from `slurm-cloud-key` to `${cluster_name}-key`
- [cloud-init] Made puppet yumrepo install function of the CentOS major version.
- [cloud-init] Added blacklisting of nouveau driver in kernel cmdline option.
- [DNS] DNS records are now produced by the `record_generator` module instead of listing records in each DNS provider module.
- [puppet] Changed Globus authentication method from MyProxy to OAuth.
- [puppet] Updated puppet-jupyterhub version from v1.1 to v1.6.
- [puppet] Replaced deprecated package name dkms-nvidia by kmod-nvidia-latest-dkms.
- [puppet] Replaced every reference to facts of `eth0` by facts of interface index 0.
- [puppet] Disabled dkms autoinstall timeout in gpu.
### Removed
- [puppet] Removed include of jupyterhub::reverse_proxy in site.pp for login.
## [5.6] - 2019-11-27
### Added
- [DNS] Added support for Google Cloud DNS (PR #24)
- Added a release script compatible with BSD tools - `release.bsd.sh`
### Changed
- [DNS] Changed the login record A pattern from `clustername#.domain` to `login#.clustername.domain` where `#` is the login node index.
- [DNS] Moved wildcard certificate creation from cloudflare to an acme module shared by all dns modules.
- [DNS] Replaced usage of 0-index on array by call to `distinct` (issue #26).
- [DNS] A `jupyter.${cluster_name}.${domain_name}` record is now added for each login instead of just login1.
- Changed the management and login node naming scheme to match node naming. `mgmt01` is now `mgmt1` and `login01` is now `login1`.
- [puppet] Fix puppet-jupyterhub version to v1.1 instead of master branch.
### Removed
- [DNS] Removed creation of SSHFP SHA1 records (issue #22)
## [5.5] - 2019-11-15
### Added
- [docs] Added details on how to use CloudFlare API Token in README.md
- [docs] Added details on which Open RC File to download when using OpenStack
- [DNS] Added an email variable to the dns module.
- [puppet] Added logic to set RSNT_ARCH variable based on the common CPU
instruction extensions available amongst all CVMFS clients using consul.
## [5.4] - 2019-10-23
### Fixed
- [OpenStack] Fix image_id condition for login and node
- [CloudFlare] Update to CloudFlare provider 2.0 and pinned the version in dns.tf.
## [5.3] - 2019-09-30
### Added
- [puppet] Added hbac rule to limit access to mgmt instances from IPA users (issue #13)
- [puppet] Added token based ACL to consul to limit access to the key-value store (issue #11)
### Changed
- [puppet] Activate selinuxuser_tcp_server boolean to allow confined users to run MPI jobs (issue #12)
- [puppet] FreeIPA OTP activation command is now subscribed to server install exec
### Fixed
- [OpenStack] Set the image_id to null when the root disk is a volume to avoid false detection of change by Terraform
## [5.2] - 2019-09-19
### Added
- Travis CI automated testing for Terraform and Puppet files
- Travis CI build status is now at the beginning of README.md
- [docs] Added contribution section to README.md
- [OVH] Added a Terraform file for network related ressources
- [OpenStack] Added a Terraform file for network related ressources
- [cloud-init] Added removal of firewalld package in mgmt.yaml
- [docs] Added docs on building SELinux enabled image for OVH cloud
- [OpenStack] Added the definition of a root disk block device with condition on flavor choice and root disk size
- [GCP] Added the ability to set the boot disk size with `var.root_disk_size`
- [Azure] Added the ability to set the os storage volume size with `var.root_disk_size`
- [AWS] Added the ability to set the root disk size with `var.root_disk_size`
- [docs] Added documentation on variable `root_disk_size`
- [puppet] Added myproxy and gridftp as service in globus module
### Changed
- Release script now takes only one version number
- Release script now fixes the Terraform providers' version
- [docs] Updated globus documentation
- [docs] Renamed developpers docs
- [OVH] OVH infrastructure file is now a symlink to the OpenStack infrastructure file
- [puppet] Globus is now installed only if globus_user/password is defined in the hieradata
- [puppet] Globus endpoint name is now set based on the domain name instead of the instance hostname
- [puppet] FreeIPA ip address parameter on mgmt is now fixed to eth0 interface
- [puppet] Firewall chains and rules that were not set by puppet are now purged
- [puppet] Configure network, netmask and ip to always map to eth0 interface
- [puppet] Configure slurmctld ip address to eth0 to fix issue when there more than one NIC
### Removed
- Removed version number from documentation
### Fixed
- [AWS] Fixed dependency cycle with volumes
- [AWS] Fixed firewall definition
- [AWS] Fixed dependency cycle with mgmt ip addresses
- [OVH] Fixed dependency cycle
- [docs] Fixed typos and rfc links