AWS ParallelCluster v2.6.0

@lukeseawalker lukeseawalker released this 26 Feb 20:46

We're excited to announce the release of AWS ParallelCluster 2.6.0.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster

Enhancements

  • Add support for Amazon Linux 2
  • Add support for NICE DCV on Ubuntu 18.04
  • Add support for FSx Lustre on Ubuntu 18.04 and Ubuntu 16.04
  • Add CloudWatch logging capability to collect cluster and job scheduler logs for cluster monitoring and inspection
    • Add --keep-logs flag to pcluster delete command to preserve logs at cluster deletion
  • Install and set up Amazon Time Sync on all OSs
  • Enable accounting plugin in Slurm for all OSs. Note: accounting is neither enabled nor configured by default
  • Add retry on throttling from the CloudFormation API, which occurs when several compute nodes are bootstrapped
    concurrently
  • Display detailed substack failures when pcluster create fails due to a substack error
  • Create additional EFS mount target in the AZ of compute subnet, if needed
  • Add validator for FSx Lustre Weekly Maintenance Start Time parameter
  • Add validator for the KMS key provided for EBS, FSx, and EFS
  • Add validator for S3 external resource
  • Support two new FSx Lustre features, Scratch 2 and Persistent filesystems
    • Add two new parameters deployment_type and per_unit_storage_throughput to the fsx section
    • Add new storage sizes for storage_capacity: 1,200 GiB, 2,400 GiB, and multiples of 2,400 GiB are supported with SCRATCH_2
    • In-transit encryption is available via the fsx_kms_key_id parameter when deployment_type = PERSISTENT_1
    • New parameter per_unit_storage_throughput is available when deployment_type = PERSISTENT_1
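
The FSx Lustre rules listed above can be sketched as a small check in Python. This is an illustration only, not the validator shipped with ParallelCluster: the section label `myfsx`, the helper names `is_valid_scratch2_capacity` and `check_fsx_section`, and the sample values are assumptions made for the example. Only the parameter names and the SCRATCH_2 and PERSISTENT_1 rules come from the notes above.

```python
import configparser

# Hypothetical config fragment. The section label "myfsx" and the values are
# illustrative; the parameter names are the ones named in the release notes.
CONFIG_TEXT = """
[fsx myfsx]
deployment_type = PERSISTENT_1
storage_capacity = 2400
per_unit_storage_throughput = 200
"""


def is_valid_scratch2_capacity(gib):
    """True for 1,200 GiB or any positive multiple of 2,400 GiB."""
    return gib == 1200 or (gib > 0 and gib % 2400 == 0)


def check_fsx_section(section):
    """Return a list of problems found in one fsx section (a sketch only)."""
    problems = []
    deployment_type = section.get("deployment_type", "")
    capacity = section.getint("storage_capacity", fallback=0)

    # Storage sizes stated in the notes for SCRATCH_2.
    if deployment_type == "SCRATCH_2" and not is_valid_scratch2_capacity(capacity):
        problems.append("storage_capacity %d GiB is not valid for SCRATCH_2" % capacity)

    # per_unit_storage_throughput is only available with PERSISTENT_1.
    if "per_unit_storage_throughput" in section and deployment_type != "PERSISTENT_1":
        problems.append("per_unit_storage_throughput requires deployment_type = PERSISTENT_1")

    return problems


parser = configparser.ConfigParser()
parser.read_string(CONFIG_TEXT)
problems = check_fsx_section(parser["fsx myfsx"])
print(problems if problems else "fsx section looks consistent")
```

The checks are kept as plain functions so they can be run against a config file before calling pcluster create; the real validators may apply further rules not covered here.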

Changes

  • Upgrade Slurm to version 19.05.5
  • Upgrade Intel MPI to version U6
  • Upgrade EFA installer to version 1.8.3:
    • Kernel module: efa-1.5.1 (updated from efa-1.4.1)
    • RDMA core: rdma-core-25.0 (distributed only) (no change)
    • Libfabric: libfabric-aws-1.9.0amzn1.1 (updated from libfabric-aws-1.8.1amzn1.3)
    • Open MPI: openmpi40-aws-4.0.2 (no change)
  • Install Python 2.7.17 on CentOS 6 and set it as default through pyenv
  • Install Ganglia from repository on Amazon Linux, Amazon Linux 2, CentOS 6 and CentOS 7
  • Disable StrictHostKeyChecking for SSH client when target host is inside cluster VPC for all OSs except CentOS 6
  • Pin Intel Python 2 and Intel Python 3 to version 2019.4
  • Automatically disable ptrace protection on Ubuntu 18.04 and Ubuntu 16.04 compute nodes when EFA is enabled.
    This is required in order to use local memory for interprocess communications in the Libfabric provider,
    as described here: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/efa-start.html#efa-start-ptrace
  • Packer version >= 1.4.0 is required for AMI creation
  • Use PyYAML version 5.2 for Python 3 versions 3.4 or earlier
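
To confirm the ptrace change listed above on a compute node, the current setting can be inspected with a short Python sketch. It assumes the kernel exposes the Yama setting at the standard Linux path /proc/sys/kernel/yama/ptrace_scope, where a value of 0 means ptrace protection is off. The helper names are hypothetical, and the way ParallelCluster itself applies the setting is not described in these notes.

```python
from pathlib import Path

# Standard Yama sysctl location on Linux; the file is absent when the kernel
# was built without the Yama security module.
PTRACE_SCOPE_PATH = Path("/proc/sys/kernel/yama/ptrace_scope")


def ptrace_protection_disabled(raw_value):
    """A Yama value of 0 means classic ptrace permissions (protection off)."""
    return raw_value.strip() == "0"


def read_ptrace_scope(path=PTRACE_SCOPE_PATH):
    """Return the raw setting as text, or None when it cannot be read."""
    try:
        return path.read_text()
    except OSError:
        return None


raw = read_ptrace_scope()
if raw is None:
    print("ptrace_scope not available on this system")
elif ptrace_protection_disabled(raw):
    print("ptrace protection is disabled")
else:
    print("ptrace protection is enabled (value %s)" % raw.strip())
```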

Bug Fixes

  • Fix issue with slurmd daemon not being restarted correctly when a compute node is rebooted
  • Fix errors that prevented Torque from locating jobs by setting server_name to the FQDN on the master node
  • Fix Torque issue that was limiting the max number of running jobs to the max size of the cluster
  • Fix OS validation depending on the configured scheduler

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster issue tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192