AWS ParallelCluster 2.3.1

@enrico-usai enrico-usai released this 03 Apr 08:54
· 405 commits to master since this release
47b8751

We're excited to announce the release of AWS ParallelCluster 2.3.1.

Upgrade

How to upgrade?

sudo pip install --upgrade aws-parallelcluster
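After upgrading, it can be useful to confirm which version is actually installed. The following is a minimal sketch, assuming Python 3.8+ (for `importlib.metadata`) and purely numeric version strings such as `2.3.1`; the helper names `installed_version` and `is_at_least` are hypothetical and not part of ParallelCluster.

```python
from importlib import metadata


def installed_version(package="aws-parallelcluster"):
    """Return the installed version string of the package, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_at_least(installed, required="2.3.1"):
    """Compare dotted numeric versions, e.g. '2.3.1' against '2.3.0'.

    Assumption: versions contain only integers separated by dots.
    Pre-release suffixes are not handled by this sketch.
    """
    def key(version):
        return tuple(int(part) for part in version.split("."))

    return key(installed) >= key(required)


if __name__ == "__main__":
    version = installed_version()
    if version is None:
        print("aws-parallelcluster is not installed in this environment")
    elif is_at_least(version):
        print("aws-parallelcluster %s is installed (2.3.1 or newer)" % version)
    else:
        print("aws-parallelcluster %s is older than 2.3.1" % version)
```

Run the script with the same Python interpreter that `pip` installed the package into; otherwise it may report the package as missing.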

Enhancements

  • Add support for FSx Lustre with Amazon Linux. When using a custom AMI,
    the kernel must be >= 4.14.104-78.84.amzn1.x86_64.
  • Slurm
    • set compute nodes to DRAIN state before removing them from cluster. This prevents the scheduler from submitting a job to a node that is being terminated.
    • dynamically adjust max cluster size based on ASG settings
    • dynamically change the number of configured FUTURE nodes based on the actual nodes that join the cluster. The max size of the cluster seen by the scheduler always matches the max capacity of the ASG.
    • process nodes added to or removed from the cluster in batches. This speeds up cluster scaling, which can now react to variations in the ASG capacity with a delay of less than 1 minute.
    • add support for job dependencies and pending reasons. The cluster won't scale up if the job cannot start due to an unsatisfied dependency.
    • set ReturnToService=1 in scheduler config in order to recover instances that were initially marked as down due to a transient issue.
  • Validate FSx parameters. Fixes #896.
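The kernel requirement for custom AMIs in the FSx Lustre item above can be checked on a running instance before building the AMI. This is a rough sketch, not an official check: it assumes that comparing the five leading numeric fields of the release string (for example `4.14.104-78.84`) is a good enough approximation of the package version ordering, and the helper names `kernel_key` and `kernel_supports_fsx` are hypothetical.

```python
import platform
import re

# Minimum Amazon Linux kernel quoted in the release notes.
MIN_KERNEL = "4.14.104-78.84.amzn1.x86_64"


def kernel_key(release):
    """Extract the numeric fields of a release string as a comparable tuple.

    '4.14.104-78.84.amzn1.x86_64' becomes (4, 14, 104, 78, 84).
    Assumption: the string follows the Amazon Linux 'X.Y.Z-A.B...' layout.
    """
    match = re.match(r"(\d+)\.(\d+)\.(\d+)-(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError("unrecognized kernel release: %r" % release)
    return tuple(int(part) for part in match.groups())


def kernel_supports_fsx(release, minimum=MIN_KERNEL):
    """Return True if the release is at least the stated minimum."""
    return kernel_key(release) >= kernel_key(minimum)


if __name__ == "__main__":
    current = platform.release()  # same value as `uname -r`
    try:
        verdict = "meets" if kernel_supports_fsx(current) else "is below"
        print("Kernel %s %s the minimum %s" % (current, verdict, MIN_KERNEL))
    except ValueError as error:
        # Non-Amazon-Linux kernels often use a different release layout.
        print("Could not compare: %s" % error)
```

Numeric comparison matters here: a plain string comparison would rank `4.14.97` above `4.14.104`.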

Changes

  • Slurm - Upgrade version to 18.08.6.2
  • NVIDIA - update drivers to version 418.56
  • CUDA - update toolkit to version 10.0
  • Increase default EBS volume size from 15GB to 17GB
  • Disabled updates to FSx File Systems. Updates to most parameters would cause the filesystem, and all its data, to be deleted.

Bug Fixes

  • Cookbook wasn't fetched when the custom_ami parameter was specified in the config
  • Cfn-init is now fetched from us-east-1. This bug affected non-alinux custom AMIs in regions other than us-east-1.
  • Account limit check not done for SPOT or AWS Batch Clusters
  • Account limit check falls back to master subnet. Fixes #910.
  • Boto3 upperbound removed

Support

Need help / have a feature request?
AWS Support: https://console.aws.amazon.com/support/home
ParallelCluster Issues tracker on GitHub: https://github.com/aws/aws-parallelcluster
The HPC Forum on the AWS Forums page: https://forums.aws.amazon.com/forum.jspa?forumID=192