Slurm, round up num CPUs #230
This is on the i6 cluster, but will it break things on the apptek cluster? Or on other clusters where sisyphus might be used?
I don't see how this can break anything? In the worst case, you allocate one more CPU than what would be necessary?
Agree with Albert, even if we don't hyper-thread (which we likely do?), CPUs don't cost much fairshare normally.
If we have a CPU cluster without hyper-threading and mostly use single-CPU tasks, then this update would reduce the number of jobs that can run in parallel by a factor of 2, would it not?
Just out of curiosity:
Hm, I tested this
I don't really understand it so far. Currently, it seems to me like using
As #231 is merged now, this here is obsolete.
Fix #229.

Example: You have `rqmt["cpu"]==1`. Without this change, you get `SLURM_CPUS_PER_TASK=1`. You also by default have `SLURM_JOB_NUM_NODES=1`. Due to the Slurm hyper-threading logic, it might round up the num CPUs, i.e. you get `NumCPUs=2`. This results in `SLURM_TASKS_PER_NODE=2`.

Without the additional `srun` that was introduced in #212, Slurm will not really handle the `SLURM_TASKS_PER_NODE`, and this problem was not noticed. However, with the `srun` (which should always be fine and follows standard Slurm practice), this is a problem now.

This PR fixes this by avoiding that `SLURM_CPUS_PER_TASK > NumCPUs`.

Test this yourself: Create `slurm-test-script.sh` with content (example taken from the `sbatch` man page):

Run `sbatch --cpus-per-task=1 slurm-test-script.sh`. Then check the output log file.
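
The arithmetic in the example above can be sketched in Python. This is only an illustration, not the actual diff of this PR: the function names `round_up_cpus` and `implied_tasks_per_node` are hypothetical, the default of 2 hardware threads per core is an assumption about the cluster, and the tasks-per-node formula is a simplified model of the behavior described above rather than Slurm's real allocation logic.

```python
def round_up_cpus(requested: int, threads_per_core: int = 2) -> int:
    """Round a CPU request up to the next multiple of threads_per_core.

    threads_per_core=2 is an assumption (hyper-threading enabled).
    """
    if requested < 1 or threads_per_core < 1:
        raise ValueError("requested and threads_per_core must be >= 1")
    # Ceiling division, then scale back up to a full multiple.
    return -(-requested // threads_per_core) * threads_per_core


def implied_tasks_per_node(allocated_cpus: int, cpus_per_task: int) -> int:
    """Simplified model: how many tasks fit into the allocated CPUs."""
    return allocated_cpus // cpus_per_task


if __name__ == "__main__":
    for cpu in (1, 2, 3):
        allocated = round_up_cpus(cpu)  # what Slurm might allocate anyway
        before = implied_tasks_per_node(allocated, cpu)
        after = implied_tasks_per_node(allocated, round_up_cpus(cpu))
        print(f"rqmt cpu={cpu}: allocated={allocated}, "
              f"tasks/node without rounding={before}, with rounding={after}")
```

With a request of 1 CPU, the unrounded case yields two tasks per node in this model, which matches the `SLURM_TASKS_PER_NODE=2` symptom; once the request itself is rounded up, only one task fits.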