Please list any open Torque/Moab queue issues here #147
Comments
@tatarsky says: Compile Torque/PBSmom to use libcgroup (instead of manual cpuset) to allow Docker to be used.
We should ask Adaptive what they recommend in terms of computing power for Torque/Moab head.
I would like to move all nodes to the CentOS 6.5 userspace before the update. I don't see many issues with that, apart from controlling the kernel revision and GPFS/Nvidia. The nodes are currently on 6.4.
I do not believe our group-based fairshare system has ever worked as desired. We should revisit this issue. We may also want finer-grained control over group and group-collaborator weights.
We need to tackle the setting of
Currently, this last case is the tough one.
Last call for any outstanding issues!
Just for completeness, though I'm not sure this is relevant: getting the Docker infrastructure to run on the cluster nodes. Maybe Adaptive has some experience there.
Already on my list with needed build options. Paul Tatarsky
a) As John mentioned above, I think the fairshare system is not working properly yet. It needs to be tweaked to properly take group membership into account.
b) Short (<1h) jobs in the active queue (interactive jobs) should also be able to suspend batch jobs in order to find a slot more quickly.
c) Jobs in the active queue should run with CPU overcommitment. They are mostly idle, so they don't need a full CPU allocated.
d) The number of interactive jobs allowed per user can be increased if b) is done.
Folded into #197
We're capturing a list of outstanding issues with Torque/Moab in advance of our discussion with Adaptive on Tuesday.
Please comment here with any issues (or links to issues) that are still problematic.
This is time-sensitive, so please get your issues in TODAY (Mon 20 Oct) if possible.