Please list any open Torque/Moab queue issues here #147

Closed
jchodera opened this issue Oct 20, 2014 · 10 comments

@jchodera
Member

We're capturing a list of outstanding issues with Torque/Moab in advance of our discussion with Adaptive on Tuesday.

Please comment here with any issues (or links to issues) that are still problematic.

This is time-sensitive, so please get your issues in TODAY (Mon 20 Oct) if possible.

@jchodera jchodera changed the title Please comment on any open Torque/Moab queue issues here Please list any open Torque/Moab queue issues here Oct 20, 2014
@jchodera
Member Author

@tatarsky says: Compile Torque/PBSmom to use libcgroup (instead of manual cpuset) to allow Docker to be used.
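A hypothetical node pre-check sketch (not part of Torque or Adaptive's tooling), reading the kernel's standard /proc/cgroups table; the REQUIRED controller set below is an assumption for illustration, not a documented Torque or Docker requirement:

```python
#!/usr/bin/env python
# Hypothetical pre-check (not part of Torque): list the cgroup controllers the
# kernel reports as enabled in /proc/cgroups and flag any of an assumed required
# set that are missing before rolling out a libcgroup-enabled pbs_mom plus Docker.
REQUIRED = {"cpuset", "cpu", "memory", "devices", "freezer"}  # assumed set

def enabled_controllers(path="/proc/cgroups"):
    """Return the set of controller names marked enabled in /proc/cgroups."""
    enabled = set()
    with open(path) as f:
        for line in f:
            if line.startswith("#"):
                continue  # header row
            name, _hierarchy, _num_cgroups, flag = line.split()
            if flag == "1":
                enabled.add(name)
    return enabled

if __name__ == "__main__":
    missing = REQUIRED - enabled_controllers()
    print("missing controllers: %s" % (", ".join(sorted(missing)) or "none"))
```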

@jchodera
Member Author

We should ask Adaptive what they recommend in terms of computing power for the Torque/Moab head node.

@tatarsky
Contributor

Would like to move all nodes to the CentOS 6.5 userspace before the update. I don't see too many issues with that except controlling the kernel revision and GPFS/NVIDIA. The nodes are currently on 6.4.

@jchodera
Member Author

I do not believe our group-based fairshare system has ever worked as desired. We should revisit this issue.

We may also want finer-grained control over group and group-collaborator weights; a toy sketch of the idea follows.
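Purely illustrative, not Moab's actual fairshare implementation; every group name, weight, target, window value, and decay factor below is a made-up placeholder:

```python
# Toy sketch only; not Moab's fairshare implementation. Illustrates the idea of
# decayed per-window usage combined with per-group weights and targets.

def decayed_usage(window_usage, decay=0.8):
    """Combine per-window usage (most recent first) with exponential decay."""
    return sum(u * decay ** i for i, u in enumerate(window_usage))

def fairshare_bonus(usage, weight, target):
    """Positive when a group is under its target share, negative when over."""
    return weight * (target - usage)

# Made-up example: lab members vs. collaborators with different weights/targets.
groups = {
    "lab-members":   {"windows": [120.0, 90.0, 60.0], "weight": 1.0, "target": 100.0},
    "collaborators": {"windows": [30.0, 10.0, 5.0],   "weight": 0.5, "target": 25.0},
}

for name, cfg in sorted(groups.items()):
    usage = decayed_usage(cfg["windows"])
    bonus = fairshare_bonus(usage, cfg["weight"], cfg["target"])
    print("%s: usage=%.1f bonus=%.1f" % (name, usage, bonus))
```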

@jchodera
Member Author

We need to tackle the setting of CUDA_VISIBLE_DEVICES for the following use cases:

  • single GPU
  • all GPUs on one node
  • multiple nodes, all GPUs on each node
  • a set of N available GPUs distributed randomly among nodes (e.g. used by MPI)

Currently, this last case is the tough one.
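A minimal sketch of one possible approach for that last case, assuming Torque's $PBS_GPUFILE convention of one <hostname>-gpu<index> entry per allocated GPU; the helper and its output format are hypothetical, and an MPI launcher wrapper would still have to export the right string on each host:

```python
#!/usr/bin/env python
# Sketch only: derive per-node CUDA_VISIBLE_DEVICES strings from $PBS_GPUFILE.
# Assumes one "<hostname>-gpu<index>" entry per line.
import os
from collections import defaultdict

def gpu_assignments(gpufile_path):
    """Return {hostname: "0,1,..."} suitable for CUDA_VISIBLE_DEVICES."""
    per_node = defaultdict(list)
    with open(gpufile_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            host, _, index = line.rpartition("-gpu")
            per_node[host].append(index)
    return {host: ",".join(sorted(ids, key=int)) for host, ids in per_node.items()}

if __name__ == "__main__":
    for host, devices in sorted(gpu_assignments(os.environ["PBS_GPUFILE"]).items()):
        print("%s: CUDA_VISIBLE_DEVICES=%s" % (host, devices))
```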

@jchodera
Member Author

Last call for any outstanding issues!

@akahles

akahles commented Oct 28, 2014

Just for completeness (not sure whether this is relevant): getting the Docker infrastructure to run on the cluster nodes. Maybe Adaptive has some experience there.

@tatarsky
Contributor

Already on my list, with the needed build options.


@ratsch

ratsch commented Oct 28, 2014

a) As John mentioned above, I think the fairshare system is not working properly yet. That needs to be tweaked to properly take group membership into account.

b) Short (<1h) jobs in the active queue (interactive jobs) should also be able to suspend batch jobs in order to find a slot more quickly.

c) Jobs in the active queue should run with CPU overcommitment; they don't need a full CPU allocated, since they are mostly idle.

d) The number of interactive jobs allowed per user could be increased once b) is in place.

@tatarsky
Contributor

Folded into #197
