[not urgent] GPUs are not set to appropriate thread mode by Torque #275

Open
jchodera opened this issue Jun 21, 2015 · 13 comments

Comments

@jchodera
Member

Here's another Torque bug:

When I request GPU processes across any nodes with

#PBS -l procs=8,gpus=1:shared

the GPU mode (here, shared) is not being set correctly on the GPUs allocated.

From $PBS_GPUFILE (for job 3671532), we have:

gpu-1-11-gpu3
gpu-1-11-gpu2
gpu-1-11-gpu1
gpu-1-11-gpu0
gpu-2-9-gpu3
gpu-2-9-gpu2
gpu-2-9-gpu1
gpu-2-9-gpu0

but if you go to gpu-1-11 and run nvidia-smi, you see that the GPUs are still in thread-exclusive mode:

[chodera@gpu-1-11 ~]$ nvidia-smi
Sun Jun 21 14:56:54 2015       
+------------------------------------------------------+                       
| NVIDIA-SMI 346.46     Driver Version: 346.46         |                       
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 680     Off  | 0000:03:00.0     N/A |                  N/A |
| 30%   38C    P0    N/A /  N/A |     62MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   1  GeForce GTX 680     Off  | 0000:04:00.0     N/A |                  N/A |
| 30%   36C    P0    N/A /  N/A |     62MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   2  GeForce GTX 680     Off  | 0000:83:00.0     N/A |                  N/A |
| 30%   35C    P0    N/A /  N/A |     62MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+
|   3  GeForce GTX 680     Off  | 0000:84:00.0     N/A |                  N/A |
| 30%   38C    P0    N/A /  N/A |     62MiB /  4095MiB |     N/A    E. Thread |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0            C+G   Not Supported                                         |
|    1            C+G   Not Supported                                         |
|    2            C+G   Not Supported                                         |
|    3            C+G   Not Supported                                         |
+-----------------------------------------------------------------------------+
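To check this on other jobs, here is a minimal Python sketch that groups the `$PBS_GPUFILE` entries by host and flags allocated GPUs whose reported compute mode differs from the one requested. The function names are hypothetical. It assumes entries of the form `<host>-gpu<index>` as listed above, and mode text in `index, mode` lines such as `nvidia-smi --query-gpu=index,compute_mode --format=csv,noheader` should print. The exact mode strings (`Default`, `Exclusive_Thread`) are an assumption and may differ between driver versions.

```python
import re
from collections import defaultdict

# Entries look like "gpu-1-11-gpu3": hostname, then "-gpu", then the device index.
_ENTRY = re.compile(r"^(?P<host>.+)-gpu(?P<index>\d+)$")


def parse_gpufile(text):
    """Group the entries of a $PBS_GPUFILE into {host: sorted GPU indices}."""
    gpus = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = _ENTRY.match(line)
        if match is None:
            raise ValueError(f"unrecognized GPU file entry: {line!r}")
        gpus[match["host"]].append(int(match["index"]))
    return {host: sorted(indices) for host, indices in gpus.items()}


def parse_compute_modes(csv_text):
    """Parse 'index, mode' lines (assumed format) into {index: mode}."""
    modes = {}
    for line in csv_text.splitlines():
        if not line.strip():
            continue
        index, mode = (field.strip() for field in line.split(",", 1))
        modes[int(index)] = mode
    return modes


def mismatched_gpus(allocated, modes, expected):
    """Return the allocated GPU indices whose reported mode is not `expected`."""
    return [i for i in allocated if modes.get(i) != expected]
```

On gpu-1-11 for the job above, `mismatched_gpus([0, 1, 2, 3], modes, "Default")` would list all four devices if they are all still reported as exclusive-thread, assuming that a `shared` request is supposed to map to the default compute mode.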
@jchodera
Member Author

This appears to happen only when requesting via the syntax

#PBS -l procs=8,gpus=1:shared

Jobs requested in blocks of nodes work fine:

#PBS -l nodes=2:ppn=4:gpus=4:shared
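Both request forms carry the same `gpus=<count>:<mode>` fragment, so the mode a job asked for can be read out of either one. A small Python sketch (the helper name `requested_gpus` is hypothetical, and the pattern only covers the syntax shown in these two examples, not the full Torque resource grammar):

```python
import re

# "gpus=<count>" with an optional ":<mode>" suffix. The lookahead stops a
# following "key=value" item such as ":ppn=4" from being read as the mode.
_GPU_REQUEST = re.compile(
    r"gpus=(?P<count>\d+)(?::(?P<mode>[A-Za-z_]+)(?![A-Za-z_=]))?"
)


def requested_gpus(resource_list):
    """Return (count, mode) from a `-l` resource string, or None if no GPUs are requested.

    `mode` is None when the request has no mode suffix.
    """
    match = _GPU_REQUEST.search(resource_list)
    if match is None:
        return None
    return int(match["count"]), match["mode"]
```

For the two requests above this gives `(1, "shared")` and `(4, "shared")`, so the same mode is being requested in the case that fails and in the case that works.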

@jchodera
Member Author

This is obviously not urgent either. Just something to file a bug report for at some point.

@jchodera jchodera changed the title from "GPUs are not set to appropriate thread mode by Torque" to "[not urgent] GPUs are not set to appropriate thread mode by Torque" Jun 21, 2015
@tatarsky
Contributor

Noted.

@tatarsky
Contributor

Attempting to enter into Adaptive ticket system...

@tatarsky
Contributor

Forgot to enter case number for reference: TRQ# 22388

@tatarsky
Contributor

Support is checking with developers on the above. No other status.

@tatarsky
Contributor

tatarsky commented Jul 2, 2015

Pinging support for status...

@tatarsky tatarsky self-assigned this Jul 2, 2015
@tatarsky
Contributor

tatarsky commented Jul 2, 2015

Adaptive confirmed this is a bug.

@tatarsky
Contributor

tatarsky commented Aug 6, 2015

No status on bug. Monthly check.

@marcoverl

tatarsky, sorry for using this tool, which may not be appropriate for my question, but I think you can help me: any idea how I could set the NVML COMPUTE mode with Torque/Maui when using qsub -W x='GRES:gpu at 1', as in the famous Maui patch?

@tatarsky
Contributor

tatarsky commented Oct 1, 2015

Sorry, is this on the Hal cluster? We don't use Maui, and my knowledge of whether it supports the GRES concept is slight. I take it the above doesn't work.

@marcoverl

No, I am working on this project: https://wiki.egi.eu/wiki/GPGPU-CREAM and I was googling around for a way to do with Torque/Maui the same things for GPUs that are supported by Torque/pbs_sched. I know that GPU support is in Moab and not foreseen in Maui, and that the GRES stuff is something of a workaround, but in EGI many sites are still based on Torque/Maui, so they are asking for GPU support on that system. I saw that in an issue here you gave a detailed answer on how the NVML COMPUTE mode is set in the Torque/pbs_sched source code, so I hoped you could give me a hint on how to obtain the same behaviour for Torque/Maui. Thanks for your answer.

@tatarsky
Contributor

tatarsky commented Oct 1, 2015

Yeah, while I feel your pain, I do not know how to do that in Torque/Maui.
