

Clarifying where to preprocess #81

Closed

Conversation

rkube
Contributor

@rkube rkube commented Apr 25, 2022

Preprocessing results in too much compute load on the Traverse head node.

@buildbot-princeton
Collaborator

Can one of the admins verify this patch?

@rkube
Contributor Author

rkube commented Apr 25, 2022

To preprocess the dataset on Traverse, I need to limit the number of threads used for preprocessing:
#82
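A minimal sketch of the kind of cap being discussed. The names (`preprocess_shot`, `shot_list`, `MAX_CPUS`) are illustrative placeholders, not from the FRNN codebase: the point is simply to bound the worker count rather than spawn one worker per logical CPU.

```python
# Illustrative sketch (names are hypothetical, not from FRNN): cap the
# number of preprocessing workers instead of using every logical CPU.
import os
from concurrent.futures import ThreadPoolExecutor

MAX_CPUS = 32  # stay well below the 128 logical CPUs on a Traverse node

def preprocess_shot(shot_id):
    # stand-in for the real per-shot preprocessing work
    return shot_id * 2

def preprocess_all(shot_list, max_cpus=MAX_CPUS):
    # never request more workers than the machine actually has
    n_workers = min(max_cpus, os.cpu_count() or 1)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(preprocess_shot, shot_list))

print(preprocess_all(range(4)))  # -> [0, 2, 4, 6]
```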

@felker
Member

felker commented Apr 25, 2022

There are 44 cores on a node of Traverse, right? Any reason why we can only spawn 32 threads?

@felker
Member

felker commented Apr 25, 2022

Also I am in favor of not changing the default conf.yaml to make it specific to Princeton-based systems. So:

fs_path: '/Users/'
...
max_cpus: -1

(/Users/ isn't an ideal default, but it is generic enough. Maybe it should be set to $HOME; would need to check the parsing logic.)
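One way the parsing logic could handle those generic defaults, sketched here as an assumption (these helper names do not exist in the codebase): treat `max_cpus: -1` as "use all available cores", and expand `$HOME`/`~` in `fs_path` so nothing Princeton-specific needs to be hard-coded.

```python
# Hypothetical parsing helpers for the generic conf.yaml defaults
# discussed above -- not the actual FRNN parsing code.
import os

def resolve_max_cpus(max_cpus):
    # -1 (or any negative value) means "use every available core"
    n_avail = os.cpu_count() or 1
    if max_cpus is None or max_cpus < 0:
        return n_avail
    return min(max_cpus, n_avail)

def resolve_fs_path(fs_path):
    # expand $HOME and ~ so the default need not name a real system path
    return os.path.expandvars(os.path.expanduser(fs_path))

print(resolve_max_cpus(-1))   # all cores on this machine
print(resolve_fs_path("~/"))  # the user's home directory
```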

@rkube
Contributor Author

rkube commented Apr 25, 2022

Each Traverse node has 2 processors, 16 cores per processor, and 4 threads per core. When I run preprocessing with 126 threads it starts off well but throws errors after a while. Maybe it is running into memory limits?

@felker
Member

felker commented Apr 25, 2022

Ah, I had assumed that the CPU model was the same as on Summit. What do you get when you run lscpu and cat /proc/cpuinfo on a Traverse compute node (just curious)?

But this problem is likely because of the 4-way SMT, which wasn't present on the Tiger cluster for which the code was originally written.

@rkube
Contributor Author

rkube commented Apr 25, 2022

Summit and Traverse are very similar, but not 100% identical.

(frnn) [rkube@traverse examples]$ lscpu
Architecture:        ppc64le
Byte Order:          Little Endian
CPU(s):              128
On-line CPU(s) list: 0-127
Thread(s) per core:  4
Core(s) per socket:  16
Socket(s):           2
NUMA node(s):        6
Model:               2.3 (pvr 004e 1203)
Model name:          POWER9, altivec supported
CPU max MHz:         3800.0000
CPU min MHz:         2300.0000
L1d cache:           32K
L1i cache:           32K
L2 cache:            512K
L3 cache:            10240K
NUMA node0 CPU(s):   0-63
NUMA node8 CPU(s):   64-127
NUMA node252 CPU(s): 
NUMA node253 CPU(s): 
NUMA node254 CPU(s): 
NUMA node255 CPU(s): 
(frnn) [rkube@traverse examples]$ cat /proc/cpuinfo 
processor       : 0
cpu             : POWER9, altivec supported
clock           : 3683.000000MHz
revision        : 2.3 (pvr 004e 1203)

processor       : 1
cpu             : POWER9, altivec supported
clock           : 3683.000000MHz
revision        : 2.3 (pvr 004e 1203)

processor       : 2
cpu             : POWER9, altivec supported
clock           : 3683.000000MHz
revision        : 2.3 (pvr 004e 1203)
...
processor       : 127
cpu             : POWER9, altivec supported
clock           : 3533.000000MHz
revision        : 2.3 (pvr 004e 1203)

timebase        : 512000000
platform        : PowerNV
model           : 8335-GTH
machine         : PowerNV 8335-GTH
firmware        : OPAL
MMU             : Radix
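The lscpu figures above reconcile the two thread counts in this thread: 2 sockets × 16 cores/socket gives 32 physical cores, and SMT4 multiplies that to 128 logical CPUs. A short sanity check:

```python
# Sanity check on the lscpu output above: physical cores vs. SMT threads.
sockets, cores_per_socket, threads_per_core = 2, 16, 4

physical_cores = sockets * cores_per_socket        # 32 real cores
logical_cpus = physical_cores * threads_per_core   # 128 hardware threads

print(physical_cores, logical_cpus)  # -> 32 128
```

So a 32-thread cap corresponds to one worker per physical core, while anything approaching 128 relies on SMT and shares core resources (and memory) four ways.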
