A collection of utilities / helper scripts to make life easier on our HPC clusters.
Tools to get reports on authentication & authorization settings, identities, cluster status, jobs and storage:
- caccounts: Lists details of Slurm cluster accounts.
- cfinger: Lists details of a specific user's system/login account.
- colleagues: Lists group owners, data managers and other regular group members.
- ctop: Top-like overview of cluster status and resource usage.
- cps: List all processes on a machine in a tree-like hierarchy.
- sjeff: Lists Slurm Job EFFiciency for jobs.
- cnodes: Lists state of compute nodes.
- cqos: Lists details for all Quality of Service levels.
- cqueue: Lists running and queued jobs.
- cprio: Lists priority of all jobs in the queue.
- cshare: Lists the shares Slurm users in Slurm accounts have, their recent resource usage and how this impacts their fair share.
- hpc-environment-slurm-report: Creates reports on cluster usage as a percentage of available resources for a specified period (e.g. week, month, etc.).
- quota: Lists quota for all shared file systems.
- hpc-environment-quota-report-for-PFS: Creates quota report for admins.
Wrapper for Slurm's sacctmgr command with a custom output format to list which users are associated with which Slurm accounts on which clusters. Example output:
Cluster Account User Share Def QOS QOS
---------- -------------------- ------------ --------- --------- -----------------------------------
calculon root 1 priority [list of QoS account had access to]
calculon root root 1 priority [list of QoS account had access to]
calculon users 1 regular [list of QoS account had access to]
calculon users [1st_user] 1 regular [list of QoS account had access to]
calculon users [2nd_user] 1 regular [list of QoS account had access to]
etc.
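Under the hood this is presumably close to a plain sacctmgr invocation along these lines (a sketch, not the wrapper's actual code; the exact format string may differ):
#
# List associations between clusters, accounts and users incl. shares and QoS.
#
sacctmgr show associations format=Cluster,Account,User,Fairshare,DefaultQOS,QOS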
cfinger is finger on steroids: the basic account details you would also get from standard finger, supplemented with the public keys associated with the account and its group memberships.
Example output:
===========================================================
Basic account details for [account]:
-----------------------------------------------------------
User : [account]
UID : [0-9]+
Home : /home/[account]
Shell : /bin/bash
Mail : Real Name <[email protected]>
-----------------------------------------------------------
User [account] is authorized for access to groups:
-----------------------------------------------------------
Primary group : [account] ([0-9]+)
Secondary group : [group] ([0-9]+)
-----------------------------------------------------------
Public key(s) for authenticating user [account]:
-----------------------------------------------------------
ssh-rsa AAAAB3NzaC1yc....QR+zbmsAX0Mpw== [account]
===========================================================
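Conceptually, cfinger aggregates information you could also collect by hand; a rough sketch assuming the account details come from the standard name service (not the tool's actual implementation):
#
# Basic account details (UID, home, shell, GECOS/mail) and group memberships.
#
getent passwd [account]
id [account]
#
# Public keys are fetched from wherever the cluster stores them, e.g. an LDAP
# attribute or ~/.ssh/authorized_keys; this is site specific (assumption).
#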
Lists all users of all groups a user is a member of. Optionally you can specify:
- -g [group_name] to list only members of the specified group.
- -g all to list members of all groups.
- -e to sort group members by expiration date of their account.
User accounts are expanded to Real Names and email addresses.
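For example:
colleagues                   # members of all groups you are a member of
colleagues -g [group_name]   # only members of the specified group
colleagues -e                # members sorted by account expiration date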
Example output:
==============================================================================================================
Colleagues in the [group] group:
==============================================================================================================
[group] owner(s):
--------------------------------------------------------------------------------------------------------------
[account] YYYY-MM-DD Real Name <[email protected]>
==============================================================================================================
[group] datamanager(s):
--------------------------------------------------------------------------------------------------------------
[account] YYYY-MM-DD Real Name <[email protected]>
==============================================================================================================
[group] member(s):
--------------------------------------------------------------------------------------------------------------
[account] YYYY-MM-DD Real Name <[email protected]>
==============================================================================================================
Cluster-top, or ctop for short, provides an integrated real-time overview of resource availability and usage. The example output below is in black and white, but you'll get a colorful picture if your terminal supports ANSI colors. Press the ? key to get online help, which explains how to filter, sort and color results as well as how to search for specific users, jobs, nodes, quality of service (QoS) levels, etc.
Usage Totals: 288/504 Cores | 7/11 Nodes | 69/7124 Jobs Running 2016-12-01-T12:22:56
Node States: 3 IDLE | 1 IDLE+DRAIN | 7 MIXED
cluster node 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 load
-------------------------------------------------------------------------------------------------------
calculon calculon . . . . . . . . . . . . @ @ @ @ @ @ @ @ @ @ @ @ 2.08 = Ok
-------------------------------------------------------------------------------------------------------
calculon umcg-node011 E i T d Z O U d O w w w w S S S S e e e e . . . . . . . . . . . . . . . . . . . . . . . . . @ @ 10.96 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node012 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0.57 = Ok
-------------------------------------------------------------------------------------------------------
calculon umcg-node013 < N N N N C C C C = = = = x x x x b b b b D D D D A A A A I I I I T T T T Y Y Y Y . . . . . @ @ 17.31 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node014 B B B B B B B B B B B B B B B B B B B B B B B B B L c k k k k - - - - E E E E S S S S . . . @ @ 30.40 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node015 u I I I I I I I I I I I I I I I I I I I I I I I I I e v D o B B B B N N N N i i i i K K K K @ @ 34.71 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node016 H Z Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y Y U + _ > > > > a a a a n n n n k k k k @ @ 34.39 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node017 L L L L L L L L L L L L L L L L L L L L L L L L L p p p p H H H H b b b b | | | | / / / / . @ @ 22.02 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node018 s z ~ V y c c c c c c c c c c c c c c c c c c c c c c c c c V q l a C C C C K K K K A A A A @ @ 38.03 = too low!
-------------------------------------------------------------------------------------------------------
calculon umcg-node019 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . @ @ 0.51 = Ok
-------------------------------------------------------------------------------------------------------
calculon umcg-node020 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . @ @ 0.59 = Ok
-------------------------------------------------------------------------------------------------------
legend: ? unknown | @ busy | X down | . idle | 0 offline | ! other
JobID Username QoS Jobname S CPU(%) CPU(%) Mem(GiB) Mem(GiB) Walltime Walltime
~used req. used req. used requested
u = 562757 [account] regular-long VCF_filter_convert_concat_tt.sh R 15 100 34.3 120.0 0-14:09:10 3-00:00:00
E = 579935 [account] regular-long SRA_download_fastq_round2.sh R 0 100 2.1 120.0 3-02:19:50 5-00:00:00
s = 580724 [account] regular-long RELF10_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
z = 580725 [account] regular-long RELF11_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
~ = 580726 [account] regular-long RELF12_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
V = 580727 [account] regular-long RELF13_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
y = 580728 [account] regular-long RELF14_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
H = 580729 [account] regular-long RELF15_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
Z = 580730 [account] regular-long RELF16_%j R 99 100 1.2 20.0 0-21:13:23 2-00:00:00
i = 580731 [account] regular-long RELF17_%j R 99 100 1.2 20.0 0-17:30:31 2-00:00:00
T = 580732 [account] regular-long RELF18_%j R 99 100 1.2 20.0 0-17:30:31 2-00:00:00
d = 581023 [account] regular-long move_check_downloaded_files.sh R 1 100 0.0 40.0 0-21:13:23 3-00:00:00
c = 606872 [account] regular-short kallisto33 R 2311 2500 22.3 40.0 0-01:19:23 0-05:59:00
I = 606873 [account] regular-short kallisto34 R 2291 2500 22.3 40.0 0-01:17:23 0-05:59:00
Y = 606874 [account] regular-short kallisto35 R 1719 2500 23.7 40.0 0-00:24:32 0-05:59:00
B = 606876 [account] regular-short kallisto37 R 1685 2500 22.3 40.0 0-00:23:01 0-05:59:00
L = 606877 [account] regular-short kallisto38 R 1165 2500 22.3 40.0 0-00:13:33 0-05:59:00
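To read the job list: the CPU(%) columns appear to be relative to a single core (100% = 1 core), so job 606872 (kallisto33, symbol c) requested 2500% = 25 cores and used 2311% on average, i.e. roughly 23 of its 25 cores were kept busy, while only 22.3 GiB of the 40.0 GiB of requested memory were used.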
A wrapper for the regular ps command with a modified default output format.
It lists all processes on a machine in a tree-like hierarchy, together with the elapsed time, the percentage of CPU and the percentage of memory each process consumes.
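A rough plain-ps approximation of what cps shows (a sketch; the wrapper's actual options and sorting may differ):
#
# Tree view of all processes with elapsed time, CPU and memory percentages.
#
ps -eo user,pid,ppid,etime,pcpu,pmem,cmd --forest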
Example output:
USER PID PPID ELAPSED %CPU %MEM CMD
demo-user 20112 1 03:45:10 0.0 0.0 SCREEN -S tool_test
demo-user 20113 20112 03:45:10 0.0 0.0 \_ /bin/bash
demo-user 26873 20113 14:01 0.0 0.0 \_ bash test_fastq.sh
demo-user 26893 26873 14:01 0.0 0.0 \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/tool.sh --workflow fastq --config /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/custom.cfg
demo-user 26910 26893 14:01 0.0 0.0 \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/tool.sh --workflow fastq --config /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/custom.cfg
demo-user 26911 26910 14:01 8.6 0.6 \_ /usr/bin/java --add-opens=java.base/java.lang=ALL-UNNAMED --add-opens=java.base/java.io=ALL-UNNAMED
demo-user 3248 26911 10:45 0.0 0.0 \_ /bin/bash -ue .command.run
demo-user 3305 3248 10:45 0.0 0.0 | \_ tee .command.out
demo-user 3306 3248 10:45 0.0 0.0 | \_ tee .command.err
demo-user 3307 3248 10:45 0.0 0.0 | \_ /bin/bash -ue .command.run
demo-user 3308 3307 10:45 0.0 0.0 | \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/19/3e5f75e/.command.run nxf_trace
demo-user 3379 3308 10:45 0.0 0.0 | \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/19/3e5f75e/.command.sh
demo-user 9750 3379 10:15 0.0 0.0 | | \_ Apptainer runtime parent
demo-user 9772 9750 10:15 1.1 0.0 | | \_ /usr/bin/python3 /usr/local/bin/sniffles --input tool_tool_fam0_HG002_sliced.cram
demo-user 9816 9772 10:12 0.0 0.0 | | \_ /usr/bin/python3 /usr/local/bin/sniffles --input tool_tool_fam0_HG002_sliced.cram
demo-user 9817 9772 10:12 0.0 0.0 | | \_ /usr/bin/python3 /usr/local/bin/sniffles --input tool_tool_fam0_HG002_sliced.cram
demo-user 9818 9772 10:12 0.3 0.0 | | \_ /usr/bin/python3 /usr/local/bin/sniffles --input tool_tool_fam0_HG002_sliced.cram
demo-user 9819 9772 10:12 0.3 0.0 | | \_ /usr/bin/python3 /usr/local/bin/sniffles --input tool_tool_fam0_HG002_sliced.cram
demo-user 3382 3308 10:45 0.8 0.0 | \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/19/3e5f75e/.command.run nxf_trace
demo-user 3696 26911 00:41 0.0 0.0 \_ /bin/bash -ue .command.run
demo-user 3719 3696 00:41 0.0 0.0 \_ tee .command.out
demo-user 3720 3696 00:41 0.0 0.0 \_ tee .command.err
demo-user 3721 3696 00:41 0.0 0.0 \_ /bin/bash -ue .command.run
demo-user 3722 3721 00:41 0.0 0.0 \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/.command.run nxf_trace
demo-user 3740 3722 00:41 0.0 0.0 \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/.command.sh
demo-user 6022 3740 00:32 0.2 0.0 | \_ Apptainer runtime parent
demo-user 6040 6022 00:32 0.2 0.0 | \_ bash /opt/bin/run_clair3.sh --bam_fn=tool_tool_fam0_HG002.cram.bam --ref_fn=/groups/umcg-gcc/rsc01/tool/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
demo-user 6098 6040 00:32 0.0 0.0 | \_ bash /opt/bin/run_clair3.sh --bam_fn=tool_tool_fam0_HG002.cram.bam --ref_fn=/groups/umcg-gcc/rsc01/tool/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
demo-user 6100 6098 00:32 0.0 0.0 | | \_ bash /opt/bin/run_clair3.sh --bam_fn=tool_tool_fam0_HG002.cram.bam --ref_fn=/groups/umcg-gcc/rsc01/tool/resources/GRCh38/GCA_000001405.15_GRCh38_no_alt_analysis_set.fna
demo-user 6106 6100 00:32 0.0 0.0 | | \_ /bin/bash /opt/bin/scripts/clair3_c_impl.sh --bam_fn /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/tool_tool_fam0_HG002.cram
demo-user 7009 6106 00:30 0.7 0.0 | | \_ perl /opt/conda/envs/clair3/bin/parallel --retries 4 -C --joblog /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/log/
demo-user 15204 7009 00:07 87.4 0.1 | | | \_ python3 /opt/bin/scripts/../clair3.py CallVariantsFromCffi --chkpnt_fn /opt/models/r941_prom_sup_g5014/pileup
demo-user 16536 7009 00:03 69.3 0.0 | | | \_ python3 /opt/bin/scripts/../clair3.py CallVariantsFromCffi --chkpnt_fn /opt/models/r941_prom_sup_g5014/pileup
demo-user 16814 7009 00:01 79.0 0.0 | | | \_ python3 /opt/bin/scripts/../clair3.py CallVariantsFromCffi --chkpnt_fn /opt/models/r941_prom_sup_g5014/pileup
demo-user 7015 6106 00:30 0.0 0.0 | | \_ tee /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/log/1_call_var_bam_pileup.log
demo-user 6099 6040 00:32 0.0 0.0 | \_ tee /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/run_clair3.log
demo-user 3742 3722 00:41 1.7 0.0 \_ /bin/bash /groups/umcg-gcc/tmp01/demo-user/tool/test/output/fastq_nanopore/.nxf.work/51/c0ea14b/.command.run nxf_trace
Slurm Job EFFiciency, or sjeff for short, provides an overview of finished jobs, the resources they requested and how efficiently those resources were used.
The job efficiency is a percentage defined as: used resources / requested resources * 100.
Note that for CPU core usage the average over the job's runtime is reported, whereas for memory usage the peak usage is reported; a rough sacct-based equivalent is sketched below the color legend.
The example output below is in black and white, but you'll get a colorful picture if your terminal supports ANSI colors:
- green: Ok.
- yellow: you requested too much and wasted resources.
- red: you requested too little and your jobs ran inefficiently, were killed, or were close to being killed.
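The same raw numbers can be pulled from Slurm's accounting database with sacct and the efficiency computed from them (a sketch; sjeff's actual implementation may differ):
#
# Per-job resource requests and usage; [jobid] is a placeholder.
# CPU efficiency    ~ TotalCPU / (Elapsed * NCPUS) * 100
# Memory efficiency ~ MaxRSS / requested memory * 100
#
sacct -j [jobid] --format=JobID,JobName,State,Timelimit,Elapsed,NCPUS,TotalCPU,ReqMem,MaxRSS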
================================================================================================================
JobName | Time | Cores (used=average) | Memory (used=peak)
| Requested Used(%) | Requested Used(%) | Requested Used(%)
----------------------------------------------------------------------------------------------------------------
validateConvading_s01_PrepareFastQ_0 | 07:00:00 0.05 | 4 0.00 | 8Gn 0.01
validateConvading_s01_PrepareFastQ_1 | 07:00:00 0.06 | 4 0.00 | 8Gn 0.01
validateConvading_s03_FastQC_0 | 05:59:00 0.39 | 1 60.00 | 2Gn 11.81
validateConvading_s03_FastQC_1 | 05:59:00 0.40 | 1 66.28 | 2Gn 11.31
validateConvading_s04_BwaAlignAndSortSam_0 | 23:59:00 0.31 | 8 39.18 | 30Gn 29.96
validateConvading_s04_BwaAlignAndSortSam_1 | 23:59:00 0.35 | 8 45.72 | 30Gn 30.17
validateConvading_s05_MergeBam_0 | 05:59:00 0.03 | 10 0.00 | 10Gn 0.00
validateConvading_s05_MergeBam_1 | 05:59:00 0.05 | 10 0.00 | 10Gn 0.00
validateConvading_s06_BaseRecalibrator_0 | 23:59:00 0.65 | 8 179.61 | 10Gn 49.54
validateConvading_s06_BaseRecalibrator_1 | 23:59:00 0.62 | 8 222.78 | 10Gn 53.23
validateConvading_s07_MarkDuplicates_0 | 16:00:00 0.13 | 5 193.42 | 30Gn 2.81
validateConvading_s07_MarkDuplicates_1 | 16:00:00 0.15 | 5 167.04 | 30Gn 2.72
validateConvading_s08_Flagstat_0 | 03:00:00 0.07 | 5 0.00 | 30Gn 0.00
validateConvading_s08_Flagstat_1 | 03:00:00 0.06 | 5 0.00 | 30Gn 0.00
validateConvading_s09a_Manta_0 | 16:00:00 0.01 | 21 0.00 | 30Gn 0.00
validateConvading_s09a_Manta_1 | 16:00:00 0.01 | 21 0.00 | 30Gn 0.00
validateConvading_s09b_Convading_0 | 05:59:00 1.29 | 1 1.79 | 4Gn 0.67
validateConvading_s09b_Convading_1 | 05:59:00 1.50 | 1 0.31 | 4Gn 0.70
================================================================================================================
Wrapper for Slurm's sinfo command with custom output format to list all compute nodes and their state. Example output:
PARTITION AVAIL NODES STATE S:C:T CPUS MAX_CPUS_PER_NODE MEMORY TMP_DISK FEATURES GROUPS TIMELIMIT JOB_SIZE ALLOCNODES NODELIST REASON
duo-pro* up 8 mixed 8:6:1 48 46 258299 1063742 umcg,ll,tmp02,tmp04 all 7-00:01:00 1 all umcg-node[011-018] none
duo-dev up 1 mixed 8:6:1 48 46 258299 1063742 umcg,ll,tmp02,tmp04 all 7-00:01:00 1 all umcg-node019 none
duo-dev up 1 idle 8:6:1 48 46 258299 1063742 umcg,ll,tmp02,tmp04 all 7-00:01:00 1 all umcg-node020 none
duo-ds-umcg up 1 idle 2:12:1 24 12 387557 1063742 umcg,tmp02,tmp04,prm02,prm03 all 7-00:01:00 1 all calculon none
duo-ds-ll up 1 idle 2:1:1 2 2 7872 0 ll,tmp04,prm02,prm03 all 7-00:01:00 1 all lifelines none
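A rough manual equivalent with plain sinfo (a sketch; the format string cnodes uses may differ):
#
# Partition, availability, node count, state, S:C:T, CPUs, memory, tmp disk,
# features, time limit and node list.
#
sinfo --format='%P %a %D %t %z %c %m %d %f %l %N'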
Wrapper for Slurm's sacctmgr command with custom output format to list all Quality of Service (QoS) levels and their limits. Example output:
QOSLevelName Priority UsageFactor MaxResources MaxSubmit MaxResources MaxSubmit MaxWalltime CanPreemptJobsInQOSlevel
PerQOSLevel PerQOSLevel PerUser PerUser PerJob
normal 0 1.000000
leftover 0 0.000000 cpu=0,gres/gpu=0,mem=0 30000 10000
leftover-short 0 0.000000 30000 10000 06:00:00
leftover-medium 0 0.000000 30000 10000 1-00:00:00
leftover-long 0 0.000000 3000 1000 7-00:00:00
regular 10 1.000000 cpu=0,gres/gpu=0,mem=0 30000 5000
regular-short 10 1.000000 30000 5000 06:00:00 leftover-long,leftover-medium,leftover-short
regular-medium 10 1.000000 cpu=139,gres/gpu=13,mem=535660M 30000 cpu=104,gres/gpu=10,mem=401745M 5000 1-00:00:00 leftover-long,leftover-medium,leftover-short
regular-long 10 1.000000 cpu=87,gres/gpu=8,mem=334788M 3000 cpu=69,gres/gpu=7,mem=267830M 1000 7-00:00:00 leftover-long,leftover-medium,leftover-short
priority 20 2.000000 cpu=0,gres/gpu=0,mem=0 5000 1000
priority-short 20 2.000000 5000 cpu=52,gres/gpu=5,mem=200872M 1000 06:00:00 leftover-long,leftover-medium,leftover-short
priority-medium 20 2.000000 cpu=104,gres/gpu=10,mem=401745M 2500 cpu=34,gres/gpu=4,mem=133915M 500 1-00:00:00 leftover-long,leftover-medium,leftover-short
priority-long 20 2.000000 cpu=52,gres/gpu=5,mem=200872M 250 cpu=34,gres/gpu=4,mem=133915M 50 7-00:00:00 leftover-long,leftover-medium,leftover-short
interactive 30 1.000000 cpu=0,gres/gpu=0,mem=0 1
interactive-short 30 1.000000 cpu=15,gres/gpu=4,mem=46869M 1 06:00:00 leftover-long,leftover-medium,leftover-short,regular-short
ds 10 1.000000 cpu=0,gres/gpu=0,mem=0 5000 1000
ds-short 10 1.000000 5000 cpu=4,gres/gpu=0,mem=4G 1000 06:00:00
ds-medium 10 1.000000 cpu=2,gres/gpu=0,mem=2G 2500 cpu=2,gres/gpu=0,mem=2G 500 1-00:00:00
ds-long 10 1.000000 cpu=1,gres/gpu=0,mem=1G 250 cpu=1,gres/gpu=0,mem=1G 50 7-00:00:00
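A rough manual equivalent with plain sacctmgr (a sketch; the exact field names may need adjusting for your Slurm version):
#
# QoS levels with their priority, usage factor, limits and preemption settings.
#
sacctmgr show qos format=Name,Priority,UsageFactor,GrpTRES,GrpSubmit,MaxTRESPerUser,MaxSubmitJobsPerUser,MaxWall,Preempt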
Wrapper for Slurm's squeue command with custom output format to list running and scheduled jobs. Example output:
JOBID PARTITION QOS NAME USER ST TIME NODELIST(REASON) START_TIME PRIORITY
4864542 duo-pro regular-medium run_GS1_FinalReport_small_chr3 [user] PD 0:00 (Priority) 2018-06-30T02:16:05 0.00011811894367
4864541 duo-pro regular-medium run_GS1_FinalReport_small_chr2 [user] PD 0:00 (Priority) 2018-06-30T01:47:42 0.00011811894367
4864540 duo-pro regular-medium run_GS1_FinalReport_small_chr22 [user] PD 0:00 (Priority) 2018-06-30T01:34:43 0.00011811894367
4864539 duo-pro regular-medium run_GS1_FinalReport_small_chr21 [user] PD 0:00 (Resources) 2018-06-29T21:55:46 0.00011811894367
4864537 duo-pro regular-medium run_GS1_FinalReport_small_chr1 [user] R 10:09 umcg-node015 2018-06-29T16:03:44 0.00011821347292
4864526 duo-pro regular-medium run_GS1_FinalReport_small_chr0 [user] R 10:13 umcg-node013 2018-06-29T16:03:40 0.00011821347292
4864527 duo-pro regular-medium run_GS1_FinalReport_small_chr10 [user] R 10:13 umcg-node011 2018-06-29T16:03:40 0.00011821347292
4864528 duo-pro regular-medium run_GS1_FinalReport_small_chr11 [user] R 10:13 umcg-node017 2018-06-29T16:03:40 0.00011821347292
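A rough manual equivalent with plain squeue (a sketch; cqueue's actual format string may differ):
#
# Job ID, partition, QoS, name, user, state, time used, nodelist (reason),
# expected start time and priority.
#
squeue --format='%i %P %q %j %u %t %M %R %S %p'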
Wrapper for Slurm's sprio command with custom output format to list components that make up the priority of scheduled jobs. Example output:
##### #### ABSOLUTE NORMALIZED ABS NORM ABSOLUTE NORMALIZED ABS NORM ####
JOBID USER PRIORITY PRIORITY AGE AGE FAIRSHARE FAIRSHARE QOS QOS NICE
417928 userA 543307 0.00012649867973 1000 1.0000000 42308 0.4230769 500000 0.5000000 0
421373 userB 503197 0.00011715986820 1000 1.0000000 2198 0.0219780 500000 0.5000000 0
421374 userB 503197 0.00011715986820 1000 1.0000000 2198 0.0219780 500000 0.5000000 0
421375 userB 503197 0.00011715986820 1000 1.0000000 2198 0.0219780 500000 0.5000000 0
499136 userC 506621 0.00011795710702 29 0.0285103 6593 0.0659341 500000 0.5000000 0
499137 userC 506621 0.00011795710567 29 0.0285045 6593 0.0659341 500000 0.5000000 0
499138 userC 506621 0.00011795710413 28 0.0284979 6593 0.0659341 500000 0.5000000 0
499139 userC 506621 0.00011795710278 28 0.0284921 6593 0.0659341 500000 0.5000000 0
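A rough manual equivalent with plain sprio (a sketch):
sprio -l    # long listing: priority broken down into its weighted components
sprio -n    # the same factors, but normalized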
Wrapper for Slurm's sshare command with custom output format to list the raw share(s) assigned to users, their recent usage and their resulting fair share. Example output:
SlurmAccount SlurmUser RawShares NormalizedShares RawUsage NormalizedUsage EffectiveUsage FairShare LevelFS TRESRunMins
============ ========= ========= ================ ======== =============== ============== ========= ======= ===========
root 0.000000 113872206 0.000000 cpu=137703,mem=3204775596,energy=0,node=31091,billing=519855,fs/disk=0,vmem=0,pages=0
root root 1 0.500000 0 0.000000 0.000000 1.000000 inf cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
users 1 0.500000 113872206 1.000000 1.000000 0.500000 cpu=137703,mem=3204775596,energy=0,node=31091,billing=519855,fs/disk=0,vmem=0,pages=0
group1 parent 0.500000 53110578 0.466405 0.466405 cpu=19363,mem=820313361,energy=0,node=13129,billing=118725,fs/disk=0,vmem=0,pages=0
group1 userA 1 0.005587 0 0.000000 0.000000 0.417582 69872977.459919 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
group1 userB 1 0.005587 938998 0.008246 0.008246 0.115385 0.677485 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
group1 userC 1 0.005587 345425 0.003033 0.003033 0.142857 1.841664 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
group2 parent 0.500000 6921258 0.060781 0.060781 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
group2 userD 1 0.005587 0 0.000000 0.000000 0.994505 inf cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
group2 userE 1 0.005587 6921258 0.060781 0.060781 0.032967 0.091914 cpu=0,mem=0,energy=0,node=0,billing=0,fs/disk=0,vmem=0,pages=0
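A rough manual equivalent with plain sshare (a sketch):
sshare -a -l    # -a: include all users; -l: long listing with additional usage and fair share columns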
Custom Slurm cluster reporting tool. Lists available resources and resource usage over a specified period of time. Example output:
======================================================================
Cluster usage report from 2016-09-01T00:00:00 to 2016-10-01T00:00:00.
----------------------------------------------------------------------
Available resources for calculon cluster:
Partition CPUs (cores) Memory (GB)
----------------------------------------------------------------------
duo-pro 368 1840
duo-dev 92 460
duo-ds 12 128
TOTAL 472 2428
----------------------------------------------------------------------
Cluster Account Login CPU used MEM used
----------------------------------------------------------------------
calculon users [account] 27.61% 25.90%
calculon users [account] 17.31% 11.27%
calculon users [account] 6.44% 3.54%
calculon users [account] 5.71% 5.60%
calculon users [account] 3.53% 1.06%
TOTAL ANY ALL 60.60% 47.40%
======================================================================
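The report is presumably built on Slurm's own accounting data; a rough manual equivalent for the same period (a sketch, not the tool's actual implementation):
#
# Overall cluster utilization and per-user utilization, reported as percentages.
#
sreport -t percent cluster Utilization start=2016-09-01 end=2016-10-01
sreport -t percent cluster AccountUtilizationByUser start=2016-09-01 end=2016-10-01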
Custom quota reporting tool for users. Lists quota for all groups a user is a member of. Example output:
====================================================================================================================================================
| Total size of files and folders | Total number of files and folders |
(T) Path/Filesystem | used quota limit grace | used quota limit grace | Status
----------------------------------------------------------------------------------------------------------------------------------------------------
(P) /home/[account] | 748.5 M 1.0 G 2.0 G none | 27.0 k 0.0 0.0 none | Ok
----------------------------------------------------------------------------------------------------------------------------------------------------
(G) /apps | 738.3 G 1.0 T 2.0 T none | 1,137.0 k 0.0 0.0 none | Ok
(G) /.envsync/tmp04 | 691.8 G 5.0 T 8.0 T none | 584.0 k 0.0 0.0 none | Ok
----------------------------------------------------------------------------------------------------------------------------------------------------
(G) /groups/[group]/prm02 | 11.8 T 12.0 T 15.0 T none | 26.0 k 0.0 0.0 none | Ok
(G) /groups/[group]/tmp04 | 52.4 T 50.0 T 55.0 T 5d12h17m16s | 4,101.0 k 0.0 0.0 none | EXCEEDED!
(F) /groups/[group]/tmp02 | 25.0 T 40.0 T 40.0 T none | 169.0 k 0.0 0.0 none | Ok
====================================================================================================================================================
Custom quota reporting tool for admins. Lists quota for all groups on a Physical File System (PFS). Output is similar to that from the quota tool listed above.
We use a standard GitHub workflow, except that we use only one branch, "master", as this is a relatively small repo and we don't need the additional overhead of multiple branches.
⎛¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯⎞ ⎛¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯⎞
⎜ Shared repo a.k.a. "blessed" ⎜ <<< 7: Merge <<< pull request <<< 6: Send <<< ⎜ Your personal online fork a.k.a. "origin" ⎜
⎜ github.com/molgenis/cluster-utils.git⎜ >>> 1: Fork blessed repo >>>>>>>>>>>>>>>>>>>> ⎜ github.com/<your_github_account>/cluster-utils.git ⎜
⎝______________________________________⎠ ⎝____________________________________________________⎠
v v ʌ
v 2: Clone origin to local disk 5: Push commits to origin
v v ʌ
v ⎛¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯⎞
`>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> 3: pull from blessed >>> ⎜ Your personal local clone ⎜
⎜ ~/git/cluster-utils ⎜
⎝____________________________________________________⎠
v ʌ
`>>> 4: Commit changes to local clone >>>´
- Fork this repo on GitHub (Once).
- Clone to your local computer and set up remotes (Once).
#
# Clone repo
#
git clone https://github.com/your_github_account/cluster-utils.git
#
# Add blessed remote (the source of the source) and prevent direct push.
#
cd cluster-utils
git remote add blessed https://github.com/molgenis/cluster-utils.git
git remote set-url --push blessed push.disabled
- Pull from "blessed" (Regularly, from step 3 onwards).
#
# Pull from blessed master.
#
cd cluster-utils
git pull blessed master
- Make changes: edit, add, delete...
- Commit changes to local clone.
#
# Commit changes.
#
git status
git add some/changed/files
git commit -m 'Describe your changes in a commit message.'
- Push commits to "origin".
#
# Push commits.
#
git push origin master
- Go to your fork on GitHub and create a pull request.
- Have one of the other team members review and eventually merge your pull request.
- Go back to step 3 and pull from "blessed" to get your local clone in sync.
#
# Pull from blessed master.
#
cd cluster-utils
git pull blessed master
etc.