New roadmap for Slurm-web 3.0 and beyond #235

rezib · 2023-05-08T02:14:34Z

rezib
May 8, 2023
Maintainer

This post describes the roadmap envisioned by Rackslab for Slurm-web, starting from the next major release 3.0 and beyond.

Near real-time updates of the dashboard

Currently, the dashboard in Slurm-web is fully refreshed on a regular basis, whether or not new data have to be displayed. The idea is to rely on Server-Sent Events to receive continuous information about node and job state changes in near real time, and to update the dashboard atomically and asynchronously, without an intermediate blank screen while reloading.
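
As a rough illustration of the Server-Sent Events wire format, here is a minimal Python sketch that serializes and parses messages. The event name `job-state` and the payload fields are hypothetical and are not taken from Slurm-web; the parser is simplified and assumes complete messages in the buffer.

```python
import json


def format_sse(event: str, payload: dict) -> str:
    """Serialize one message in the text/event-stream wire format."""
    data = json.dumps(payload, separators=(",", ":"))
    # A message is a set of "field: value" lines terminated by a blank line.
    return f"event: {event}\ndata: {data}\n\n"


def parse_sse(stream: str) -> list:
    """Parse a buffer of complete SSE messages into (event, payload) pairs."""
    messages = []
    for block in stream.split("\n\n"):
        event, data_lines = "message", []  # "message" is the default event type
        for line in block.splitlines():
            if line.startswith("event:"):
                event = line[len("event:"):].strip()
            elif line.startswith("data:"):
                data_lines.append(line[len("data:"):].strip())
        if data_lines:
            messages.append((event, json.loads("\n".join(data_lines))))
    return messages


if __name__ == "__main__":
    # Hypothetical state change for job 42.
    wire = format_sse("job-state", {"job_id": 42, "state": "RUNNING"})
    print(parse_sse(wire))
```

With this kind of stream, a client only has to apply each state change to the element it concerns instead of reloading the whole page.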

Accounting reports and visualization of past jobs

Slurm-web will be able to browse past jobs recorded in the SlurmDBD accounting database with all their information (exit status, submission command, etc.). Basic reports will also be available to monitor the usage of HPC supercomputers over a period of time.
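
To give an idea of the accounting data involved, the sketch below parses the pipe-separated output of a command such as `sacct --parsable2 --format=JobID,State,ExitCode` and counts jobs per state. This is only a command-line stand-in for illustration, not the way Slurm-web retrieves accounting records, and the sample rows are made up.

```python
import csv
import io
from collections import Counter


def parse_sacct(output: str) -> list:
    """Turn `sacct --parsable2` output (header line included) into dicts."""
    reader = csv.DictReader(
        io.StringIO(output), delimiter="|", quoting=csv.QUOTE_NONE
    )
    return list(reader)


def count_states(rows: list) -> dict:
    """Count how many job records ended in each state."""
    return dict(Counter(row["State"] for row in rows))


if __name__ == "__main__":
    # Made-up sample in the format printed by sacct --parsable2.
    sample = (
        "JobID|State|ExitCode\n"
        "1234|COMPLETED|0:0\n"
        "1235|FAILED|1:0\n"
        "1236|COMPLETED|0:0\n"
    )
    print(count_states(parse_sacct(sample)))
```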

Built-in metrics about jobs and scheduling

The dashboard will offer metrics about the current and past stream of jobs, node states and Slurm scheduler internals (e.g. the length of backfilling cycles). System administrators will be able to detect and analyze usage spikes and optimize the scheduler settings to support users' workflows.

Improved Gantt view

Starting with version 2.0, Slurm-web offers a Gantt view to represent running and pending jobs on the computing resources. This view is especially useful to understand why jobs are pending and the advantages of backfilling. This view will be significantly improved for more scalability and better readability, and it will also represent past jobs.

Job submission and inspection

Users will be able to submit batch compute jobs directly from the web interface. The goal is to cover at least simple use cases and help inexperienced users overcome barriers. After the jobs are submitted, users will be able to follow job statuses, track executions and finally inspect the outputs.

GPGPU support

Currently, Slurm-web only represents allocations of CPU resources in the web interface. In the next releases, users will be able to visualize more types of computing resources, including memory (TRES) and GPGPUs (GRES).
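
For readers unfamiliar with TRES strings, Slurm reports allocated resources in a comma-separated form such as `cpu=8,mem=32G,node=1,gres/gpu=2`. The sketch below shows one way to split such a string; the function names are illustrative and do not come from Slurm-web.

```python
def parse_tres(tres: str) -> dict:
    """Split a Slurm TRES string such as 'cpu=8,mem=32G,gres/gpu=2'."""
    resources = {}
    for item in tres.split(","):
        if not item:
            continue  # tolerate empty strings and trailing commas
        name, _, value = item.partition("=")
        resources[name] = value
    return resources


def gpu_count(tres: str) -> int:
    """Return the number of allocated GPUs, or 0 when none are listed."""
    return int(parse_tres(tres).get("gres/gpu", "0"))


if __name__ == "__main__":
    allocation = "cpu=8,mem=32G,node=1,gres/gpu=2"
    print(parse_tres(allocation))
    print(gpu_count(allocation))
```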

QOS, associations and reservations management

Existing QOS and advanced reservations are currently reported in Slurm-web 2.x. The plan is to make it possible to manage these entities (e.g. modify the priority of a QOS, add a limit on an association, create a reservation, etc.) directly in the web interface, with advanced permissions management.

Frontend based on modern JS framework

The Slurm-web frontend code is built on out-of-date libraries. This code base will be reworked to use a modern and well-established open source JS framework. This will allow getting rid of most boilerplate code, lowering the maintenance effort and eventually helping speed up the development of new features.

Based on slurmrestd REST API

Instead of defining its own specific REST API, Slurm-web will be based on the standard slurmrestd REST API.

The dependency on the PySlurm library has always been a source of issues and deployment complexity, so it will be dropped.

RacksDB based topology database

Since its initial release, Slurm-web has required an XML file to describe the topology of the supercomputers. The format of this file is specific, complicated and not designed to be shared with other applications.

Slurm-web 3.0 will extract cluster components and topology from RacksDB YAML files, in a format designed to be pragmatic, extensible and reusable.

RPM and deb packages for most distributions

The complexity of installing Slurm-web has always been one of the major concerns of the community. Rackslab will build and distribute native deb and RPM packages for the most common GNU/Linux distributions (RHEL, Ubuntu, Debian, etc.).

Containers

For people who prefer containers over native packages, Rackslab will also distribute container images to deploy Slurm-web with Docker or Podman.

marks221b · 2023-05-19T06:57:27Z

marks221b
May 19, 2023

I think it would be better to improve support for the CentOS 7 series of operating systems.

2 replies

marks221b May 19, 2023

We have been waiting for a detailed tutorial to deploy slurm-web on centos7

rezib May 20, 2023
Maintainer Author

Hi @marks221b,

We will distribute official RPM packages for el8 (including CentOS 8). I am not sure we will be able to support el7 with native packages because some dependencies might be outdated (esp. Python libraries). However, we will also distribute container images for easy deployment on any OS. The documentation will be updated with a quickstart guide for all supported installation methods.

ChrisMoth · 2023-06-27T12:25:09Z

ChrisMoth
Jun 27, 2023

Is this effort strictly targeting management of "entire Slurm clusters"? Or might there be a way to use it to manage sets of related jobs?

I ask because I am the developer of a slurm-based genetic variant analyzer. At the end of several hundred calculations we end up with this:

https://structbio.vanderbilt.edu/~mothcw/forMervin/

I am moving our tried and true command line interface to a simple input form (list of variants, various formats)

https://staging.meilerlab.org/vu-struct/

All is going well - but in the cases of bad data, crashed jobs, and everything else that goes along with scientific computing, the ability of an "end user" to have "command line power from web" (look into logs, restart jobs, stop jobs, etc.) would be incredibly helpful... but we want to focus their attention on their own calculations, NOT on the entire cluster.

Should I keep an eye on your work? Offer a hand? Or is this not at all your goal? Thanks!

1 reply

rezib Jun 27, 2023
Maintainer Author

Hi @ChrisMoth,

Slurm-web is and will stay a generic web UI for Slurm; it is not designed to manage specific job workflows with their results. Slurm-web will make it possible to submit jobs and track their execution in a generic manner (among many other features), but it might not be able to perfectly suit your specific requirements.

Have you had a look at Slurm's native slurmrestd API? If not, it might be a good choice for interacting with Slurm from your web application, especially regarding error management.
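
As a starting point, a web application could query slurmrestd over HTTP with JWT authentication along these lines. This is a minimal sketch, not a definitive client: the base URL, the user name, the token and the API version `v0.0.39` are assumptions that depend on your Slurm release and on how slurmrestd is deployed.

```python
import json
import urllib.request


def build_jobs_request(base_url: str, user: str, token: str,
                       api_version: str = "v0.0.39") -> urllib.request.Request:
    """Prepare a GET request for the slurmrestd jobs endpoint.

    api_version is an assumption: use the version your slurmrestd exposes.
    """
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/jobs"
    headers = {
        # slurmrestd reads the JWT credentials from these two headers.
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_jobs(request: urllib.request.Request, timeout: float = 10.0) -> dict:
    """Send the request and decode the JSON body returned by slurmrestd."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical endpoint and credentials, for illustration only.
    req = build_jobs_request("http://localhost:6820", "alice", "<jwt-token>")
    print(req.full_url)
```

Filtering the returned jobs by user or by job name on the application side would be one way to show end users only their own calculations.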

grapearc · 2023-08-30T11:27:40Z

grapearc
Aug 30, 2023

Hi, I am looking forward to making use of slurm-web 3.0. Is there any indication yet when a first release might be available? Even a beta?
Many thanks!

1 reply

rezib Aug 30, 2023
Maintainer Author

Hi @grapearc,

We are working hard on it, but unfortunately the development of Slurm-web is not funded by itself. It is essentially done in spare time between the other customer deliverables required to ensure Rackslab's sustainability. Consequently, we don't have enough visibility to announce a reliable availability date.

We are looking for sponsors to fund Slurm-web development so that we can dedicate time and resources to it. This would allow us to define a schedule with good visibility for organizations interested in Slurm-web. If your organization would like to participate, please contact us!

michaelmyc · 2023-11-07T06:33:57Z

michaelmyc
Nov 7, 2023

Are we still on schedule for a 3.0 release by end of year?

1 reply

rezib Nov 7, 2023
Maintainer Author

Hi @michaelmyc,

Unfortunately, Slurm-web development is not yet funded; it is only done in spare time between Rackslab customer deliverables. This means we don't have enough visibility to announce a reliable release date.

We are still looking for sponsors to fund Slurm-web development so that we can dedicate time and resources to it. This would allow us to define a schedule with good visibility for organizations interested in Slurm-web. If your organization would like to participate, please contact us!

daverona · 2024-01-30T02:04:24Z

daverona
Jan 30, 2024

Hi rezib,

Glad to hear that you will use slurmrestd instead of pyslurm.
Does this mean that slurm-web (REST part) with slurmrestd does not need

/etc/passwd
/etc/group
/etc/munge

?

2 replies

rezib Jan 30, 2024
Maintainer Author

Hi @daverona,

Slurm-web v3 will not require access to NSS, so it doesn't need access to /etc/passwd and /etc/group. Authentication and user retrieval will be performed with LDAP; all other data will be requested from slurmrestd.

Slurm-web v3 won't use munge by itself. However, for the moment, we plan to support connection to slurmrestd through a UNIX socket only. In this mode, there is no JWT authentication involved between Slurm components, and slurmrestd must have access to the munge key and munge socket.

Considering your awesome work in #222, I guess you are asking in order to figure out the container requirements for Slurm-web v3, right? 😊
I think the best option in this configuration will be to deploy Slurm components outside the containers and just bind mount the slurmrestd UNIX socket into the Slurm-web container. I'm pretty confident it will be that simple.
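
To illustrate what talking to slurmrestd through a bind-mounted UNIX socket could look like from inside the container, here is a minimal sketch based on the Python standard library. The socket path and the API version `v0.0.39` are assumptions, and this is not the Slurm-web implementation.

```python
import http.client
import socket


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTP connection that talks to a UNIX domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 10.0):
        # The host name is only used for the Host header; it is not resolved.
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self) -> None:
        sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        sock.settimeout(self.timeout)
        sock.connect(self.socket_path)
        self.sock = sock


def ping(socket_path: str, api_version: str = "v0.0.39") -> int:
    """Return the HTTP status of the slurmrestd ping endpoint."""
    conn = UnixHTTPConnection(socket_path)
    try:
        conn.request("GET", f"/slurm/{api_version}/ping")
        return conn.getresponse().status
    finally:
        conn.close()


if __name__ == "__main__":
    # Hypothetical path of the socket bind mounted into the container.
    print(ping("/run/slurmrestd/slurmrestd.socket"))
```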

daverona Jan 31, 2024

Hi rezib,

That's good to hear. I thought the same.
I wish the best for the project funding!
:-)

rezib · 2024-05-13T11:31:17Z

rezib
May 13, 2024
Maintainer Author

Slurm-web v3.0.0 is now available 🎉 It has been announced in #256

The roadmap is now published and maintained on the new official website: https://slurm-web.com/roadmap/

I hope you enjoy it!

0 replies
