New roadmap for Slurm-web 3.0 and beyond #235
-
I think it would be better to improve support for the CentOS 7 series of operating systems.
-
Is this effort strictly targeting management of "entire Slurm clusters"? Or might there be a way to use it to manage sets of related jobs? I ask because I am the developer of a Slurm-based genetic variant analyzer. At the end of several hundred calculations we end up with this: https://structbio.vanderbilt.edu/~mothcw/forMervin/ I am moving our tried and true command line interface to a simple input form (list of variants, various formats): https://staging.meilerlab.org/vu-struct/ All is going well, but in cases of bad data, crashed jobs, and everything else that goes along with scientific computing, the ability of an "end user" to have "command line power from the web" (look into logs, restart jobs, stop jobs, etc.) would be incredibly helpful. However, we want to focus their attention on their own calculations, not on the entire cluster. Should I keep an eye on your work? Offer a hand? Or is this not at all your goal? Thanks!
-
Hi, I am looking forward to making use of Slurm-web 3.0. Is there any indication yet when a first release might be available? Even a beta?
-
Are we still on schedule for a 3.0 release by end of year?
-
Hi rezlib, glad to hear that you will use slurmrestd instead of PySlurm.
?
-
Slurm-web v3.0.0 is now available 🎉 It has been announced in #256. The roadmap is now published and maintained on the new official website: https://slurm-web.com/roadmap/ I hope you enjoy it!
-
This post describes the roadmap envisioned by Rackslab for Slurm-web, starting from the next major release 3.0 and beyond.
Near real-time updates of the dashboard
Currently, the dashboard in Slurm-web is fully refreshed on a regular basis, whether or not there is new data to display. The idea is to rely on Server-Sent Events to receive continuous information about node and job state changes in near real time, and to update the dashboard atomically and asynchronously, without an intermediate blank screen while reloading.
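To illustrate the idea, here is a minimal sketch in Python of how a client could decode a Server-Sent Events stream into individual updates. It follows the standard `text/event-stream` line format (`event:` and `data:` fields, a blank line ending each event). The event names `node-state` and `job-state` and the JSON payloads are assumptions made for this example, not the format Slurm-web will actually use.

```python
import json
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class ServerSentEvent:
    event: str = "message"  # default event type in the SSE format
    data: str = ""


def parse_sse(lines: Iterable[str]) -> Iterator[ServerSentEvent]:
    """Yield events from the lines of a text/event-stream body."""
    event_type = "message"
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield ServerSentEvent(event_type, "\n".join(data_parts))
            event_type, data_parts = "message", []
            continue
        if line.startswith(":"):
            continue  # comment line, often used as a keep-alive
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]
        if field == "event":
            event_type = value
        elif field == "data":
            data_parts.append(value)
        # Other fields (id, retry) are ignored in this sketch.


# Hypothetical stream content: event names and payloads are illustrative.
stream = [
    "event: node-state\n",
    'data: {"node": "cn001", "state": "IDLE"}\n',
    "\n",
    ": keep-alive\n",
    "event: job-state\n",
    'data: {"job": 1234, "state": "RUNNING"}\n',
    "\n",
]

for sse in parse_sse(stream):
    print(sse.event, json.loads(sse.data))
```

Because each event carries only the change for one node or job, the dashboard can apply it to the affected element instead of reloading the whole page.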
Accounting reports and visualization of past jobs
Slurm-web will be able to browse past jobs recorded in the SlurmDBD accounting database with all their information (exit status, submission command, etc.). Basic reports will also be available to monitor HPC supercomputer usage over a period of time.
Built-in metrics about jobs and scheduling
The dashboard will offer metrics about the current and past stream of jobs, node states and Slurm scheduler internal metrics (e.g. length of backfilling cycles). System administrators will be able to detect and analyze usage spikes and optimize the scheduler settings to support users' workflows.
Improved Gantt view
Starting with version 2.0, Slurm-web offers a Gantt view to represent running and pending jobs on the computing resources. This view is especially useful to understand why jobs are pending and the advantages of backfilling. It will be significantly improved for more scalability and better readability, and it will also represent past jobs.
Job submission and inspection
Users will be able to submit compute batch jobs directly from the web interface. The goal is to cover at least simple use cases and help inexperienced users overcome barriers. After the jobs are submitted, users will be able to follow job statuses, track executions and eventually inspect the outputs.
GPGPU support
Currently, Slurm-web only represents allocations of CPU resources in the web interface. In the next releases, users will be able to visualize more types of computing resources, including memory (TRES) and GPGPUs (GRES).
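As an illustration of the data involved, Slurm reports trackable resources as comma-separated `name=value` strings, with generic resources such as GPUs appearing under a `gres/` prefix. The sketch below, in Python, splits such a string into a dictionary. The helper names `parse_tres` and `gpu_count` and the sample string are made up for this example, and it says nothing about how Slurm-web will actually process these values.

```python
def parse_tres(tres: str) -> dict:
    """Split a TRES string such as 'cpu=8,mem=32G' into a name -> value dict."""
    resources = {}
    for item in tres.split(","):
        if not item:
            continue  # tolerate empty strings and stray commas
        name, _, value = item.partition("=")
        resources[name.strip()] = value.strip()
    return resources


def gpu_count(resources: dict) -> int:
    """Return the generic GPU count, or 0 when no 'gres/gpu' entry exists."""
    return int(resources.get("gres/gpu", "0"))


# Sample value in the style of an allocated-TRES field (illustrative).
allocated = parse_tres("billing=8,cpu=8,gres/gpu=2,mem=32G,node=1")
print(allocated["cpu"], allocated["mem"], gpu_count(allocated))
```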
QOS, associations and reservations management
Existing QOS and advanced reservations are currently reported in Slurm-web 2.x. The plan is to make it possible to manage these entities (e.g. modify the priority of a QOS, add a limit on an association, create a reservation) directly in the web interface, with advanced permissions management.
Frontend based on modern JS framework
The Slurm-web frontend code is built on out-of-date libraries. This code base will be reworked to use a modern and well-established open source JS framework. This will remove most of the boilerplate code, lower the maintenance effort and eventually help speed up the development of new features.
Based on slurmrestd REST API
Instead of defining its own specific REST API, Slurm-web will be based on the standard slurmrestd REST API.
The dependency on the PySlurm library has always been a source of issues and deployment complexity, so it will be dropped.
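For readers unfamiliar with slurmrestd, the following Python sketch shows what a query for the job list can look like using only the standard library. It assumes JWT authentication through the `X-SLURM-USER-NAME` and `X-SLURM-USER-TOKEN` headers. The base URL, the API version `v0.0.39` and the function names are assumptions for illustration. The version to use depends on the Slurm release deployed, and this is not necessarily how Slurm-web itself calls the API.

```python
import json
import urllib.request


def build_jobs_request(base_url: str, user: str, token: str,
                       api_version: str = "v0.0.39") -> urllib.request.Request:
    """Prepare a GET request for the slurmrestd jobs endpoint."""
    # The API version is an assumption; adjust it to the deployed Slurm release.
    url = f"{base_url.rstrip('/')}/slurm/{api_version}/jobs"
    headers = {
        "X-SLURM-USER-NAME": user,
        "X-SLURM-USER-TOKEN": token,  # JWT, e.g. obtained with `scontrol token`
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_jobs(request: urllib.request.Request, timeout: float = 10.0) -> list:
    """Send the request and return the list of jobs from the JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response).get("jobs", [])


# Hypothetical host, port, user and token; nothing is sent until
# fetch_jobs(request) is called.
request = build_jobs_request("http://localhost:6820", "alice", "example-token")
print(request.get_method(), request.full_url)
```

Since the client only needs HTTP and JSON, no Slurm-specific library has to be compiled or installed alongside it.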
RacksDB based topology database
Since its initial release, Slurm-web has required an XML file to describe the topology of the supercomputers. The format of this file is specific, complicated and not designed to be shared with other applications.
Slurm-web 3.0 will extract cluster components and topology from RacksDB YAML files, in a format designed to be pragmatic, extensible and reusable.
RPM and deb packages for most distributions
The complexity of installing Slurm-web has always been one of the major concerns of the community. Rackslab will build and distribute native deb and RPM packages for the most common GNU/Linux distributions (RHEL, Ubuntu, Debian, etc.).
Containers
For people who prefer containers over native packages, Rackslab will also distribute container images to deploy Slurm-web with Docker or Podman.