Making the project cluster ready #130
Comments
No, I haven’t. I think incorporating too many simulations could make the framework bulky and less clean. There are more mature frameworks, such as Flower, that are better suited for such tasks.
Actually, FL-bench only loads the partition file at the start of an experiment. If you want to run multiple experiments in parallel, you can repartition the data after the initial experiment has started.
I'm totally in favour of keeping it as slim as it is currently (that's what I like about this project). An Apptainer setup would only consist of an additional file in the .env folder that uses the prebuilt Docker container, plus one or two commands in the README. I would like to run all the methods in this project on a cluster that I have access to and compile a chart of performance under a few different data schemes and regimes.
I’m not familiar with Apptainer 😂. If you believe this feature would be valuable for users, feel free to create a PR. I’d love to see it!
Btw, when creating a PR, please ensure you select the
A short status update: I experimented with running the project on a large SLURM cluster. To really make it work, it must be possible to pre-generate all the dataset splits. On SLURM you usually submit your job to a queue, so you have no control over when it starts. Additionally, the filesystem is read-only, so you can't just generate the data in the container once the run starts. I will try to add an option to generate splits with IDs, which can then be loaded explicitly when a run is started.
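To make the idea of splits with IDs concrete, here is a minimal sketch of how it could look. It is not FL-bench's actual partition code: the function names `generate_split` and `load_split`, the `partition.json` file name, the directory layout, and the simple IID round-robin split are all assumptions chosen for illustration.

```python
import json
import random
from pathlib import Path


def generate_split(root, split_id, num_samples, num_clients, seed):
    """Write one IID partition to <root>/<split_id>/partition.json.

    Hypothetical layout, not FL-bench's real on-disk format.
    """
    split_dir = Path(root) / split_id
    # exist_ok=False: refuse to silently overwrite a split that already has this ID.
    split_dir.mkdir(parents=True, exist_ok=False)
    indices = list(range(num_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices round-robin so client sizes differ by at most one.
    clients = [indices[i::num_clients] for i in range(num_clients)]
    with open(split_dir / "partition.json", "w") as f:
        json.dump({"seed": seed, "clients": clients}, f)
    return split_dir


def load_split(root, split_id):
    """Read a pre-generated partition by ID. This only reads, never writes."""
    with open(Path(root) / split_id / "partition.json") as f:
        return json.load(f)["clients"]
```

With something like this, all splits could be generated once before jobs are queued, and each job would only call `load_split` with the ID it was given, which should also work when the data directory is mounted read-only.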
Sounds like a big feature. Just in case, a reminder: please sync the recent changes of FL-bench and make sure that there are no conflicts when you decide to open a PR.
Feature Description
I think it would be very cool to have an included SLURM/Apptainer setup to run the project directly on a cluster. Did you already use something like this when you worked on the associated paper?
Incidentally, to enable running the project in parallel, it may be necessary to allow preparing multiple versions of a dataset with different partitioning schemes/parameters. Currently the code assumes that there is only one prepared version of each dataset. What problems can occur when that assumption is removed?
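One problem that might appear once several prepared versions coexist is naming: parallel runs need an unambiguous way to refer to the same version, and hand-picked names can collide. A possible approach, sketched below under assumptions, is to derive the version ID from the partition parameters themselves. The function `partition_id` and the parameter names in the usage example are hypothetical and not part of FL-bench.

```python
import hashlib
import json


def partition_id(dataset, **params):
    """Derive a stable, short ID from a dataset name and its partition parameters."""
    # sort_keys=True makes the ID independent of keyword-argument order.
    canonical = json.dumps({"dataset": dataset, "params": params}, sort_keys=True)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    # Keep the dataset name readable and append a truncated hash of the settings.
    return f"{dataset}-{digest[:12]}"


# Hypothetical usage: the same settings always map to the same directory name.
id_a = partition_id("cifar10", alpha=0.1, client_num=100)
id_b = partition_id("cifar10", client_num=100, alpha=0.1)
```

Here `id_a` and `id_b` are identical, while changing any parameter value yields a different ID, so each partitioning scheme gets its own directory without manual bookkeeping.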
To Reproduce
No response
Additional context
No response