Making the project cluster ready #130

Open
wittenator opened this issue Nov 25, 2024 · 6 comments
Labels
enhancement New feature or request

Comments

@wittenator
Contributor

Feature Description

I think it would be very cool to have an included Slurm/Apptainer setup to run the project directly on a cluster. Did you already use something like this when you worked on the connected paper?

Incidentally, to enable running the project in parallel, it may be necessary to allow preparing multiple versions of a dataset with different partitioning schemes/parameters. Currently the code assumes that there is only one prepared version of each dataset. What problems can occur when that assumption is removed?

To Reproduce

No response

Additional context

No response

wittenator added the enhancement (New feature or request) label Nov 25, 2024
@KarhouTam
Owner

> I think it would be very cool to have an included Slurm/Apptainer setup to run the project directly on a cluster. Did you use something like this already when you worked on the connected paper?

No, I haven’t. I think incorporating too many simulations could make the framework bulky and less clean. There are more mature frameworks, such as Flower, that are better suited for such tasks.

> Incidentally, to enable running the project in parallel, it may be necessary to allow preparing multiple versions of a dataset with different partitioning schemes/parameters. Currently, the code assumes that there is only one prepared version of each dataset. What problems can occur when that assumption is removed?

Actually, FL-bench only loads the partition file at the start of an experiment. If you want to run multiple experiments in parallel, you can repartition the data after the initial experiment has started.
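
A minimal sketch of the behaviour described here, assuming only that the partition is read once at startup. The `Experiment` class and the `partition.json` file name are hypothetical stand-ins for illustration, not FL-bench's actual code.

```python
import json
import tempfile
from pathlib import Path


class Experiment:
    """Hypothetical stand-in for a run that reads its partition once at startup."""

    def __init__(self, partition_file: Path):
        # The partition is copied into memory here; the file is never read again.
        self.partition = json.loads(partition_file.read_text())

    def client_indices(self, client_id: int) -> list:
        return self.partition[str(client_id)]


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        partition_file = Path(tmp) / "partition.json"  # hypothetical file name

        # First partitioning scheme, then start experiment A.
        partition_file.write_text(json.dumps({"0": [0, 1], "1": [2, 3]}))
        exp_a = Experiment(partition_file)

        # Repartition in place while A is "running", then start experiment B.
        partition_file.write_text(json.dumps({"0": [0, 2], "1": [1, 3]}))
        exp_b = Experiment(partition_file)

        # A still sees the old split; B sees the new one.
        return exp_a.client_indices(0), exp_b.client_indices(0)


if __name__ == "__main__":
    print(demo())
```

Note that the overwrite is only safe once the first run has finished loading its partition, so this approach depends on knowing when each run actually starts.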

@wittenator
Contributor Author

wittenator commented Nov 25, 2024

I'm totally in favour of keeping it as slim as it currently is (that's what I like about this project). An Apptainer setup would only consist of one additional file in the .env folder that uses the prebuilt Docker container, plus one or two commands in the README. I would like to run all the methods in this project on a cluster that I have access to and compile a performance chart across a few different data schemes and regimes.
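
As a rough illustration of what those "one or two commands" might look like, here is a small sketch that assembles them. `apptainer build` with a `docker://` source and `apptainer exec` with `--nv` and `--bind` are standard Apptainer usage; the image name `example/fl-bench:latest`, the `.sif` file name, the bind paths, and the `python main.py` entry point are placeholders and assumptions, not values taken from this project.

```python
import shlex


def apptainer_build_cmd(sif_path: str, docker_image: str) -> list:
    """Command that converts a Docker image into a local SIF image file."""
    return ["apptainer", "build", sif_path, f"docker://{docker_image}"]


def apptainer_exec_cmd(sif_path: str, command: list, binds=(), gpu: bool = True) -> list:
    """Command that runs `command` inside the SIF image."""
    cmd = ["apptainer", "exec"]
    if gpu:
        cmd.append("--nv")  # expose the host's NVIDIA GPUs to the container
    for host_dir, container_dir in binds:
        # Mount a host directory into the container (e.g. pre-generated data).
        cmd += ["--bind", f"{host_dir}:{container_dir}"]
    return cmd + [sif_path, *command]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    build = apptainer_build_cmd("fl-bench.sif", "example/fl-bench:latest")
    run = apptainer_exec_cmd(
        "fl-bench.sif",
        ["python", "main.py"],
        binds=[("/scratch/data", "/data")],
    )
    print(shlex.join(build))
    print(shlex.join(run))
```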

@KarhouTam
Owner

I’m not familiar with Apptainer 😂. If you believe this feature would be valuable for users, feel free to create a PR. I’d love to see it!

@KarhouTam
Owner

Btw, when creating a PR, please ensure you select the dev branch as the target for merging. I may perform some checks and make necessary modifications in that branch.

@wittenator
Contributor Author

A short status update: I experimented with running the project on a large Slurm cluster. To really make it work, it must be possible to pre-generate all the dataset splits. On Slurm you usually submit your job to a queue, so you have no control over when it starts. Additionally, the filesystem is read-only, so you can't just generate the data in the container once the run starts. I would try to add an option to generate splits with IDs that can optionally be loaded explicitly when a run is started.
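
One possible shape for such ID-based splits, as a sketch only: derive the ID from a hash of the dataset name and partitioning parameters, store each split in its own directory, and let a run load a split by ID. The function names, the directory layout, and the `args.json`/`partition.json` file names are all hypothetical and do not reflect FL-bench's current code.

```python
import hashlib
import json
from pathlib import Path


def split_id(dataset: str, params: dict) -> str:
    """Derive a stable ID from the dataset name and partitioning parameters."""
    canonical = json.dumps({"dataset": dataset, "params": params}, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def save_split(root: Path, dataset: str, params: dict, partition: dict) -> str:
    """Write a pre-generated split to <root>/<dataset>/<id>/ and return its ID."""
    sid = split_id(dataset, params)
    split_dir = root / dataset / sid
    split_dir.mkdir(parents=True, exist_ok=True)
    # Keep the parameters next to the data so the split stays self-describing.
    (split_dir / "args.json").write_text(json.dumps(params, sort_keys=True))
    (split_dir / "partition.json").write_text(json.dumps(partition))
    return sid


def load_split(root: Path, dataset: str, sid: str) -> dict:
    """Read-only load of a split that was generated before the job was queued."""
    return json.loads((root / dataset / sid / "partition.json").read_text())
```

With this layout, all splits can be generated up front on a writable filesystem, and each queued job only needs the split ID as an argument; loading never writes, so it also works when the data directory is mounted read-only.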

@KarhouTam
Owner

Sounds like a big feature. Just a reminder, in case it helps: please sync with the recent changes to FL-bench and make sure there are no conflicts when you decide to open a PR.
