Wandb-osh cannot handle many runs at once #83
Comments
Thanks a lot for the report and the PR ❤️. I'm currently reading through your PR. Let's keep this issue open until we merge it!
#101 could be a simpler fix to this issue, and I expect it to be merged very soon.
@klieret Sorry I have not had a chance yet to go back to this and implement your suggestions. Multiprocessing lets it attend to multiple jobs at once.
Hi @RitwikGupta, thanks for your reply. It would be a tradeoff between update frequencies and number of runs. A single The current setup is that it tries to trigger a sync for every epoch, and that might lead to some runs crowding out others. So #101 makes it possible to bring that rate down to something where the But I agree that it's still nice to have the multiple-job setup. Unfortunately, I also don't have much time to put into this at the moment. But I'm happy to review again if you address the comments in #85 :)
@klieret I think the difference between our wandb logging setups is that we are logging every iteration, of which we have 10-20k per epoch. Therefore, our sync calls take about 5-10 s each to complete, which limits us to 6-12 run syncs per minute. We are often running 15+ such runs at a time. I am also currently short on time, but I will find the one hour it will take to implement your feedback soon. It's on my to-do list!
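To make the arithmetic in the comment above concrete, here is a small sketch. It assumes syncs are processed strictly one after another by a single process; the function names are made up for illustration and are not part of wandb-osh.

```python
def syncs_per_minute(seconds_per_sync: float) -> float:
    """Upper bound on sequential syncs that fit into one minute."""
    return 60.0 / seconds_per_sync


def minutes_per_full_round(n_runs: int, seconds_per_sync: float) -> float:
    """Time needed to sync every run once, one after another."""
    return n_runs * seconds_per_sync / 60.0


if __name__ == "__main__":
    # Figures quoted in the thread: 5-10 s per sync, 15 concurrent runs.
    for seconds in (5.0, 10.0):
        print(
            f"{seconds:.0f} s per sync: "
            f"{syncs_per_minute(seconds):.0f} syncs/min, "
            f"{minutes_per_full_round(15, seconds):.2f} min per round of 15 runs"
        )
```

Under these assumptions, each of 15 runs can be refreshed at most once every 1.25 to 2.5 minutes, and only if no run is served twice before the others have had a turn.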
Thanks for your reply. Just for clarification: Even if you log every iteration, you do not need to trigger the sync every iteration. The frequency with which you trigger the sync will be adjustable with #101, so no matter how long the sync takes or how many iterations you log, you should still be able to synchronize all projects (just with less frequent updates).
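The idea of logging every iteration while triggering a sync only occasionally can be sketched as a generic rate limiter. This is not the implementation from #101: `ThrottledTrigger`, `min_interval_s`, and the wrapped `trigger` callable are hypothetical names used only for illustration.

```python
import time
from typing import Callable, Optional


class ThrottledTrigger:
    """Forward a call to `trigger` at most once per `min_interval_s` seconds."""

    def __init__(
        self,
        trigger: Callable[[], None],
        min_interval_s: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._trigger = trigger
        self._min_interval_s = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None  # time of the last forwarded call

    def __call__(self) -> bool:
        """Return True if the sync was triggered, False if it was skipped."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval_s:
            return False  # too soon since the last trigger; skip this one
        self._last = now
        self._trigger()
        return True
```

A training loop could call such a wrapper after every iteration; most calls would return immediately, and a sync would be requested only once per interval.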
Hello,
We are a wandb-osh power user. First of all, thank you for making this excellent utility.
We frequently run 20-30 runs, all logging to wandb, simultaneously. What ends up happening is that the runs which log more frequently crowd out the runs that log less frequently. Therefore, slow runs rarely update to wandb!
If wandb-osh could handle command files in a first-in-first-out fashion rather than "last written", this would fix the issue.
Thank you again for making this. I will attempt to make a PR when I get some time, unless you get to it first.
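A minimal sketch of the first-in-first-out ordering requested above, assuming each pending sync request is a file in a single directory and that a file's modification time marks when the request was written. The `fifo_order` helper and the `*.command` pattern are illustrative assumptions, not the actual wandb-osh code.

```python
from pathlib import Path
from typing import List


def fifo_order(command_dir: Path, pattern: str = "*.command") -> List[Path]:
    """Return pending command files, oldest modification time first."""
    files = [p for p in command_dir.glob(pattern) if p.is_file()]
    # Sort by (mtime, name) so that ties are broken deterministically.
    return sorted(files, key=lambda p: (p.stat().st_mtime_ns, p.name))
```

Processing the files in this order would serve the request that has waited longest first. Note that if a run rewrites its own command file on every trigger, its modification time keeps moving forward, so strict FIFO behavior would also require leaving an existing pending file untouched.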