New jobs are not being dispatched if there are 100s of other 'new' jobs subject to concurrency limits we have set within TPV #2254

Open
cat-bro opened this issue Oct 17, 2024 · 1 comment

Comments

@cat-bro
Collaborator

cat-bro commented Oct 17, 2024

A user has 250 alphafold jobs in the 'new' state (TPV limits these to 2 concurrent jobs per user), and jobs she has submitted since then are not being dispatched.

I have put in a hack fix for now by increasing ready_window_size in the job conf to 120. This works on this occasion because no handler has more than 60 of these alphafold jobs, but it will not work for an arbitrary number of new jobs.

TPV limiting raises a JobNotReadyException if there are too many jobs of a type for a user. The presence of these new jobs must be preventing other jobs submitted by the same user from ever getting Grabbed.
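The suspected mechanism can be illustrated with a toy simulation. This is only a sketch of the hypothesis above, not Galaxy's or TPV's actual code: the `Job` class, `JobNotReady`, `check_limit`, `dispatch_cycle`, the `fastqc` tool name and the window sizes are all hypothetical, and it models a single user on a single handler that examines only the oldest `window_size` new jobs on each pass.

```python
from dataclasses import dataclass


class JobNotReady(Exception):
    """Hypothetical stand-in for the exception raised when a limit is hit."""


@dataclass
class Job:
    job_id: int
    tool: str
    state: str = "new"


def check_limit(job, jobs, limits):
    """Raise JobNotReady if the tool's concurrency limit is already reached."""
    limit = limits.get(job.tool)
    if limit is None:
        return  # no limit configured for this tool
    running = sum(1 for j in jobs if j.tool == job.tool and j.state == "running")
    if running >= limit:
        raise JobNotReady(job.job_id)


def dispatch_cycle(jobs, window_size, limits):
    """One handler pass: only the oldest `window_size` new jobs are examined."""
    new_jobs = [j for j in sorted(jobs, key=lambda j: j.job_id) if j.state == "new"]
    dispatched = []
    for job in new_jobs[:window_size]:
        try:
            check_limit(job, jobs, limits)
        except JobNotReady:
            continue  # job stays 'new' and keeps occupying a window slot
        job.state = "running"
        dispatched.append(job.job_id)
    return dispatched


def build_queue():
    """250 limited alphafold jobs followed by one unrelated, unlimited job."""
    jobs = [Job(i, "alphafold") for i in range(250)]
    jobs.append(Job(250, "fastqc"))
    return jobs


LIMITS = {"alphafold": 2}

# Window smaller than the backlog: the later job is never examined.
small = build_queue()
first = dispatch_cycle(small, window_size=60, limits=LIMITS)
second = dispatch_cycle(small, window_size=60, limits=LIMITS)

# Window larger than the backlog: the later job is reached and dispatched.
large = build_queue()
wide = dispatch_cycle(large, window_size=300, limits=LIMITS)
```

In this model the small window dispatches only the first two alphafold jobs, and every later pass finds the window full of limited jobs, so the unrelated job stays 'new'. Widening the window only helps while it exceeds the number of blocked jobs, which matches the observation that the workaround cannot cope with an arbitrary backlog.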

cat-bro added a commit that referenced this issue Oct 17, 2024
ready_window_size has been set to 120 for two hours and I can't see any problems arising from it in terms of handler/db CPU or memory use.

This mitigates #2254 for now but there needs to be a better solution for this issue.
cat-bro changed the title from "New jobs are not being dispatched if there are 100s of newer 'new' jobs subject to TPV concurrency limits" to "New jobs are not being dispatched if there are 100s of newer 'new' jobs subject to concurrency limits we have set within TPV" on Oct 17, 2024
@cat-bro
Collaborator Author

cat-bro commented Oct 17, 2024

This is not caused by TPV itself, but by the way limits are set in Galaxy Australia's TPV configuration.

cat-bro changed the title from "New jobs are not being dispatched if there are 100s of newer 'new' jobs subject to concurrency limits we have set within TPV" to "New jobs are not being dispatched if there are 100s of other 'new' jobs subject to concurrency limits we have set within TPV" on Oct 17, 2024