Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(server): prevent allocating to unregistered executors #1133

Conversation

seriousben
Copy link
Member

@seriousben seriousben commented Dec 20, 2024

Context

Timing issues can cause tasks to be allocated to an unregistered executor.

What

Check in transaction for the existence of the executor before allocation.

Testing

Contribution Checklist

  • If the python-sdk was changed, please run make fmt in python-sdk/.
  • If the server was changed, please run make fmt in server/.
  • Make sure all PR Checks are passing.

@diptanu
Copy link
Collaborator

diptanu commented Dec 21, 2024

@seriousben If we place a task on an executor which is removed, assuming the server processes the executor removed event, the task is going to be put back on the unallocated tasks queue right?

With this change, the scheduler state change request is not going to have an affect, and the event will be marked as processed. Unless there is another event which triggers the scheduler, this task is going to remain in the unplaced tasks.

@seriousben
Copy link
Member Author

@seriousben If we place a task on an executor which is removed, assuming the server processes the executor removed event, the task is going to be put back on the unallocated tasks queue right?

With this change, the scheduler state change request is not going to have an affect, and the event will be marked as processed. Unless there is another event which triggers the scheduler, this task is going to remain in the unplaced tasks.

Yeah, I think we might need a state change for scheduling errors in order to look at unplaced tasks.

I also want to split task creation from scheduling so that they can each react to different state changes and have their own error handling.

@seriousben
Copy link
Member Author

Closing in favor of a refactor getting rid of this and other timing issue edge cases caused by concurrent data updates.

@seriousben seriousben closed this Dec 27, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants