-
Notifications
You must be signed in to change notification settings - Fork 241
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[YUNIKORN-2926] placeholder counters incorrect #986
Conversation
Placeholder tracking data is maintained inside the application for scheduling. If the placeholder is released we update the counters in the tracking data. We have cases in which we do not do that correctly: * placeholders are smaller than the real allocation * placeholder does not have an allocation to replace * all allocations are removed from an application Tests updated to check all the counters inside the placeholder data for consistency.
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## master #986 +/- ##
==========================================
- Coverage 81.50% 81.25% -0.25%
==========================================
Files 97 97
Lines 12625 15467 +2842
==========================================
+ Hits 10290 12568 +2278
- Misses 2052 2617 +565
+ Partials 283 282 -1 ☔ View full report in Codecov by Sentry. |
Is the e2e test failure expected? |
remove inconsistent tracking update on timeout extended testing and checks in placeholder timeout test cases
The failure was not expected. It was another point of inconsistent handling of the tracking data not covered by the unit tests. |
This second e2e test is failing due to the fact that the list of requests has a mixture of allocated and unallocated entries and the merging of the Ask and Allocation objects. The fix for the timeout tracking does not filter the requests lists for already allocated requests and needs to do that. We seem to be adding that same filter everywhere now to compensate for not removing the allocation from the request list.
The other problem detected on log analysis is that the shim tries to clean up the same placeholders multiple times. First based on the core request then based on internal logic as part of the placeholder code. All cleanup in the placeholder code fails as the core has already done it. It does trigger more release messages to be sent to the core which then get ignored as the work is already done. That needs a cleanup in a follow up jira. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Placeholder tracking data is maintained inside the application for scheduling. If the placeholder is released we update the counters in the tracking data. We have cases in which we do not do that correctly: * placeholders are smaller than the real allocation * placeholder does not have an allocation to replace * all allocations are removed from an application Closes: #986 Signed-off-by: Craig Condit <[email protected]>
What is this PR for?
Placeholder tracking data is maintained inside the application for scheduling. If the placeholder is released we update the counters in the tracking data. We have cases in which we do not do that correctly:
Tests updated to check all the counters inside the placeholder data for consistency.
What type of PR is it?
What is the Jira issue?
How should this be tested?
New unit tests added
e2e test with incorrectly sized placeholders is needed