HPCC-33015 Improve system resilience when thor crashes #19309
Jira Issue: https://hpccsystems.atlassian.net//browse/HPCC-33015

Jirabot Action Result:
```cpp
if (0 == currentGraphName.length()) // only ever true if !multiJobLinger

// The following is true if no workunit/graph have been received
// MORE: I think it should also be executed if lingerPeriod is 0
```
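The two comments in that snippet describe when the exit path should run: when no workunit or graph has been received, and (per the MORE note) possibly also when lingerPeriod is 0. A minimal C++ sketch of that combined condition follows. shouldExitWithoutWork is a hypothetical helper, not a function from the HPCC code base, and it assumes currentGraphName is empty exactly when no graph has been received.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper (not the actual Thor code): decides whether a lingering
// Thor instance should take the exit path.
// - currentGraphName is assumed to be empty when no workunit/graph was received.
// - lingerPeriod is in seconds; the zero check models the "MORE" suggestion
//   that the block should also run when there is no linger period.
bool shouldExitWithoutWork(const std::string &currentGraphName, unsigned lingerPeriod)
{
    bool neverReceivedWork = currentGraphName.empty();
    bool noLingerPeriod = (0 == lingerPeriod);
    return neverReceivedWork || noLingerPeriod;
}
```

As the later discussion in this thread points out, the schema does not allow lingerPeriod to be 0, so the second test would be redundant in practice.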
I have created a separate Jira issue to fix this (when Jake gets back).
@ghalliday - lingerPeriod cannot be 0 (as defined by the values schema).
OK, agreed: lingerPeriod is optional (minimum 1 if set). If it is not set (0), it does look like it is not exiting correctly; checking.
In fact, lingerPeriod cannot be 0. It defaults to 60 seconds, and the schema will not allow it to be set to 0.
So the if (lingerPeriod) condition is misleading: in practice, this block of code will always be entered.
Note: this has not been tested.
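A small sketch, assuming the rules stated in this thread (lingerPeriod is optional, at least 1 when set, and 60 seconds when unset), shows why a guard like if (lingerPeriod) is always taken for a valid configuration. resolveLingerPeriod and lingerGuardTaken are hypothetical helpers for illustration only; according to the thread, the actual limits are enforced by the values schema, not by code like this.

```cpp
#include <cassert>
#include <optional>
#include <stdexcept>

// Hypothetical helper mirroring the schema rules described in the review:
// optional, minimum 1 when set, default 60 seconds when unset.
unsigned resolveLingerPeriod(std::optional<unsigned> configured)
{
    constexpr unsigned defaultLingerPeriod = 60; // seconds
    if (!configured)
        return defaultLingerPeriod;
    if (*configured < 1)
        throw std::invalid_argument("lingerPeriod must be at least 1");
    return *configured;
}

// Every value that survives resolution is >= 1, so a guard of the form
// `if (lingerPeriod)` can never be false for a valid configuration.
bool lingerGuardTaken(std::optional<unsigned> configured)
{
    unsigned lingerPeriod = resolveLingerPeriod(configured);
    assert(lingerPeriod >= 1); // invariant implied by the schema rules
    return lingerPeriod != 0;
}
```

Under these assumptions, the only way to reach the "lingerPeriod is 0" case is an invalid configuration, which is rejected before the guard is evaluated.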
I think this looks good.
* Ensure that a Thor engine that has crashed is no longer associated with a workunit
* Ensure that a Thor instance that never processes a workunit terminates cleanly

Signed-off-by: Gavin Halliday <[email protected]>
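The first bullet of the commit message, ensuring a crashed Thor engine is no longer associated with a workunit, can be illustrated with a hypothetical registry that maps workunit IDs to Thor instance names and drops every mapping for an instance once it is reported as crashed. WorkunitRegistry and its methods are invented for this sketch; the commit message does not say how HPCC actually records the association.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical registry (illustrative names only) recording which Thor
// instance each workunit is currently associated with.
class WorkunitRegistry
{
public:
    void associate(const std::string &wuid, const std::string &instance)
    {
        assert(!instance.empty()); // an association needs a named instance
        owner[wuid] = instance;
    }

    // When an instance crashes, drop every association that points at it so
    // no workunit is left tied to an engine that no longer exists.
    void onInstanceCrashed(const std::string &instance)
    {
        for (auto it = owner.begin(); it != owner.end();)
        {
            if (it->second == instance)
                it = owner.erase(it); // erase returns the next valid iterator
            else
                ++it;
        }
    }

    bool isAssociated(const std::string &wuid) const
    {
        return owner.count(wuid) != 0;
    }

private:
    std::map<std::string, std::string> owner; // wuid -> Thor instance name
};
```

Clearing the association eagerly on crash means any later lookup for that workunit sees no owner, rather than a stale reference to a dead engine.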
The branch was updated from 0172f15 to 59aeb2b.
Type of change:
Checklist:
Smoketest:
Testing: