adds attempt to call failure callback on internal exception; adds tests to JobletProcessHandler #23
base: master
Conversation
```java
try {
    status = jobletStatusManager.getStatus(identifier);
} catch (Exception e) {
    status = JobletStatus.IN_PROGRESS;
}
```
I do not have full context on the goals here, but I am highly skeptical of eaten exceptions -- can we LOG.error the exception, even if we aren't re-throwing it?
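(As a concrete sketch of that suggestion, assuming an slf4j-style LOG field exists on the handler; the logger is an assumption, the rest mirrors the diff above:)

```java
JobletStatus status;
try {
    status = jobletStatusManager.getStatus(identifier);
} catch (Exception e) {
    // don't silently eat the failure: record it even though we fall back to IN_PROGRESS
    LOG.error("getStatus failed for joblet " + identifier + "; treating it as IN_PROGRESS", e);
    status = JobletStatus.IN_PROGRESS;
}
```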
Depending on what the parent code does, it might even make sense to catch this exception, run the failure callback, and then re-throw the exception, in case you want to mark the handler as "broken" (so it doesn't keep getting work to do and failing it).
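(Purely as a sketch of that idea; the `broken` flag, the `runFailureCallback` helper, and the method name are invented for illustration, not part of the actual handler:)

```java
// hypothetical flag the parent code could check before handing this handler more work
private volatile boolean broken = false;

private JobletStatus getStatusOrMarkBroken(String identifier) throws Exception {
    try {
        return jobletStatusManager.getStatus(identifier);
    } catch (Exception e) {
        runFailureCallback(identifier);  // invented helper: give the caller a chance to react
        broken = true;                   // stop accepting new work instead of repeatedly failing it
        throw e;                         // still propagate the original exception
    }
}
```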
Basically agreeing with Ben here, I think the correct way to do this is to have the catch block try to run the failure callback, rather than messing with control flow in this way. You can, inside the catch block, wrap the failure callback call and throw some explicit exception type for when it's the failure callback itself that's doing the failing.
i definitely hear you on the flow control. one of the reasons i was a little reticent to do this is that if i just 1) catch the exception, 2) attempt to run the failure callback, and 3) re-throw the same exception or a new one, then `jobletStatusManager.remove(identifier)` and `configStorage.deleteConfig(identifier)` are not going to get called, so there's some additional cleanup that's not going to happen. that being said, that's no worse than what currently happens in this case, so i think you guys are right. i'll make this change.
i just pushed the change. lmk if this addresses your concerns.
was thinking about this a bit more last night and one concern that i have is that this is a change in behavior that may not be intuitive in all cases. in the case where the status file is empty and throws an exception, as in the case described above, if status is actually
* still throws an exception on failure, but also attempts the failure callback
* throws an explicit exception if the failure callback fails
* adjusts tests to match the new behavior
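(The shape that commit describes might look something like the following; `reportFailure` and `FailureCallbackException` are stand-in names used here for illustration, not necessarily what the PR actually introduces:)

```java
// assumes the enclosing method declares `throws Exception`
JobletStatus status;
try {
    status = jobletStatusManager.getStatus(identifier);
} catch (Exception statusException) {
    try {
        // attempt the failure callback so the caller learns something went wrong
        reportFailure(identifier, statusException);  // stand-in for the handler's failure callback
    } catch (Exception callbackException) {
        // explicit type so "the failure callback itself failed" is distinguishable
        throw new FailureCallbackException(callbackException);
    }
    // still throws on failure: the original exception propagates after the callback ran
    throw statusException;
}
```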
I hear what you're saying, but I think the only reasonable way for the code to react here is: if we can't determine that something is done, we determine that it has failed. idk to what extent we are good about this now, but our callbacks should be written with these transactional semantics in mind - a request should only count as done if its success callback executes, and it should never leave anything around that's a problem if its failure callback executes, no matter how far it has gotten. Additionally, failure callbacks have to be idempotent. Transactional semantics can be unintuitive but I think they're important enough for people to try to change their intuitions on. Maybe worth doing some broader dev outreach about failures and the difficulties of handling them and thinking about transactions if we don't think people are thinking about this the right way.
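(To make the idempotency point concrete, here is an invented example of a failure callback that is safe to run more than once; `OutputStore` and its methods are hypothetical, not part of this codebase:)

```java
// hypothetical store whose operations are no-ops when there is nothing to undo
interface OutputStore {
    void deletePartialOutput(String requestId);  // deleting something already gone is a no-op
    void markNotDone(String requestId);          // marking an already-not-done request is a no-op
}

class CleanupOnFailureCallback {
    private final OutputStore outputStore;

    CleanupOnFailureCallback(OutputStore outputStore) {
        this.outputStore = outputStore;
    }

    // idempotent: running this twice, or after a partially executed success path,
    // leaves the system in the same state as running it once
    void onFailure(String requestId) {
        outputStore.deletePartialOutput(requestId);
        outputStore.markNotDone(requestId);
    }
}
```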
The current change actually doesn't address all of the bugs that are in this section. For example, if the success callback fails, we never try to execute the failure callback. In my head I think we restructure it like:

```java
try {
    if (getStatus() == DONE) {
        successCallback();
        cleanup();
        return;
    } else {
        failureCallback();
        cleanup();
        return;
    }
} catch (Exception e) {
    // something in the main path blew up: fall back to the failure callback
    try {
        failureCallback();
        cleanup();
    } catch (Exception callbackException) {
        // the failure callback itself failed, so we can't say what state we're in
        throw new PotentialInconsistentStateException(callbackException);
    }
    throw new DaemonException(e);
}
```
For 1, we need to write code so that this is fine, and 2 is definitely trickier, but may mean that we should also be writing our failure callbacks to not have any effect on things that have had successes execute, for "true" correctness. Old FPS actually used to do this, though it resulted in as much pain as breakage prevention, I think.
got it. additions you're proposing:
i don't think i'm super opinionated on 1 & 2. i can see arguments for both sides. failures in callbacks are already a pain point in daemon and either solution has tradeoffs. if we feel the contract is that a request has not succeeded until all of the user-provided code (including its callbacks) has succeeded, then what you suggest seems reasonable. it definitely is a bit unintuitive to have callbacks double-trigger though. i don't think i agree with you on 3--though i may be misunderstanding your position. i think of
For 3, I was assuming that failure to clean that stuff up would leave the possibility that the daemon finds the request again in the future and runs callbacks on it again. I think this is true for ForkedDaemonExecutor, but I could be wrong. If I'm right, then the request is not properly completed until you can guarantee that it won't later fail, and thus failure of cleanup is a failure. If I'm wrong then the cleanup should not be included.
cool. i'm going to dig in and figure out if that's true.
i think technically speaking not cleaning up these resources given the current implementation does not risk a request getting picked up again. though given how much digging i had to do, i may have persuaded myself that your approach is more conservative (and perhaps more wise). here's what i found. in
so it's happenstance that the implementations of
the other thing giving me pause is that empirically there have been cases where an old serialized config has broken daemon entirely (here's one of your prs to make this less painful: #14). the model i've described doesn't account for something like that happening since i think that old configs are nuked when new ones come in. so either there's a hole in my investigation or the deserialization problem doesn't relate to this PR in the way i think it did. i can't find any stack traces of the old problem to verify one way or another. i can attempt to induce it locally to assuage my concern but wanted to check in with you first.
in the case where the call to `jobletStatusManager.getStatus(identifier)` fails, the process dies without attempting to call failure callbacks. i don't think this behavior is very intuitive, because it does not allow the caller to realize that there's a problem. i'd prefer that in such a case daemon tries to call my failure callback. what do you guys think?
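(For context, a rough sketch of the failure mode being described, under the assumption that the pre-PR code reads the status with no try/catch around it; the control flow and callback names here are paraphrased from the discussion above, not copied from the actual source:)

```java
// roughly: any exception out of getStatus (e.g. an empty or corrupt status file)
// propagates straight up and kills the handler process...
JobletStatus status = jobletStatusManager.getStatus(identifier);  // may throw
if (status == JobletStatus.DONE) {
    successCallback();
} else {
    failureCallback();
}
// ...so when getStatus throws, neither callback runs and the caller never learns anything failed
```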