-
Notifications
You must be signed in to change notification settings - Fork 376
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make event handling fallible #2995
Make event handling fallible #2995
Conversation
I believe our recommendation was always to simply loop trying to handle the event until they succeed, which is basically what we're doing here for them :) As to the code here, I think we should make more clear in the interface the event will be replayed, eg by making the error variant a unit struct called |
5a45ebe
to
34d7a9b
Compare
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2995 +/- ##
==========================================
- Coverage 89.79% 89.75% -0.05%
==========================================
Files 121 121
Lines 100826 100916 +90
Branches 100826 100916 +90
==========================================
+ Hits 90537 90576 +39
- Misses 7614 7658 +44
- Partials 2675 2682 +7 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See-also my above comments.
34d7a9b
to
772a851
Compare
Rebased and included a fixup to introduce
Could you clarify which flag you are referring to exactly? Do you mean just |
The
I think that's okay if the BP loop busy-waits? If we're blocked waiting for something to complete a busy-wait is a perfectly reasonable way to signal to the user that something is horribly wrong (maybe they'll file a bug report asking why their phone is getting hot) :) |
I'm confused: wouldn't this only work for
Hmm, I tend to disagree? That might be okay in blocking land where event handling would run on its own thread, but in |
Sure, they should all do a similar thing.
Hmm, as long as the user has an async...anything handling the event that's failing we should yield at least once during the BP loop. We could add an explicit yield (in the form of a ~1ms sleep), I guess? |
772a851
to
4debd21
Compare
Rebased to resolve conflicts after #3060 landed, have yet to adjust |
fd89e2a
to
43a4a04
Compare
Now added logic to retain failed events in the Two observations:
Coming back to this: Upon further inspection it seems we set Generally this is still missing some test coverage, but should be ready for another round of review apart from that. |
Ah, indeed, you're right. We should, however, add something similar in |
lightning/src/events/mod.rs
Outdated
/// An error type that may be returned to LDK in order to safely abort event handling if it can't | ||
/// currently succeed (e.g., due to a persistence failure). | ||
/// | ||
/// LDK will ensure the event is persisted and will eventually be replayed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't be so cut-and-dry here cause its not really true - we may well (persist and) replay the event, but depending on the event type we may not. I'm not really sure how to accurately communicate this to users, though, at least short of documenting it for each individual event type (#2097).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, yeah, this comment predates the discussion we had offline. I think I'll see to pick up #2097 as part of this PR also.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now took a stab at addressing #2097. Let me know if you think we should add more fine-grained context to some variants.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! It looks great, though we also really need to add some additional context around events that are "robust" vs not - eg you could open a channel, have it closed and restart without ever persisting, implying a "lost" DiscardFunding
. Doesn't have to happen in this PR, though is an implied part of #2097.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused by the DiscardFunding
example, as it seems to me it would indeed be persisted across restart if it was ever generated? At least all codepaths I see that lead to finish_close_channel
seem to set DoPersist
or notify_on_drop
?
Could it be that you're referring to the issue described in #2508, which however means DiscardFunding
wouldn't get generated in the first place?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm referring to what happens if the ChannelManager
(or any other thing) is not persisted. #2508 does come up here, but this generally applies to all events.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, may be worth opening another issue to that end?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, I don't think it needs to be addressed here, we should just not close #2097 until we address it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7dc55e5
to
188edca
Compare
Agreed. Now added a fixup that has |
188edca
to
985056c
Compare
Added a simple BackgroundProcessor test to check events are being replayed and also added a commit documenting the failure-to-handle/persistence behavior of all event variants (i.e., addressed #2097). I now also dropped the serialization logic for |
d20af7c
to
6e83263
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Just a few comments.
291f42e
to
456bcf5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Feel free to squash, IMO.
456bcf5
to
1c2478d
Compare
Squashed without further changes. |
The |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nits and one real comment, but only worth fixing cause you have to rewrite some commits anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
Make event handling fallible
commit itself doesn't build, so check_commits is failing.
Squashed the MultiResultFuturePoller
commit and added further fixups addressing the nits.
1c2478d
to
cbcd88f
Compare
This is a minor refactor that will allow us to access the individual event queue Mutexes separately, allowing us to drop the locks earlier when processing them individually.
cbcd88f
to
258853a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Previously, we would require our users to handle all events successfully inline or panic will trying to do so. If they would exit the `EventHandler` any other way we'd forget about the event and wouldn't replay them after restart. Here, we implement fallible event handling, allowing the user to return `Err(())` which signals to our event providers they should abort event processing and replay any unhandled events later (i.e., in the next invocation).
Previously, we would just fire-and-forget in `OnionMessenger`'s event handling. Since we now introduced the possibility of event handling failures, we here adapt the event handling logic to retain any events which we failed to handle to have them replayed upon the next invocation of `process_pending_events`/`process_pending_events_async`.
258853a
to
e617a39
Compare
Squashed fixups without further changes: > git diff-tree -U2 258853aed e617a394e
> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Happy to land this, but probably needs some small followups.
Closes #2490,
Closes #2097.
Previously, we would require our users to handle all events successfully inline or panic will trying to do so. If they would exit the
EventHandler
any other way we'd forget about the event and wouldn't replay them after restart.Here, we implement fallible event handling, allowing the user to return
Err(())
which signals to our event providers they should abort event processing and replay any unhandled events later (i.e., in the next invocation).TODO:
Err(())
.