Fix rare termination logic failures that could result in early shutdown #4556
i'll try and set up a VM to reproduce locally at some point (probably not for a day or two) but not sure what the issue is since everything else is happy..
9c167df to b4f0960
i'm able to reproduce the crash.. some info:
based on the above, my current guess is there's some sort of interaction between the @ponylang/core any thoughts on how to proceed?
Without any evidence to support my statements, this feels like a memory clobbering. And it also feels like a musl bug.
@dipinhora can you use the 20241203 musl image i just pushed and see if there is any difference? it should have a new musl that fixed "multiple race conditions". Worth a try! NEVERMIND. LLVM isn't building. I'm working on getting us able to use that newer musl.
@dipinhora I'll let you know when the new builder is ready to go and try with this.
@dipinhora you can rebase against main and get the new builder using the latest musl release.
Hi @dipinhora, The changelog - fixed label was added to this pull request; all PRs with a changelog label need to have release notes included as part of the PR. If you haven't added release notes already, please do. Release notes are added by creating a uniquely named file in the The basic format of the release notes (using markdown) should be:
Thanks.
Prior to this commit, there was a very rare edge case in the termination logic that could cause an early shutdown, resulting in a segfault.

This commit simplifies and reworks the shutdown/termination logic to make it more robust, with fewer edge cases.

The logic now:

* does not un-noisy an actor from the ASIO thread until the relevant ASIO event is destroyed, rather than when it is unsubscribed. This is important because the ASIO subsystem still has a reference to the actor and can send a message to it until the ASIO event is destroyed, even if the event has been unsubscribed
* always runs the CNF/ACK protocol to all schedulers instead of only the active ones
* disables scheduler scaling so that all schedulers are active for the duration of the termination CNF/ACK protocol, to avoid or minimize complexity from schedulers suspending during the termination process
* ensures the local scheduler tracking of ASIO noisiness is more accurate and robust to messages being received out of order
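
As a rough illustration of the "CNF/ACK to all schedulers" point in the commit message above, here is a toy model in Python. It is not the runtime's code (the runtime is written in C), and every name in it (`Scheduler`, `noisy_actors`, `on_cnf`, `can_terminate`, `only_active`) is hypothetical. It only sketches, under those assumptions, why a confirmation round that skips suspended schedulers can conclude "safe to terminate" while a skipped scheduler still tracks a noisy actor. It makes no claim about the actual failure this PR fixes.

```python
from dataclasses import dataclass


@dataclass
class Scheduler:
    """Hypothetical stand-in for a scheduler thread (not the runtime's struct)."""
    index: int
    suspended: bool = False  # assumption: scaled-down schedulers are "suspended"
    noisy_actors: int = 0    # assumption: locally tracked count of noisy actors

    def on_cnf(self, token):
        # Reply ACK(token) only if this scheduler sees no noisy actors.
        return token if self.noisy_actors == 0 else None


def can_terminate(schedulers, token, only_active=False):
    """Run one CNF/ACK round; True if every polled scheduler ACKs the token."""
    polled = [s for s in schedulers if not (only_active and s.suspended)]
    acks = [s.on_cnf(token) for s in polled]
    return all(ack == token for ack in acks)


# One active, quiet scheduler and one suspended scheduler with a noisy actor.
scheds = [Scheduler(0), Scheduler(1, suspended=True, noisy_actors=1)]

print(can_terminate(scheds, token=1, only_active=True))  # True: round skips scheduler 1
print(can_terminate(scheds, token=1))                    # False: scheduler 1 withholds its ACK
```

In this toy model, polling every scheduler is simply the more conservative choice: termination proceeds only when no scheduler, active or not, still reports outstanding noisy actors.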
b4f0960 to a71a1fa
@dipinhora am i correct that the new builder with new musl didn't address?
no, but it gave more info in the backtrace.. it's a race condition on program startup between i'm working on a fix..
Awesome.
8d32e15 to 31dfe32
8cea50f to d11b2fa
@SeanTAllen fix pushed.. release notes added.. i believe the
Awesome work @dipinhora. Thanks.