-
-
Notifications
You must be signed in to change notification settings - Fork 421
Fix issue 17413: Prevent deadlock in the GC init #1872
base: master
Are you sure you want to change the base?
Conversation
In case any of the GC operations in the thread_attachThis fail with the Error, it's likely that GC.enable will also throw an error. This is not allowed for the statically allocated errors (such as InvalidMemoryOperationError, commonly thrown by GC), and it will cause deadlock where the t->next will be same as t. In order to prevent this, we need to make scope(exit) GC.enable() scope(success) GC.enable(), but then to make sure there will be no exception which will cause GC.enable to be skiped, we will make entire thread_attachThis nothrow.
Since thread_attachThis is nothrow, this is the same as scope(exit), just it will avoid using GC when the Error is thrown, possibly by the GC itself.
Since this field was used to prevent recursive calls to GC to itself while is locked, not just when running inside finalizer, this field is renamed to better describe its purpose.
Thanks for your pull request, @Burgos! We are looking forward to reviewing it, and you should be hearing from a maintainer soon. Some tips to help speed things up:
Bear in mind that large or tricky changes may require multiple rounds of review and revision. Please see CONTRIBUTING.md for more information. Bugzilla references
|
I'm not sure how flaky the test case would be, since it relies on the fact that we don't have enough address space for the runtime to be initialised, but enough for shared libraries to be loaded. |
Looks like it could be that vibe.d hanging could be not related this this PR: https://ci.dlang.io/blue/organizations/jenkins/dlang-org%2Fdruntime/detail/PR-1721/17/pipeline/ (last five druntime PRs and bunch phobos ones are hanging in vibe.d) |
If the program during initialization in the pressing memory environment died with the Error thrown by the GC, GC would not be usable anymore. However, dso registry unregistration will try to removeRanges. This would cause a deadlock in the GC, preventing the exit of the program. This commit prevents GC entering the recursive lock from the chain GC.fullcollect() -> Error -> dso_registry -> GC.removeRange.
@Burgos the vibe.d failures are not related to your PR. A fix for that will hopefully be deployed soon - see: |
At least the first part of the call chain (GC.fullcollect() -> Error) could be tested pretty reliably by attempting to allocate memory from a class dtor executed during collection, though I'm not sure if this will hit the case you describe. About the other call chain (thread_attachThis -> fullcollect -> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, but the fix seems not to address the root issue.
@@ -259,12 +259,12 @@ class ConservativeGC : GC | |||
|
|||
import core.internal.spinlock; | |||
static gcLock = shared(AlignedSpinLock)(SpinLock.Contention.lengthy); | |||
static bool _inFinalizer; | |||
static bool _isLocked; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's easily confusable with the global GC lock? Also it's purpose is not locking but detecting reentrancy from finalizers. I think the name is rather appropriate.
size_t freedLargePages=void; | ||
{ | ||
scope (failure) ConservativeGC._inFinalizer = false; | ||
scope (failure) ConservativeGC._isLocked = false; | ||
freedLargePages = sweep(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inFinalizer
is set around sweep b/c that's where we call object finalizers.
rangesLock.lock(); | ||
rootsLock.lock(); | ||
scope (exit) | ||
{ | ||
rangesLock.unlock(); | ||
rootsLock.unlock(); | ||
ConservativeGC._isLocked = false; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From your description the failure seems to be that we're not unlocking those ranges on failure, so it's just broken cleanup on Error, but setting the reentrant flag is quite an ugly hackaround.
An explicit try / catch (Error) / cleanup / rethrow
works.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, of course. I'll update.
Ping @Burgos |
@MartinNowak sorry, I went to vacation, then had quite some catch-up to do. Will address this tomorrow, thanks for ping! |
Ping @Burgos, please update as suggested by Martin. |
If the program during initialization in the pressing memory
environment died with the Error thrown by the GC, GC would
not be usable anymore. However, dso registry unregistration
will try to removeRanges. This would cause a deadlock in the GC,
preventing the exit of the program. This commit prevents GC entering
the recursive lock from the chain GC.fullcollect() -> Error -> dso_registry
-> GC.removeRange. Another deadlock would be thread_attachThis -> fullcollect ->
assert -> new AssertError -> scope(exit) GC.enable which is solved by the first
two commits.