Fix issue 17413: Prevent deadlock in the GC init #1872

Burgos · 2017-07-13T17:24:08Z

If a program running in a memory-constrained environment dies during
initialization with an Error thrown by the GC, the GC is no longer
usable. However, unregistering from the DSO registry still tries to
call removeRange. This causes a deadlock in the GC and prevents the
program from exiting. This commit prevents the GC from re-entering its
lock through the chain GC.fullcollect() -> Error -> dso_registry
-> GC.removeRange. Another deadlock, thread_attachThis -> fullcollect ->
assert -> new AssertError -> scope(exit) GC.enable, is solved by the first
two commits.

If any of the GC operations in thread_attachThis fails with an Error, it is likely that GC.enable will also throw an Error. This is not allowed for statically allocated errors (such as InvalidMemoryOperationError, commonly thrown by the GC), and it causes a deadlock in which t->next is the same as t. To prevent this, scope(exit) GC.enable() needs to become scope(success) GC.enable(). To make sure that no exception can cause GC.enable to be skipped, the entire thread_attachThis is made nothrow.

Since thread_attachThis is nothrow, enabling the GC in scope(success) is the same as doing so in scope(exit), except that it avoids using the GC when an Error is thrown, possibly by the GC itself.
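The difference between the two scope guards can be illustrated with a minimal sketch. This is not druntime code: `disableCount`, `withScopeExit` and `withScopeSuccess` are hypothetical names, the counter only plays the role of the GC.disable()/GC.enable() pairing, and an ordinary Exception is used as a stand-in for the failure because cleanup behaviour while an Error unwinds is less strictly guaranteed.

```d
import std.stdio : writeln;

// Hypothetical stand-in for the GC's disable counter; not druntime code.
int disableCount;

void withScopeExit(bool fail)
{
    ++disableCount;                  // plays the role of GC.disable()
    scope (exit) --disableCount;     // runs on normal return and while unwinding
    if (fail)
        throw new Exception("simulated failure");
}

void withScopeSuccess(bool fail)
{
    ++disableCount;                  // plays the role of GC.disable()
    scope (success) --disableCount;  // runs only on normal return
    if (fail)
        throw new Exception("simulated failure");
}

void main()
{
    try { withScopeExit(true); } catch (Exception) {}
    assert(disableCount == 0);       // the guard ran during unwinding

    try { withScopeSuccess(true); } catch (Exception) {}
    assert(disableCount == 1);       // the guard was skipped on the failing path

    withScopeSuccess(false);
    assert(disableCount == 1);       // incremented, then decremented on success
    writeln("ok");
}
```

With scope(success), the "re-enable" step never touches the counter on the failing path, which mirrors the goal of not calling back into the GC while a GC-originated failure is propagating.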

ConservativeGC._inFinalizer is renamed to _isLocked: since this field was used to prevent recursive calls into the GC while it is locked, not just while running inside a finalizer, the new name better describes its purpose.

dlang-bot · 2017-07-13T17:24:09Z

Thanks for your pull request, @Burgos! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

smaller, focused PRs are easier to review than big ones
try not to mix up refactoring or style changes with bug fixes or feature enhancements
provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close	Bugzilla	Description
✓	17413	Deadlock if allocation fails during runtime initialization

Burgos · 2017-07-13T17:26:19Z

I'm not sure how flaky the test case would be, since it relies on the fact that we don't have enough address space for the runtime to be initialised, but enough for shared libraries to be loaded.

Burgos · 2017-07-13T17:44:13Z

Looks like the vibe.d hang may not be related to this PR:

https://ci.dlang.io/blue/organizations/jenkins/dlang-org%2Fdruntime/detail/PR-1721/17/pipeline/
https://ci.dlang.io/blue/organizations/jenkins/dlang-org%2Fdruntime/detail/PR-1870/1/pipeline/
https://ci.dlang.io/blue/organizations/jenkins/dlang-org%2Fdruntime/detail/PR-1869/3/pipeline/

(the last five druntime PRs and a bunch of Phobos ones are hanging in vibe.d)

PetarKirov · 2017-07-14T04:32:31Z

@Burgos the vibe.d failures are not related to your PR. A fix for that will hopefully be deployed soon - see:

Remove explicit thread destruction code in the libasync driver. vibe-d/vibe.d#1837
Use more reliable sleep times and test both orders of operations. vibe-d/vibe.d#1838

PetarKirov · 2017-07-14T04:42:34Z

At least the first part of the call chain (GC.fullcollect() -> Error) could be tested pretty reliably by attempting to allocate memory from a class dtor executed during collection, though I'm not sure if this will hit the case you describe.

About the other call chain (thread_attachThis -> fullcollect -> assert -> new AssertError -> scope(exit) GC.enable): how easy is it to reproduce?

MartinNowak

Good point, but the fix seems not to address the root issue.

MartinNowak · 2017-08-03T15:45:04Z

src/gc/impl/conservative/gc.d

@@ -259,12 +259,12 @@ class ConservativeGC : GC

    import core.internal.spinlock;
    static gcLock = shared(AlignedSpinLock)(SpinLock.Contention.lengthy);
-    static bool _inFinalizer;
+    static bool _isLocked;


Isn't that easily confused with the global GC lock? Also, its purpose is not locking but detecting reentrancy from finalizers. I think the original name is rather appropriate.

MartinNowak · 2017-08-03T15:45:29Z

src/gc/impl/conservative/gc.d

        size_t freedLargePages=void;
        {
-            scope (failure) ConservativeGC._inFinalizer = false;
+            scope (failure) ConservativeGC._isLocked = false;
            freedLargePages = sweep();


_inFinalizer is set around sweep because that's where we call object finalizers.

MartinNowak · 2017-08-03T15:56:19Z

src/gc/impl/conservative/gc.d

            rangesLock.lock();
            rootsLock.lock();
            scope (exit)
            {
                rangesLock.unlock();
                rootsLock.unlock();
+                ConservativeGC._isLocked = false;


From your description the failure seems to be that we're not unlocking those ranges on failure, so it's just broken cleanup on Error, but setting the reentrant flag is quite an ugly hackaround.
An explicit try / catch (Error) / cleanup / rethrow works.
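The try / catch (Error) / cleanup / rethrow pattern suggested in this comment can be sketched as follows. This is only an illustration under stated assumptions: `DemoLock`, `demoLock`, `collectStep` and `guardedCollect` are hypothetical names standing in for the GC's locks and collection code, not the actual druntime implementation.

```d
import std.stdio : writeln;

// Hypothetical stand-in for a lock such as rangesLock or rootsLock.
struct DemoLock
{
    bool held;
    void lock() nothrow @nogc { held = true; }
    void unlock() nothrow @nogc { held = false; }
}

DemoLock demoLock;

// Simulates a collection step that may fail with an Error.
void collectStep(bool fail)
{
    if (fail)
        throw new Error("simulated GC failure");
}

void guardedCollect(bool fail)
{
    demoLock.lock();
    try
    {
        collectStep(fail);
    }
    catch (Error e)
    {
        demoLock.unlock();  // explicit cleanup on the Error path
        throw e;            // rethrow so the failure still propagates
    }
    demoLock.unlock();      // cleanup on the normal path
}

void main()
{
    bool propagated = false;
    try
    {
        guardedCollect(true);
    }
    catch (Error)
    {
        propagated = true;
    }
    assert(propagated);     // the Error reached the caller
    assert(!demoLock.held); // the lock was released before rethrowing

    guardedCollect(false);
    assert(!demoLock.held); // the lock is also released on success
    writeln("ok");
}
```

The cleanup is written out explicitly in the catch block, so it does not depend on how scope guards behave while an Error unwinds, and no extra reentrancy flag is needed.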

nemanja-boric-sociomantic · 2017-08-03

You're right, of course. I'll update.

MartinNowak · 2017-08-19T19:16:25Z

Ping @Burgos

Burgos · 2017-08-19T20:14:26Z

@MartinNowak sorry, I went on vacation and then had quite a bit of catching up to do. I will address this tomorrow, thanks for the ping!

rainers · 2017-10-31T15:54:33Z

Ping @Burgos, please update as suggested by Martin.

leandro-lucarella-sociomantic · 2018-05-04T10:47:04Z

@nemanja-boric-sociomantic ? :)

Burgos added 3 commits July 13, 2017 17:03

Enable GC in thread_attachThis in scope(success) instead scope(exit)

4c97b07

Rename ConservativeGC._inFinalizer to _isLocked

d5088da

dlang-bot added the Bug Fix (include reference to corresponding bugzilla issue) label Jul 13, 2017

Burgos force-pushed the gc-low branch from 224e35c to 539ce6e Compare July 13, 2017 17:52

Burgos force-pushed the gc-low branch from 539ce6e to 8a61b43 Compare July 13, 2017 17:54

nemanja-boric-sociomantic mentioned this pull request Jul 14, 2017

Gracefully handle assertion errors during runtime startup/teardown #1651

Open

MartinNowak suggested changes Aug 3, 2017

dlang-bot added and removed the Needs Rebase (needs a `git rebase` performed) label Jan 1, 2018

rainers added the GC (garbage collector) label Dec 24, 2018

dlang-bot added Needs Rebase needs a `git rebase` performed Needs Work stalled labels May 27, 2021
