This repository has been archived by the owner on Oct 12, 2022. It is now read-only.

Fix issue 17413: Prevent deadlock in the GC init #1872

Open: Burgos wants to merge 4 commits into base: master


Burgos
Member

@Burgos Burgos commented Jul 13, 2017

If the program dies during initialization in a memory-constrained
environment with an Error thrown by the GC, the GC is no longer
usable. However, DSO registry unregistration will still try to call
removeRange. This would cause a deadlock in the GC, preventing the
program from exiting. This commit prevents the GC from entering
the recursive lock through the chain GC.fullcollect() -> Error -> dso_registry
-> GC.removeRange. Another deadlock would be thread_attachThis -> fullcollect ->
assert -> new AssertError -> scope(exit) GC.enable, which is solved by the first
two commits.

If any of the GC operations in thread_attachThis fails with
an Error, it's likely that GC.enable will also throw an Error. This is
not allowed for statically allocated errors (such as
InvalidMemoryOperationError, commonly thrown by the GC), and it will
cause a deadlock where t->next is the same as t.

To prevent this, we need to change scope(exit) GC.enable() to
scope(success) GC.enable(). To make sure that no exception can
cause GC.enable to be skipped, we also make the entire
thread_attachThis nothrow.
Since thread_attachThis is nothrow, this is the same as scope(exit),
except that it avoids using the GC when an Error is thrown, possibly by
the GC itself.

Since this field was used to prevent recursive calls from the GC
into itself while it is locked, not just when running inside a finalizer,
the field is renamed to better describe its purpose.
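
The difference between scope(exit) and scope(success) that the description relies on can be illustrated with a minimal sketch. It is not the druntime code: the names `events`, `leaveWithExit` and `leaveWithSuccess` are hypothetical, and it throws an Exception rather than an Error purely to keep the illustration simple.

```d
// Records which cleanup handlers actually ran (hypothetical helper).
string[] events;

// scope(exit) runs whether the scope is left normally or by a throw.
void leaveWithExit()
{
    scope (exit) events ~= "exit";
    throw new Exception("simulated failure");
}

// scope(success) runs only when the scope is left without a throw.
void leaveWithSuccess(bool fail)
{
    scope (success) events ~= "success";
    if (fail)
        throw new Exception("simulated failure");
}
```

Calling `leaveWithExit()` appends "exit" before the exception propagates, while `leaveWithSuccess(true)` appends nothing, so no cleanup code runs while the throwable is in flight.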
@dlang-bot
Contributor

Thanks for your pull request, @Burgos! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Auto-close Bugzilla Description
17413 Deadlock if allocation fails during runtime initialization

@dlang-bot dlang-bot added the Bug Fix Include reference to corresponding bugzilla issue label Jul 13, 2017
@Burgos
Member Author

Burgos commented Jul 13, 2017

I'm not sure how flaky the test case would be, since it relies on the fact that we don't have enough address space for the runtime to be initialised, but enough for shared libraries to be loaded.

@PetarKirov
Member

@Burgos the vibe.d failures are not related to your PR. A fix for that will hopefully be deployed soon - see:

@PetarKirov
Member

At least the first part of the call chain (GC.fullcollect() -> Error) could be tested pretty reliably by attempting to allocate memory from a class dtor executed during collection, though I'm not sure if this will hit the case you describe.

About the other call chain (thread_attachThis -> fullcollect ->
assert -> new AssertError -> scope(exit) GC.enable): how easy is it to reproduce?

Member

@MartinNowak MartinNowak left a comment


Good point, but the fix seems not to address the root issue.

@@ -259,12 +259,12 @@ class ConservativeGC : GC

     import core.internal.spinlock;
     static gcLock = shared(AlignedSpinLock)(SpinLock.Contention.lengthy);
-    static bool _inFinalizer;
+    static bool _isLocked;
Member


That's easily confused with the global GC lock. Also, its purpose is not locking but detecting reentrancy from finalizers. I think the existing name is rather appropriate.

     size_t freedLargePages=void;
     {
-        scope (failure) ConservativeGC._inFinalizer = false;
+        scope (failure) ConservativeGC._isLocked = false;
         freedLargePages = sweep();
Member


inFinalizer is set around sweep b/c that's where we call object finalizers.

rangesLock.lock();
rootsLock.lock();
scope (exit)
{
rangesLock.unlock();
rootsLock.unlock();
ConservativeGC._isLocked = false;
Member


From your description the failure seems to be that we're not unlocking those ranges on failure, so it's just broken cleanup on Error, but setting the reentrant flag is quite an ugly hackaround.
An explicit try / catch (Error) / cleanup / rethrow works.
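
A minimal sketch of the try / catch (Error) / cleanup / rethrow pattern mentioned above. This is an illustration under stated assumptions, not the druntime implementation: `DemoLock`, `failingCollect` and `collectWithCleanup` are hypothetical names, and the module-level `rangesLock` and `rootsLock` are simple stand-ins for the real locks.

```d
// Hypothetical stand-in for the GC's locks; not a druntime type.
struct DemoLock
{
    bool held;
    void lock() nothrow @nogc { held = true; }
    void unlock() nothrow @nogc { held = false; }
}

DemoLock rangesLock, rootsLock;

// Simulates a collection step that fails with an Error.
void failingCollect()
{
    throw new Error("simulated GC failure");
}

// Takes both locks, runs the work, and releases the locks on every path.
void collectWithCleanup(void function() work)
{
    rangesLock.lock();
    rootsLock.lock();
    try
    {
        work();
    }
    catch (Error e)
    {
        // Release the locks explicitly before propagating the Error.
        rootsLock.unlock();
        rangesLock.unlock();
        throw e;
    }
    rootsLock.unlock();
    rangesLock.unlock();
}
```

After `collectWithCleanup(&failingCollect)` propagates its Error, both demo locks are released again, so later cleanup code that needs them would not block on a lock that was never unlocked.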

Contributor


You're right, of course. I'll update.

@MartinNowak
Member

Ping @Burgos

@Burgos
Member Author

Burgos commented Aug 19, 2017

@MartinNowak sorry, I went on vacation, then had quite some catching up to do. Will address this tomorrow, thanks for the ping!

@rainers
Member

rainers commented Oct 31, 2017

Ping @Burgos, please update as suggested by Martin.

@dlang-bot dlang-bot added Needs Rebase needs a `git rebase` performed and removed Needs Rebase needs a `git rebase` performed labels Jan 1, 2018
@leandro-lucarella-sociomantic
Contributor

@nemanja-boric-sociomantic ? :)

@rainers rainers added the GC garbage collector label Dec 24, 2018
@dlang-bot dlang-bot added Needs Rebase needs a `git rebase` performed Needs Work stalled labels May 27, 2021
7 participants