grpcio::Env can leak threads -- it detaches them instead of joining them #455
Environment is usually wrapped in
I don't get it. All servers and clients should be shut down before the environment, so even though the thread is not shut down, it should not try to touch "enclaves". Is it possible to move the environment out of the loop scope?
The enclave destructor calls a C library. A handle to our enclave object is owned by the grpc service object that we register with the server. So that resource is still preserved until the grpc service thread is collected, as far as I understand. If shutting down the server does not join the threads, then shutting down the server does not collect the enclaves, right? So those destructors are never called until the threads close of their own accord. The join handles appear to be owned by the grpcio environment so nothing can join them but the environment, and I have no way to make it do that. Here's one of the shorter kinds of stacktraces I see:
So the grpcio threads continue exiting even after the loop body is over, and even after the test has ended and been declared `test result: ok`. I think this is because they are detached threads. To the best of my understanding, in Rust, if the join handle is dropped without being explicitly joined, the thread is detached instead: https://doc.rust-lang.org/std/thread/struct.JoinHandle.html I have been able to make my test pass 100% of the time even in release mode by inserting
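To illustrate the detach-versus-join behaviour described in this comment, here is a minimal sketch using only the standard library. It does not involve grpcio at all; `spawn_worker`, `run_joined`, and `run_detached` are hypothetical names made up for this illustration.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;
use std::thread;
use std::time::Duration;

/// Spawns a worker that sets `done` after a short delay and returns its handle.
fn spawn_worker(done: Arc<AtomicBool>) -> thread::JoinHandle<()> {
    thread::spawn(move || {
        thread::sleep(Duration::from_millis(50));
        done.store(true, Ordering::SeqCst);
    })
}

/// Joins the worker, so it has definitely finished when this function returns.
fn run_joined() -> bool {
    let done = Arc::new(AtomicBool::new(false));
    let handle = spawn_worker(Arc::clone(&done));
    handle.join().expect("worker panicked");
    done.load(Ordering::SeqCst)
}

/// Drops the handle without joining: the thread is detached and keeps
/// running on its own, so the caller cannot know when it finishes.
fn run_detached() -> Arc<AtomicBool> {
    let done = Arc::new(AtomicBool::new(false));
    drop(spawn_worker(Arc::clone(&done)));
    done
}

fn main() {
    println!("joined worker finished: {}", run_joined());
    let flag = run_detached();
    // This read races with the detached worker, so the result is not guaranteed.
    println!("detached worker finished yet: {}", flag.load(Ordering::SeqCst));
}
```

In the joined case the caller has a guarantee that the worker is gone; in the detached case the only options are to poll or to sleep and hope, which is the situation described above.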
it might be but it will commingle the server resources across passes through the loop. Also, it doesn't help me ensure that the threads actually exit before the test function exits.
This is the version of the loop that I am working with for now, which seems to have fixed things for me:
The idea being that once the threads have been told to shut down, I don't know that they have actually stopped, but I hope they will try to stop soon, so 1000ms is maybe enough. If I can join them then I don't need a sleep, I think.
Thanks for the detailed explanation. I can see two things that can be done:
I think I cannot guard "SimEnclaveMgr"; it is a static-lifetime variable in the Intel C library, and I think it only gets torn down after If you are okay to join the threads in the Thank you!
Joining in drop seems good to me, although you may need to check whether the current thread is equal to the target thread to avoid deadlock.
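A minimal sketch of that suggestion, using only the standard library: `WorkerPool` is a hypothetical stand-in for an object that owns join handles, not grpcio's actual `Environment`, and `join_all` is a name invented for this example. It compares each handle's thread id with the current thread's id and skips the join when they match.

```rust
use std::thread::{self, JoinHandle};

/// Hypothetical stand-in for an environment that owns worker threads.
struct WorkerPool {
    handles: Vec<JoinHandle<()>>,
}

impl WorkerPool {
    fn new(count: usize) -> Self {
        let handles: Vec<JoinHandle<()>> = (0..count)
            .map(|_| thread::spawn(|| { /* worker body would go here */ }))
            .collect();
        WorkerPool { handles }
    }

    /// Joins every worker except the calling thread; returns how many were joined.
    fn join_all(&mut self) -> usize {
        let current = thread::current().id();
        let mut joined = 0;
        for handle in self.handles.drain(..) {
            if handle.thread().id() == current {
                // A thread cannot usefully join itself, so skip it.
                // Dropping the handle here detaches that thread instead.
                continue;
            }
            if handle.join().is_ok() {
                joined += 1;
            }
        }
        joined
    }
}

impl Drop for WorkerPool {
    fn drop(&mut self) {
        // Joining in drop means leaving scope waits for the workers.
        self.join_all();
    }
}

fn main() {
    let pool = WorkerPool::new(4);
    drop(pool); // blocks until all four workers have exited
    println!("all workers joined");
}
```

The self-join case can arise when the last reference to the owning object is dropped from inside one of its own worker threads, which is why the check is worth having even if it rarely triggers.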
Please sign off all your commits and fix the CI.
hi sorry i got distracted, I will do it
`grpcio::env` impl of `Drop` issues commands to request all the completion queues to shutdown, but does not actually join the threads.

For a lot of webservers this works fine, but for some tests, it creates a problem. In my usecase I have a server containing SGX enclaves and a database, and I want to validate that even if the server goes down and comes back repeatedly, the users are able to recover their data from the database.

```
let users = ... mock user set
for phase_count in 0..NUM_PHASES {
    log::info!(logger, "Phase {}/{}", phase_count + 1, NUM_PHASES);

    // First make grpcio env
    let grpcio_env = mobile_acct_api::make_env();

    ... make server, make client,
    ... make requests for each mock user,
    ... validate results
}
```

Unfortunately for me, even though `grpcio_env` is scoped to the loop body, the threads actually leak out because the implementation of `Drop` does not join the threads.

Unfortunately, this consistently causes crashes in tests because intel sgx sdk contains a `SimEnclaveMgr` object which has a static lifetime and is torn down at process destruction.

I believe that with the current API, I cannot guarantee that my grpcio threads are torn down BEFORE that object is. The only way that I can do that is if there is some API on `grpcio::Environment` that actually joins the threads.

In the actual rust tests that validate `grpcio::Environment`, you yourselves have written code that joins the join handles. I would like to be able to do that in my tests at the end of my loop body.

This commit exposes an API on grpcio::Environment that both issues the shutdown command, AND joins the join handles. It also makes the rust unit test, in that same file, use this API.

This is not a breaking change, since we don't change the implementation of `Drop` or any other public api.

Signed-off-by: Chris Beck <[email protected]>
Includes a test for whether any of them is the current thread before joining. Signed-off-by: Chris Beck <[email protected]>
it seems to fail like this:
it did this twice in CI, not sure. will investigate
@BusyJay i'm sorry, i don't have bandwidth to really figure this out right now, i think i'm just going to stick with the "sleep" in my tests for the foreseeable future. thanks for your help!
The `grpcio::env` impl of `Drop` requests that all the completion queues shut down, but does not actually join the threads.

For many applications this works fine; a webserver often does not require a graceful shutdown strategy. However, in my use case I want to validate that even if the server goes down and comes back repeatedly, the users are able to recover their data from the database.

Although `grpcio_env` is scoped to the loop body, the implementation of `Drop` does not join the threads. When the test ends, it crashes consistently, because my server contains an SGX enclave, and there is a static object in the Intel library, `SimEnclaveMgr`, which is torn down before these threads get cleaned up. Then they try to tear down their enclaves and a SIGSEGV occurs.

I believe that with the current API, I cannot guarantee that my grpcio threads are torn down before that object is. The only way that I can do that is if there is some API on `grpcio::Environment` that actually joins the threads.

In the `grpc-rs` Rust tests that validate `grpcio::Environment`, you yourselves have written code that explicitly joins the join handles, instead of leaving them detached. I would like to be able to do that in my tests at the end of my loop body.

I would like to expose this functionality as a new public function. This commit creates a new function `shutdown_and_join`, which issues the shutdown command, and then joins the join handles. It also makes the Rust unit test in `grpc-rs` use that API. I would use this at the end of my loop body in my code example.

This is not a breaking change, since we don't change the implementation of `Drop` or any other current public API.
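As a rough illustration of how a `shutdown_and_join` call at the end of the loop body would behave, here is a self-contained sketch built on the standard library only. `MockEnv` is a hypothetical mock, not `grpcio::Environment`; its signature, the stop flag, and the `live` counter are all assumptions made for this example, and the real change in this PR may look different.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread::{self, JoinHandle};
use std::time::Duration;

/// Mock environment (NOT grpcio::Environment): workers poll a flag until told to stop.
struct MockEnv {
    stop: Arc<AtomicBool>,
    handles: Vec<JoinHandle<()>>,
}

impl MockEnv {
    /// Starts `threads` workers; `live` counts workers that have not yet exited.
    fn new(threads: usize, live: Arc<AtomicUsize>) -> Self {
        let stop = Arc::new(AtomicBool::new(false));
        let handles: Vec<JoinHandle<()>> = (0..threads)
            .map(|_| {
                let stop_flag = Arc::clone(&stop);
                let live_count = Arc::clone(&live);
                live_count.fetch_add(1, Ordering::SeqCst);
                thread::spawn(move || {
                    while !stop_flag.load(Ordering::SeqCst) {
                        thread::sleep(Duration::from_millis(1));
                    }
                    live_count.fetch_sub(1, Ordering::SeqCst);
                })
            })
            .collect();
        MockEnv { stop, handles }
    }

    /// Requests shutdown, then blocks until every worker has exited.
    fn shutdown_and_join(&mut self) {
        self.stop.store(true, Ordering::SeqCst);
        for handle in self.handles.drain(..) {
            handle.join().expect("worker panicked");
        }
    }
}

/// Runs `phases` iterations and returns the number of workers still alive afterwards.
fn run_phases(phases: usize, threads: usize) -> usize {
    let live = Arc::new(AtomicUsize::new(0));
    for _ in 0..phases {
        let mut env = MockEnv::new(threads, Arc::clone(&live));
        // ... make server, make client, make requests, validate results ...
        env.shutdown_and_join();
    }
    live.load(Ordering::SeqCst)
}

fn main() {
    println!("workers alive after all phases: {}", run_phases(3, 2));
}
```

Because each phase joins its workers before the next one starts, no thread from an earlier phase can outlive the loop, and no sleep is needed to give the threads time to exit.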