Fix get_stack_base in linux #1389
When pthread_attr_getstack() returns np_stack_size = 1MB:
- calculated stack_base = stack_limit + stack_size (default: 4MB)
- real stack_base = stack_limit + np_stack_size (1MB)
The calculated stack_base is wrong because it uses the 4MB default stack size. With this wrong stack_base, task_dispatcher::can_steal() can always return false, which causes an abnormally long loop in task_dispatcher::receive_or_steal_task(). Fix this by using np_stack_size instead of stack_size.
Signed-off-by: Bongkyu Kim <[email protected]>
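The arithmetic behind the description can be sketched as follows. This is only an illustration, not the oneTBB code: the helper names, the example address, and in particular the threshold rule (base minus half of the stack size) are assumptions made for the sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr std::size_t kMB = 1024 * 1024;

// Stacks grow downward: base (highest address) = limit (lowest address) + size.
constexpr std::uintptr_t stack_base(std::uintptr_t stack_limit, std::size_t size) {
    return stack_limit + size;
}

// ASSUMPTION for this sketch: stealing is allowed only while the current
// stack position is above "base - size / 2".
constexpr std::uintptr_t stealing_threshold(std::uintptr_t base, std::size_t size) {
    return base - size / 2;
}

// Hypothetical stand-in for the check described in the PR text.
constexpr bool can_steal(std::uintptr_t stack_pointer, std::uintptr_t threshold) {
    return stack_pointer > threshold;
}
```

Under these assumptions, a thread with a real 1MB stack but a base computed from the 4MB default gets a threshold of stack_limit + 2MB, which lies above the real base of stack_limit + 1MB. Every address on the real stack is then below the threshold, so the check can never pass.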
Hi @bongkyu7-kim, was the stack size changed somehow, or is it smaller than 4MB on your system by default? |
Hi @pavelkumbrasev, |
The problem is that it will override only part of the defined behavior. TBB will still create threads, and treat them, as if they had a 4MB stack. |
Even if the stack size is 8MB, the current stack_base is miscalculated.
Why does stack_limit use the value returned by pthread_attr_getstack() while stack_size uses the value defined in the code? |
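To make the question concrete: on Linux, a single pthread_attr_getstack() call returns both the lowest stack address and the stack size, so both values can come from the same source. A minimal sketch, assuming a Linux/glibc-style pthread_getattr_np(); StackBounds and current_thread_stack() are hypothetical names, not oneTBB API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Hypothetical helper type: lowest stack address and size of the calling thread.
struct StackBounds {
    std::uintptr_t limit = 0;
    std::size_t size = 0;
};

// Returns zeros if the information is unavailable (e.g. not on Linux).
StackBounds current_thread_stack() {
    StackBounds bounds;
#if defined(__linux__)
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        void* limit = nullptr;
        std::size_t size = 0;
        // One call yields both values, so they are consistent with each other.
        if (pthread_attr_getstack(&attr, &limit, &size) == 0) {
            bounds.limit = reinterpret_cast<std::uintptr_t>(limit);
            bounds.size = size;
        }
        pthread_attr_destroy(&attr);
    }
#endif
    return bounds;
}
```

With both values taken from the same call, limit + size is the real stack base, and the address of any local variable of the calling thread should fall inside [limit, limit + size).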
I understand that the patch is doing the right thing. The only problem is that it will hide the real problem: the default stack size is still incorrect for your platform. Moreover, this default value will be used to create worker threads and for all the calculations unless you explicitly use |
I understood. |
As you said it might be affected by |
In fact, my problem is that the Geekbench benchmark score (Geekbench uses the TBB library) dropped on our Android product due to this wrong stack_base. |
I don't think geekbench needs to apply |
Because it can be changed at runtime and can have various stack sizes for each thread on the same platform, |
@pavelkumbrasev |
As I said, it happens because the default value differs from the system's default; this should be addressed there. We will discuss it and try to come up with a solution. |
In my opinion, there are two ways. How about these solutions?
--- a/src/tbb/global_control.cpp
+++ b/src/tbb/global_control.cpp
@@ -104,8 +104,20 @@ class alignas(max_nfs_size) stack_size_control : public control_storage {
             return hi - lo;
         }();
         return ThreadStackSizeDefault;
+#else
+#if __linux__ && !__bg__
+        pthread_attr_t np_attr_stack;
+        size_t np_stack_size = 0;
+        if (0 == pthread_getattr_np(pthread_self(), &np_attr_stack)) {
+            if (0 == pthread_attr_getstacksize(&np_attr_stack, &np_stack_size)) {
+                __TBB_ASSERT( np_stack_size > 0, "stack size must be positive" );
+            }
+            pthread_attr_destroy(&np_attr_stack);
+        }
+        return np_stack_size > 0 ? np_stack_size : ThreadStackSize;
 #else
         return ThreadStackSize;
+#endif /* __linux__ */
 #endif
     }
     void apply_active(std::size_t new_active) override {
--- a/src/tbb/governor.cpp
+++ b/src/tbb/governor.cpp
@@ -137,7 +137,7 @@ bool governor::does_client_join_workers(const rml::tbb_client &client) {
     3) If the user app strives to conserve the memory by cutting stack size, it
        should do this for TBB workers too (as in the #1).
 */
-static std::uintptr_t get_stack_base(std::size_t stack_size) {
+static std::uintptr_t get_stack_base(std::size_t &stack_size) {
     // Stacks are growing top-down. Highest address is called "stack base",
     // and the lowest is "stack limit".
 #if __TBB_USE_WINAPI
@@ -165,7 +165,8 @@ static std::uintptr_t get_stack_base(std::size_t stack_size) {
 #endif /* __linux__ */
     std::uintptr_t stack_base{};
     if (stack_limit) {
-        stack_base = reinterpret_cast<std::uintptr_t>(stack_limit) + stack_size;
+        stack_base = reinterpret_cast<std::uintptr_t>(stack_limit) + np_stack_size;
+        stack_size = np_stack_size;
     } else {
         // Use an anchor as a base stack address.
         int anchor{};
|
Is there any update? I'm still waiting for a fix for this issue. |
As I said, the workaround is there with |
Thanks for the reply. I will wait for your solution. |
Hi @bongkyu7-kim, we have discussed possible solutions.
Should be good enough to fix the problem. The only difference is that, instead of checking the stack size of a particular thread, the system value should be checked. As far as I know, it can be done using rlimit and specifically What do you think? |
@pavelkumbrasev, But I think that taking the default stack size from RLIMIT_STACK, together with correcting get_stack_base() as in this PR's patch, can be one solution. |
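For reference, reading the system stack limit with getrlimit() could look roughly like the sketch below. It is not the agreed oneTBB change; default_stack_size_from_rlimit() and its fallback parameter are hypothetical, and the fallback handling for an unlimited or zero limit is an assumption of this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/resource.h>

// Returns the soft RLIMIT_STACK value in bytes, or `fallback` when the limit
// cannot be read or is unlimited/zero (fallback policy is an assumption here).
std::size_t default_stack_size_from_rlimit(std::size_t fallback) {
    rlimit limit{};
    if (getrlimit(RLIMIT_STACK, &limit) != 0) {
        return fallback;
    }
    if (limit.rlim_cur == RLIM_INFINITY || limit.rlim_cur == 0) {
        return fallback;
    }
    return static_cast<std::size_t>(limit.rlim_cur);
}
```

Note that this value describes the system default, not the stack of a thread that was created with an explicit size.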
@bongkyu7-kim, could you please clarify:
As I understand it, the main thread will be created with the RLIMIT_STACK size by default. Is the stack size changed somehow? |
@pavelkumbrasev, |
Should oneTBB workers then inherit this stack property? (Should they also have a 1MB stack, the default 8MB, or 4MB as now?) |
main thread(8mb) -> create pthread(1mb) - oneTBB API called : oneTBB main thread working here -> other oneTBB workers (4mb by misc.h) |
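The scenario above, a thread created with a 1MB stack that then queries its own stack size, can be reproduced with a small sketch. It assumes Linux with pthread_getattr_np(); report_stack_size() and observed_stack_size() are hypothetical helpers, and no oneTBB calls are involved.

```cpp
#include <cassert>
#include <cstddef>
#include <pthread.h>

constexpr std::size_t kOneMB = 1024 * 1024;

// Thread entry: store the stack size this thread actually runs with.
static void* report_stack_size(void* out) {
    std::size_t size = 0;
#if defined(__linux__)
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) == 0) {
        pthread_attr_getstacksize(&attr, &size);
        pthread_attr_destroy(&attr);
    }
#endif
    *static_cast<std::size_t*>(out) = size;
    return nullptr;
}

// Creates a thread with the requested stack size and returns the size the
// thread observes for itself (0 if creation or the query failed).
std::size_t observed_stack_size(std::size_t requested) {
    std::size_t observed = 0;
    pthread_attr_t attr;
    if (pthread_attr_init(&attr) != 0) {
        return 0;
    }
    pthread_t tid;
    const bool started =
        pthread_attr_setstacksize(&attr, requested) == 0 &&
        pthread_create(&tid, &attr, report_stack_size, &observed) == 0;
    pthread_attr_destroy(&attr);
    if (started) {
        pthread_join(tid, nullptr);
    }
    return observed;
}
```

Calling observed_stack_size(kOneMB) should report a value close to 1MB rather than a 4MB default, which is the mismatch this thread is about.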
What I am asking: is it the desired behavior or not? Should oneTBB workers inherit the initializing thread's property and also use a 1MB stack? |
Because oneTBB worker stack sizes are mixed (1MB/4MB), it may cause other problems. |
@pavelkumbrasev, |
@bongkyu7-kim, sorry for the late response. Could you please work on this PR? |
@pavelkumbrasev,
|
I'm not 100% sure. Perhaps, on external thread initialization, it should check whether the default stack size is equal to the current thread's stack size and, if not, it should save this value into |
@pavelkumbrasev, |
|
With my patch (#1389 (comment)), the actual thread stack size is checked inside get_stack_base() and passed back through stack_size. So I think it's enough to fix the problem.
|
How will |
I changed it to pass by reference, so it can be updated. -static std::uintptr_t get_stack_base(std::size_t stack_size) {
|
I'm not fond of such implicit changes. So this will also be applicable to the other OSes. |
Sure, I will change it as you suggest. |
@pavelkumbrasev, |
Moved to PR #1485. Closing this. |