Fix for data races when threads are dynamically created #49
Problem Description
I discovered data race issues during one of my audits for the EU project POP2 using Extrae.
The code I was auditing uses Intel Threading Building Blocks (TBB), where threads are sometimes created dynamically.
Looking a bit deeper into Extrae, I noticed that the data structures for tracing events are created per thread, which is good. However, the global management structures that point to these per-thread structures are reallocated whenever the number of threads changes, and this reallocation is not thread-safe.
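To make the race concrete, here is a minimal sketch of the problematic pattern. The names (thread_info_t, change_number_of_threads, emit_event) are illustrative stand-ins, not Extrae's actual identifiers:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch of the unsafe pattern: a global table of
 * per-thread tracing structures that is realloc'ed whenever the
 * thread count changes. */

typedef struct { unsigned long last_event; } thread_info_t;

static thread_info_t *thread_infos = NULL;  /* global management table */
static int num_threads = 0;

/* Called on pthread_create: grows the global table. realloc may move
 * the block, so any concurrent reader still holding the old pointer
 * dereferences freed memory. */
static void change_number_of_threads(int new_n)
{
    thread_infos = realloc(thread_infos, new_n * sizeof(thread_info_t));
    if (new_n > num_threads)
        memset(thread_infos + num_threads, 0,
               (new_n - num_threads) * sizeof(thread_info_t));
    num_threads = new_n;
}

/* Called on every traced event (e.g. from an MPI wrapper): reads the
 * global table with no synchronization, racing with the realloc above
 * when another thread is being created concurrently. */
static void emit_event(int tid, unsigned long ev)
{
    thread_infos[tid].last_event = ev;
}
```

Single-threaded this works fine; the crash only appears when emit_event and change_number_of_threads run concurrently, which is exactly what dynamic thread creation triggers.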
Reproducer
I attached a small reproducer for the problem.
The reproducer creates a pthread that then dynamically creates more pthreads, while the master thread constantly spams MPI_Iprobe calls. As a result, there is constant access to the tracing data structures, conflicting with the resizing of those structures whenever the number of threads changes.
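The structure of the reproducer can be sketched with pthreads alone; in this self-contained version the MPI_Iprobe spam loop is replaced by a plain polling loop, and the worker/spawner names are hypothetical:

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

#define WORKERS 8

static atomic_int finished = 0;

/* Short-lived worker; its creation is what would trigger Extrae's
 * Backend_ChangeNumberOfThreads and the table resize. */
static void *worker(void *arg)
{
    (void)arg;
    atomic_fetch_add(&finished, 1);   /* stand-in for traced work */
    return NULL;
}

/* Spawner thread: keeps creating workers, repeatedly resizing the
 * global tracing structures. */
static void *spawner(void *arg)
{
    (void)arg;
    for (int i = 0; i < WORKERS; i++) {
        pthread_t t;
        if (pthread_create(&t, NULL, worker, NULL) == 0)
            pthread_join(t, NULL);
    }
    return NULL;
}

static int run_reproducer(void)
{
    pthread_t s;
    if (pthread_create(&s, NULL, spawner, NULL) != 0)
        return -1;
    /* Master thread: busy polling, as the real reproducer does with
     * MPI_Iprobe -- under instrumentation, every iteration touches the
     * tracing data structures. */
    while (atomic_load(&finished) < WORKERS)
        ;  /* spin */
    pthread_join(s, NULL);
    return atomic_load(&finished);
}
```

Run under Extrae, the master's polling path reads the global table while the spawner's pthread_create calls resize it, which is the conflicting access described above.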
extrae-datarace-reproducer.zip
With the current Extrae version this results in the following segmentation fault:
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x9)
==== backtrace ====
 0 /usr/lib64/libucs.so.0(+0x17970) [0x2abcf957f970]
 1 /usr/lib64/libucs.so.0(+0x17b22) [0x2abcf957fb22]
 2 /lib64/libc.so.6(+0x80419) [0x2abc388c1419]
 3 /lib64/libc.so.6(+0x820a5) [0x2abc388c30a5]
 4 /lib64/libc.so.6(+0x83ce2) [0x2abc388c4ce2]
 5 /lib64/libc.so.6(realloc+0x1d2) [0x2abc388c6d82]
 6 /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(realloc+0x3d) [0x2abc3507abbd]
 7 /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(Extrae_reallocate_thread_info+0x1e) [0x2abc35073f0e]
 8 /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(Backend_ChangeNumberOfThreads+0x3e3) [0x2abc3506be93]
 9 /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(pthread_create+0x103) [0x2abc350865f3]
10 reproducer.exe() [0x4024d5]
11 /home/jk869269/install/c_extrae/3.8.3/lib/libptmpitrace-3.8.3.so(+0x974be) [0x2abc350864be]
12 /lib64/libpthread.so.0(+0x7ea5) [0x2abc36036ea5]
13 /lib64/libc.so.6(clone+0x6d) [0x2abc3893f8dd]
===================
Solution
With some debugging and analysis, I created a fix for the problem: I use read-write locks wherever possible, which keeps performance up while getting rid of the data races. I am not sure whether I found all affected places.
At least the proposed changes resolved the issues I had with both the TBB code and the reproducer.
Furthermore, I changed the code so that Extrae-internal pthread locks are locked and unlocked directly through the real pthread functions; otherwise these internal lock operations would themselves be instrumented, which creates additional overhead.
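The general pattern is sketched below under assumed names (real_mutex_lock, internal_lock): a library that interposes pthread_mutex_lock must route its own internal locking through the underlying libc implementation, here resolved via dlsym(RTLD_NEXT, ...), so that internal bookkeeping never shows up in the trace. This is a generic sketch of the technique, not Extrae's actual code:

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dlfcn.h>
#include <pthread.h>

static int (*real_mutex_lock)(pthread_mutex_t *) = NULL;
static int traced_lock_calls = 0;

/* Interposed wrapper, as an instrumentation library would install it:
 * records the event, then forwards to the real implementation. */
int pthread_mutex_lock(pthread_mutex_t *m)
{
    if (!real_mutex_lock)
        real_mutex_lock = dlsym(RTLD_NEXT, "pthread_mutex_lock");
    traced_lock_calls++;                 /* "instrumentation" side effect */
    return real_mutex_lock(m);
}

/* Internal locking helper: bypasses the wrapper entirely, so the
 * library's own locks are never traced and add no tracing overhead. */
static int internal_lock(pthread_mutex_t *m)
{
    if (!real_mutex_lock)
        real_mutex_lock = dlsym(RTLD_NEXT, "pthread_mutex_lock");
    return real_mutex_lock(m);
}
```

With this split, application calls to pthread_mutex_lock still get traced, while the library's internal read-write and mutex locks stay invisible in the resulting trace.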