Add "dynamic" parallelization strategies. #51
This change adds the following parallelization functions: `pthreadpool_parallelize_1d_dynamic`, `pthreadpool_parallelize_2d_dynamic_1d`, `pthreadpool_parallelize_2d_dynamic`, and `pthreadpool_parallelize_3d_dynamic_2d`.

These functions are similar to the `pthreadpool_parallelize_Xd_tile_Yd` functions, but differ in that the `tile_*` values are not bounds, but instead preferred multiples.

For example, `pthreadpool_parallelize_1d_dynamic(threadpool, task, context, range, tile, flags)` calls the user-provided `task` function with an `offset` and a `count`, where `offset` is in the range `[0, range)` and an integer multiple of `tile`, and `count` is an integer multiple of `tile` unless `offset + count = range`.

The `tile` parameter is understood as a preferred multiple of indices, and not as an upper bound.

The `count`s are chosen so as to minimize the number of calls to `task` while still balancing the workload across all threads.

Under the hood, each thread tries to reserve a "chunk" of tiles corresponding to the number of remaining tasks divided by 2x the number of threads, rounded up to the next multiple of `tile`. This balances well provided the speed difference between the fastest and slowest threads does not exceed a factor of 2.
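
To make the chunking rule and the `offset`/`count` contract concrete, here is a minimal, single-threaded sketch. It is not the code from this PR: `next_chunk`, `is_valid_call`, and `simulate` are hypothetical helpers, and the sketch assumes that "remaining tasks" is measured in indices and that the final chunk is clamped to whatever is left of the range.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of pthreadpool): size of the next chunk a
 * thread would reserve. Assumption: `remaining` counts indices, not tiles. */
size_t next_chunk(size_t remaining, size_t num_threads, size_t tile) {
  assert(num_threads > 0 && tile > 0);
  size_t divisor = 2 * num_threads;
  /* remaining / (2 * num_threads), rounded up. */
  size_t share = (remaining + divisor - 1) / divisor;
  /* Round up to the next multiple of tile. */
  size_t chunk = ((share + tile - 1) / tile) * tile;
  /* Assumption: the last chunk is clamped to the end of the range. */
  return chunk < remaining ? chunk : remaining;
}

/* Hypothetical helper: checks the contract described above for one call. */
bool is_valid_call(size_t offset, size_t count, size_t range, size_t tile) {
  if (tile == 0 || count == 0) return false;
  if (offset >= range || count > range - offset) return false;
  if (offset % tile != 0) return false;
  /* count is a multiple of tile unless the call ends exactly at range. */
  return count % tile == 0 || offset + count == range;
}

/* Carves [0, range) into chunks sequentially, as if a single thread reserved
 * every chunk. Returns the number of calls, or 0 if a call breaks the
 * contract. */
size_t simulate(size_t range, size_t num_threads, size_t tile) {
  size_t offset = 0;
  size_t calls = 0;
  while (offset < range) {
    size_t count = next_chunk(range - offset, num_threads, tile);
    if (!is_valid_call(offset, count, range, tile)) return 0;
    offset += count;
    calls++;
  }
  return calls;
}
```

Under these assumptions, `simulate(100, 4, 8)` covers the range in 10 calls with counts 16, 16, 16, 8, 8, 8, 8, 8, 8, 4. The early chunks span two tiles and the last one is the 4-index remainder. A real thread pool reserves chunks concurrently, so the actual sequence of calls depends on thread timing.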