This patch implements two JuliaFolds interfaces on `ChainedVector`: the sequential iteration protocol (aka `foldl`) using FGenerators.jl syntax, and the SplittablesBase.jl interface for parallel reductions.

Arguably, the dependency tree pulled in via FGenerators.jl, especially Transducers.jl, is rather large. I'm not sure if you want to pull this in at this stage (i.e., you'd probably want to wait until I extract it out as FoldsBase.jl). But I thought it'd be interesting to demonstrate that JuliaFolds' iteration facility can be beneficial not only for parallel reductions but also for sequential iteration. For example, maybe this can make some parts of optimizations like #42 easier.
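
To illustrate why a `foldl`-style protocol can help sequential iteration, here is a minimal sketch in plain Julia with no package dependencies. `ChunkedVector`, `fold_iterate`, and `fold_chunks` are hypothetical names made up for this example; they are not the code in this patch and not the FGenerators.jl API. The point is only that `iterate` has to encode "which chunk, which position" in a state tuple, while a fold can be written as two plain nested loops.

```julia
# Hypothetical stand-in for ChainedVector: one logical vector stored as chunks.
struct ChunkedVector{T}
    chunks::Vector{Vector{T}}
end

# iterate-based protocol: the state (i, j) is the chunk index and the
# position inside that chunk; empty or exhausted chunks are skipped.
function Base.iterate(c::ChunkedVector, (i, j) = (1, 1))
    while i <= length(c.chunks)
        chunk = c.chunks[i]
        j <= length(chunk) && return (chunk[j], (i, j + 1))
        i, j = i + 1, 1
    end
    return nothing
end

# Left fold driven by the iterate protocol above.
function fold_iterate(op, init, c::ChunkedVector)
    acc = init
    for x in c
        acc = op(acc, x)
    end
    return acc
end

# Left fold written directly as nested loops; the inner loop runs over a
# plain Vector, which is the shape a compiler can vectorize more easily.
function fold_chunks(op, init, c::ChunkedVector)
    acc = init
    for chunk in c.chunks
        for x in chunk
            acc = op(acc, x)
        end
    end
    return acc
end

c = ChunkedVector([collect(1:3), Int[], collect(4:10)])
total_iterate = fold_iterate(+, 0, c)
total_chunks = fold_chunks(+, 0, c)
@assert total_iterate == total_chunks == 55
```

Both folds visit the elements in the same order and return the same result; only the control flow differs. Whether the nested-loop version is actually faster on a given machine would have to be measured, for example with `@btime` from BenchmarkTools.jl.
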

This patch uses FGenerators.jl, which is syntactic sugar for `Transducers.__foldl__`. This is mainly because writing `__foldl__` by hand is slightly tedious, and also because I may need to tweak the interface at some point to solve some subtle problems in parallel reduction. I expect the syntactic sugar provided by FGenerators.jl to be more stable.

Microbenchmark

A simple summation of `ChainedVector{Int}` is 4x faster with `@floop`, which uses `foldl` as the iteration mechanism. Looking into the LLVM IR, the `@floop` version is vectorized but the `iterate` version is not.

Note: I'm using `Int` as the element type so that vectorization can be triggered easily. Supporting `@simd` for floats is possible, but at the moment it requires a rather ugly macro.

I think it's a big win, also considering that the `@yield`-based syntax is much simpler than the complex `iterate` implementation:

SentinelArrays.jl/src/folds.jl
Lines 4 to 10 in ec17e62