Adding tests for sparse exploit codegen #928
Merged
This PR adds some tests and makes some minor changes to the sparsity exploitation lowering pass (`src/compiler/lowering/SparsityExploitationPass.cpp`) and is complementary to #919. For clarification: the operations currently matched by the `SparsityExploitationPass` implement a cross-entropy computation over a sparse `CSRMatrix` and a `DenseMatrix`.
Changes:

- Replaced generic `Operation *` types with casts to the specific operation (`auto op = parentOp.getDefiningOp<daphne::OpName>()`)
- Used `get` accessor functions for better readability (e.g. `.getLhs()`, `.getTransa()`)
- Replaced `notifyMatchFailure` with `ErrorHandler::compilerError`
Complementing the original PR's description:

The `lower-sparse-exploit` pass is part of the `mlir-codegen` pipeline. It is a proof of concept that aims to fuse operators that have a sparse operand, which enables optimizations in the computation of the intermediates. For example, in an expression like `sparse * (dense @ dense)`, the full dense result of the matrix multiplication is not needed; it suffices to compute only the entries where the lhs sparse matrix is non-zero. Using this information, a fused operator can avoid needless computations as well as the materialization of potentially very large dense intermediates.

Right now, the pass does this for one hard-coded pattern, `sum(CSRMat * ln(denseLhs @ t(denseRhs)))` (the transposes on both sides are optional), which could be generalized to something like `IntersectOp(CSRMat, OuterBinary(DenseMat, DenseMat))`. It runs a canonicalizer pass directly after lowering this pattern to avoid lowering and executing any now-redundant operations that are part of the pattern (see `test/codegen/sparseExploit.mlir`). These operations are handled in this separate canonicalizer pass because their results could still be relevant for other computations and are not, in general, trivially dead.

Here is an example script to test the pass (`--explain mlir_codegen` is optional), which shows a significant speedup on my machine from 0.17 seconds to 0.09 seconds to compute the result.

Currently, the pass lowers the outer loop to an affine `ParallelOp`, but in theory all three loops can be parallelized. Note that this is not yet lowered to an actual multi-threaded implementation in MLIR (which requires additional lowering and linking of the respective libraries) and still runs single-threaded.
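As an illustration of the idea behind the fused pattern (this is a hypothetical NumPy sketch, not the PR's actual MLIR lowering or DaphneDSL script), the fused operator for `sum(CSRMat * ln(denseLhs @ t(denseRhs)))` only needs the dot products at the non-zero positions of the CSR matrix, so the dense matmul result is never materialized:

```python
import numpy as np

def sparse_exploit_sum(csr_vals, csr_cols, csr_rowptr, lhs, rhs):
    """Sketch of sum(CSR * ln(lhs @ rhs.T)) that touches only the
    non-zero entries of the CSR matrix (function name and CSR layout
    are illustrative, not DAPHNE's API)."""
    total = 0.0
    # Outer loop over CSR rows: this is the loop the pass lowers to
    # an affine ParallelOp.
    for i in range(len(csr_rowptr) - 1):
        for k in range(csr_rowptr[i], csr_rowptr[i + 1]):
            j = csr_cols[k]
            # Only this single dot product of the matmul result is
            # needed; the full dense intermediate is never built.
            total += csr_vals[k] * np.log(lhs[i] @ rhs[j])
    return total
```

For a sparse matrix with `nnz` non-zeros, this computes `nnz` dot products instead of the full `rows * cols` matmul, which is where both the speedup and the memory savings come from.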