
Feature/slp vectorization #69

Merged — 395 commits, Nov 19, 2021
Commits
0b34497
Remove recursive postOrder, simplify node vector structure.
csvtuda May 4, 2021
1210a57
Fix: postOrder computation containing duplicates.
csvtuda May 5, 2021
43f6b32
Clean SLP graph builder.
csvtuda May 5, 2021
814a222
Add nlts & fashion-mnist tests.
csvtuda May 5, 2021
6efba13
Fix: SLP graph builder building SLP nodes more than once.
csvtuda May 5, 2021
b1ab2d4
Speed up postOrder computation.
csvtuda May 7, 2021
b0e9833
Speed up postOrder computation even more.
csvtuda May 7, 2021
bdca6de
Add basic cost model draft.
csvtuda May 7, 2021
4528de2
Rename SLP files.
csvtuda May 7, 2021
ee2e686
Separate postOrder from SLPNode.
csvtuda May 7, 2021
955235f
Update tests, remove outdated .mlir files.
csvtuda May 9, 2021
3726c1d
Prevent graph builder from including illegal vectors.
csvtuda May 9, 2021
9df0eb5
Remove const modifier from value getters.
csvtuda May 9, 2021
dc1e0c3
WIP optimal conversion insertion point computation.
csvtuda May 9, 2021
f199f2d
Rework operation sorting prior to conversion.
csvtuda May 11, 2021
8b24a31
Iterate through all vectors only once during conversion.
csvtuda May 11, 2021
183758e
Improve non-uniform vector handling.
csvtuda May 11, 2021
584fbbe
Fix: variable shadowing.
csvtuda May 11, 2021
c7cfd9a
Add constant duplication detection.
csvtuda May 12, 2021
a21af1e
Add PatternRewriter member to ConversionManager.
csvtuda May 12, 2021
490b3cc
Simplify bookkeeping of created vectors.
csvtuda May 12, 2021
399200f
Add timing and debug info.
csvtuda May 12, 2021
e4695a6
Rename NodeVector to ValueVector.
csvtuda May 12, 2021
67af15f
Fix extremely slow map key comparisons during graph building.
csvtuda May 13, 2021
621ebea
Replace escaping users with extracted value only instead of all users.
csvtuda May 13, 2021
d46c158
Fix: extremely slow insertion point computation during graph conversion.
csvtuda May 14, 2021
a555524
Rename IO method, improve output during graph conversion.
csvtuda May 14, 2021
838c278
Improve progress output.
csvtuda May 15, 2021
631e4fd
Fix: sort escaping users correctly after reordering.
csvtuda May 15, 2021
8828402
Add trivial case to later() check.
csvtuda May 15, 2021
e9b951b
Group checks in later() comparison.
csvtuda May 15, 2021
4021bd5
Fix: operations marked as erased still being used in input vectors.
csvtuda May 16, 2021
3672fe7
Refactor helper method.
csvtuda May 16, 2021
7563b9b
Allow multiple SLP vectorizations in a single conversion pass.
csvtuda May 17, 2021
0088841
Speed up seeding.
csvtuda May 17, 2021
5dd9939
Remove unnecessary block order recomputation.
csvtuda May 17, 2021
3718172
Improve progress output.
csvtuda May 17, 2021
5e5dcf1
Fix: users computation moving operations to bad locations.
csvtuda May 18, 2021
eb865f9
Merge branch 'develop' into feature/slp-vectorization
csvtuda May 18, 2021
65d6d30
Resolve leftover merging errors.
csvtuda May 18, 2021
712d13d
Update copyright.
csvtuda May 18, 2021
bbbd534
Replace std::sort with llvm::sort.
csvtuda May 18, 2021
e56a0ce
Speed up seed computation.
csvtuda May 18, 2021
dddefb8
Remove unused .mlir file.
csvtuda May 18, 2021
eb1167d
Add type and location information to ValueVector.
csvtuda May 19, 2021
4f37a65
Add custom SLP pattern driver and custom SLP patterns.
csvtuda May 19, 2021
2acccee
Replace SLP pattern benefit with their cost.
csvtuda May 19, 2021
654247b
Add missing explicit tags.
csvtuda May 19, 2021
76fbf4a
Prepare everything for bottom up seed analysis.
csvtuda May 19, 2021
16b4254
WIP Bottom-Up seeding.
csvtuda May 21, 2021
fe7f7fe
Adapt bottom-up seeding to group by opcode.
csvtuda May 24, 2021
553cc0d
Rename ValueVector to Superword.
csvtuda May 24, 2021
d7ec9f1
Do not recompute order for next seed.
csvtuda May 25, 2021
0ccc803
Finish bottom-up attempt.
csvtuda May 26, 2021
48da273
Integrate cost model into pattern matching process.
csvtuda May 27, 2021
03104c2
Add math dialect to structure pass's dependent dialects.
csvtuda May 29, 2021
13d1c88
Add actual SLPGraph class to store the graph data.
csvtuda May 31, 2021
30e2226
Add dependency analysis to SLPGraph.
csvtuda May 31, 2021
a671cb1
Add broadcast & broadcastInsert patterns.
csvtuda May 31, 2021
50ac08a
Split SLP pattern matching into header and implementation files.
csvtuda May 31, 2021
222f7f2
Implement scalar cost computation in unit cost model.
csvtuda Jun 1, 2021
d806fd0
Improve SLPVectorizationPattern type hierarchy.
csvtuda Jun 1, 2021
2667132
Implement leaf pattern only matching.
csvtuda Jun 1, 2021
487d137
Minimize number of insertions in broadcast insert pattern.
csvtuda Jun 1, 2021
bc65901
Streamline broadcast maximization.
csvtuda Jun 1, 2021
db2f9fe
Improve pattern visitor structure.
csvtuda Jun 2, 2021
25fc739
Rename variables in graph builder.
csvtuda Jun 2, 2021
06aa16e
Rework cost model integration.
csvtuda Jun 2, 2021
e90da84
Fix: segfault when falling back to default visit method.
csvtuda Jun 2, 2021
3c85938
Fix: missing conversion state initialization.
csvtuda Jun 2, 2021
b5a4c35
Add scalar cost caching & updates via callbacks.
csvtuda Jun 7, 2021
ab61ee1
Simplify extraction flags.
csvtuda Jun 7, 2021
93cd930
Minor refactoring.
csvtuda Jun 7, 2021
96424c9
Speed up vectorized gaussian pattern.
csvtuda Jun 8, 2021
36afde3
Filter out unnecessary ops during seeding.
csvtuda Jun 8, 2021
5510ad7
Use VectorType::isValidElementType for vectorization checks.
csvtuda Jun 8, 2021
0026d3b
Simplify insertion point calculation.
csvtuda Jun 9, 2021
e57ef73
Add basic log space SLP patterns.
csvtuda Jun 9, 2021
078a88f
Better class forwarding to get rid of excessive pattern forwarding.
csvtuda Jun 9, 2021
1364c89
Add SPNAttachLog operation.
csvtuda Jun 10, 2021
5ffd519
Fix: operation reordering not taking into account pattern failures.
csvtuda Jun 10, 2021
45bf824
Erase dead vector operations after graph conversion.
csvtuda Jun 10, 2021
52f0c1c
Fix: insertion point mixup for identical insertion points.
csvtuda Jun 10, 2021
da1193d
Add script that runs all speakers.
csvtuda Jun 12, 2021
6d776ad
Improve run_all script.
csvtuda Jun 12, 2021
d2ed9d1
Add RAT-SPNs to run_all script.
csvtuda Jun 12, 2021
8fa27a7
Add topological mixing analysis.
csvtuda Jun 16, 2021
5529e59
Add VectorizeLogConstant pattern.
csvtuda Jun 16, 2021
c3dac2b
Fix: add missing visitor call for VectorizeLogConstant.
csvtuda Jun 16, 2021
cf0f0c7
Fix: insertion point calculation being a broken mess (again).
csvtuda Jun 16, 2021
8e69049
Rewrite VectorizeLogConstant as VectorizeSPNConstant.
csvtuda Jun 16, 2021
866b5e8
Simplify log space patterns.
csvtuda Jun 16, 2021
2cb820d
WIP overall refactoring.
csvtuda Jun 22, 2021
80d3b62
Implement vectorized function cost computation and vectorization undo…
csvtuda Jun 24, 2021
0a05164
Remove deprecated methods.
csvtuda Jun 24, 2021
486377c
Remove leftover dump() calls.
csvtuda Jun 24, 2021
2529b59
Remove unnecessary first input computation.
csvtuda Jun 24, 2021
8c1607d
Make ConversionState recursive, move more conversion details to Conve…
csvtuda Jun 24, 2021
c72610d
Fix: missing conversion state initialization.
csvtuda Jun 24, 2021
29548ca
Remove bookkeeping of dead operations in ConversionState.
csvtuda Jun 24, 2021
2fb1db6
Fix: SPNAttachLog being applied to log-space operations.
csvtuda Jun 24, 2021
ab07a40
Fix: make Log1pOp legal.
csvtuda Jun 24, 2021
eb96d69
Fix: only attach log when needed.
csvtuda Jun 24, 2021
3ddb973
Fix: exp() returning infinity in vectorized log-space addition.
csvtuda Jun 28, 2021
90b40f7
Fix: denseConstant() failing with IOOB exception.
csvtuda Jun 28, 2021
9446086
Remove progress messages, remove leftover debugging output.
csvtuda Jun 28, 2021
11721ab
Speed up vectorized speaker test.
csvtuda Jun 28, 2021
a8a2355
Fix: seed analysis causing unnecessary vectorization attempts.
csvtuda Jun 28, 2021
81c7f73
Fix: log space being attached to operations that are in log space alr…
csvtuda Jun 29, 2021
88fe5ec
Add 'compile only' option to run_all script.
csvtuda Jun 29, 2021
a300cc9
Improve test scripts.
csvtuda Jun 29, 2021
cec6673
Increase traversal limit during deserialization.
csvtuda Jun 29, 2021
12a2abf
Replace recursive conversion state approach with temporary conversion…
csvtuda Jul 1, 2021
d9ef61f
Add single shuffle pattern.
csvtuda Jul 1, 2021
e769d9e
Don't move operations to and from a trash block.
csvtuda Jul 5, 2021
80bc546
Make operation reordering private.
csvtuda Jul 5, 2021
3167806
Remove debug statements.
csvtuda Jul 5, 2021
5b63723
Remove bookkeeping of best SLP pattern matches.
csvtuda Jul 5, 2021
33deed6
Add some comments to improve file structure.
csvtuda Jul 5, 2021
5e48727
Merge branch 'develop' into feature/slp-vectorization
csvtuda Jul 5, 2021
04d02f1
Remove debug statements.
csvtuda Jul 5, 2021
d1c04b4
Merge branch 'develop' into feature/slp-vectorization
csvtuda Jul 5, 2021
6a178f9
Increase traversal limit in words to 1024*1024*1024.
csvtuda Jul 5, 2021
dba9861
Add llvm-mca cost estimation scripts.
csvtuda Jul 7, 2021
1e40706
Add batch read to gather load rewrite pattern.
csvtuda Jul 17, 2021
9e85ad0
Add size analysis scripts.
csvtuda Jul 17, 2021
6479b9b
Fix typo.
csvtuda Jul 19, 2021
e2e7f86
Increase traversal limit in words to 2^64-1.
csvtuda Jul 19, 2021
94daf72
Merge branch 'develop' into feature/slp-vectorization
csvtuda Jul 20, 2021
86f9198
Move extendTruncateOrGetVector() to a util file.
csvtuda Jul 20, 2021
7c0a1a6
Move extendTruncateOrGetVector() definition to cpp file.
csvtuda Jul 20, 2021
3d6184a
Refine callback registration.
csvtuda Jul 20, 2021
3cc1636
Replace ShuffleSuperword pattern with more general ShuffleTwoSuperwords.
csvtuda Jul 20, 2021
4901add
Add execution times script, update other scripts.
csvtuda Jul 21, 2021
4e30160
Update comments.
csvtuda Jul 21, 2021
9f9f3a8
Improve shuffle pattern variable names.
csvtuda Jul 21, 2021
5c19d9a
Fix: segfault during gather load pattern.
csvtuda Jul 24, 2021
ba95a0f
Add timing output to SLP vectorization.
csvtuda Jul 24, 2021
18d54f7
Update python scripts.
csvtuda Jul 24, 2021
8290922
Rename python script.
csvtuda Jul 29, 2021
2864e88
Make seeding deterministic.
csvtuda Jul 29, 2021
f55e96c
Add DFS instruction reordering after conversion.
csvtuda Jul 29, 2021
819eea1
Perform DCE after SLP vectorization.
csvtuda Jul 29, 2021
73964ef
Use newer subprocess parameters in times script.
csvtuda Jul 29, 2021
b4d4aa9
Replace clunky comments with proper #defines and #ifs.
csvtuda Jul 30, 2021
097b3ac
Fix: extend/truncate normal space gaussian input too.
csvtuda Jul 30, 2021
88f161d
Make seeding more deterministic.
csvtuda Jul 30, 2021
054fd0b
Make conversion order deterministic.
csvtuda Aug 1, 2021
e9f8916
Fix: bug that altered the semantics of the vectorized program.
csvtuda Aug 2, 2021
fd49a11
Update comments and messages.
csvtuda Aug 2, 2021
f0aa10c
Improve operation deletion handling.
csvtuda Aug 2, 2021
8c24a27
Update utility IO.
csvtuda Aug 2, 2021
dacff55
Remove unnecessary const& modifiers.
csvtuda Aug 2, 2021
09ff860
Update python scripts.
csvtuda Aug 2, 2021
79a2c9a
Add topological and recomputation checks during graph building.
csvtuda Aug 2, 2021
b09dcab
Add SLP CLI options.
csvtuda Aug 3, 2021
7b203f0
Fix default SLP option initialization.
csvtuda Aug 3, 2021
98a1b8e
Update status reporting during vectorization.
csvtuda Aug 3, 2021
17ac914
Add multinode size logging.
csvtuda Aug 3, 2021
eb9095c
Add pass timings and pass statistics.
csvtuda Aug 3, 2021
bf34483
Fix: cost model using wrong patterns for cost computation.
csvtuda Aug 4, 2021
d5a1cd0
Add support for SLP vectorized gaussian marginalization.
csvtuda Aug 4, 2021
b591231
Fix: initialize uninitialized cost field.
csvtuda Aug 4, 2021
0073f9e
Improve status reporting and multinode size logging.
csvtuda Aug 4, 2021
05352f6
Add new max vectorization attempts option.
csvtuda Aug 4, 2021
732c100
Add recent options to python script too.
csvtuda Aug 4, 2021
3698d20
Only vectorize when requested in scripts.
csvtuda Aug 5, 2021
05ef904
Update python scripts.
csvtuda Aug 8, 2021
92a5667
Update CLI SLP options.
csvtuda Aug 8, 2021
9405554
Add kernel persistency flags to python scripts.
csvtuda Aug 9, 2021
2637eab
Add objdump analysis script.
csvtuda Aug 10, 2021
ce7c47c
Remove unused files.
csvtuda Aug 10, 2021
0d5d618
Fix: object dump parsing missing 80% of instructions.
csvtuda Aug 11, 2021
aa62f34
Only compute operation depths when needed.
csvtuda Aug 17, 2021
8304fdc
Resolve non-determinism during conversion order construction.
csvtuda Aug 17, 2021
8448fdb
Make ShuffleTwoSuperwords deterministic.
csvtuda Aug 17, 2021
71eb11e
Update python scripts.
csvtuda Aug 17, 2021
4ddf27a
Add liveness analysis.
csvtuda Aug 19, 2021
d2bc2c6
Update python scripts.
csvtuda Aug 19, 2021
d100f07
Fix: semantics computation accessing nonexistent operands.
csvtuda Aug 19, 2021
097c97e
Fix: unique ops computation including newly created vector operations.
csvtuda Aug 19, 2021
b4807e9
Update python scripts.
csvtuda Aug 20, 2021
557e9b2
Improve look-ahead score computation for load vectors.
csvtuda Aug 21, 2021
4503c1a
Allow topological mixing for leaf vectors.
csvtuda Aug 21, 2021
91ac21e
Streamline returns in code.
csvtuda Aug 21, 2021
1bf37d0
Allow duplicate elements in leaf vectors.
csvtuda Aug 21, 2021
09b53a7
Prevent future edge cases in if condition.
csvtuda Aug 21, 2021
aa463bc
Refactoring & add score model draft.
csvtuda Aug 22, 2021
b4938c0
Fix: invalidated iterator accesses during score computation.
csvtuda Aug 22, 2021
a2e0d54
Fix: gathers not taken into account for score computation.
csvtuda Aug 22, 2021
2f5fcaf
Improve comments, some refactoring.
csvtuda Aug 23, 2021
b953604
Improve and merge python scripts.
csvtuda Aug 23, 2021
30452fe
Update python script.
csvtuda Aug 23, 2021
ca84301
Fix python default arguments.
csvtuda Aug 23, 2021
ab38258
Add op type counts output.
csvtuda Aug 23, 2021
abb1aa0
Improve python script.
csvtuda Aug 23, 2021
a03b193
Update python scripts.
csvtuda Aug 23, 2021
0d96a1b
Add csv merging script.
csvtuda Aug 24, 2021
b62deab
Sort columns in merged CSV.
csvtuda Aug 24, 2021
2d09565
Count call instructions as arithmetic instructions depending on call …
csvtuda Aug 24, 2021
22155bb
Fix bug that erased the SPN files.
csvtuda Aug 25, 2021
8bb6ffd
Update scripts and logging output.
csvtuda Aug 26, 2021
7379c8e
Update python scripts.
csvtuda Aug 26, 2021
b3e2662
Fix: shuffle pattern reusing lanes with altered semantics.
csvtuda Aug 28, 2021
1f0bf24
Fix: kernel dump parsing.
csvtuda Sep 1, 2021
ee0fa2c
Add plots script.
csvtuda Sep 1, 2021
b0ba112
Update plots.
csvtuda Sep 1, 2021
9367b8e
Update python scripts.
csvtuda Sep 2, 2021
ae04f24
Update python scripts.
csvtuda Sep 5, 2021
3ecd900
Fix: XOR chains containing two operands only.
csvtuda Sep 10, 2021
cb0c322
Fix: consecutive load detection.
csvtuda Sep 10, 2021
6ae669a
Added little README.
csvtuda Sep 30, 2021
63f482d
Remove Analysis file.
csvtuda Oct 12, 2021
290aa95
Fix: Bottom-up seeding crashing in multiple SLP iterations.
csvtuda Oct 12, 2021
ef876c3
Fix: cost computation needlessly taking too long.
csvtuda Oct 12, 2021
a62d2de
Remove evaluation python scripts.
csvtuda Oct 23, 2021
22adf60
Update python test scripts.
csvtuda Oct 23, 2021
9e2aeb7
Merge branch 'develop' into feature/slp-vectorization
csvtuda Oct 26, 2021
050bb24
Add cost model comments.
csvtuda Nov 2, 2021
5a352bc
Add graph conversion comments.
csvtuda Nov 4, 2021
9c18df9
More graph conversion comments.
csvtuda Nov 4, 2021
c5fa5a4
Add pattern visitor comments & add const correctness.
csvtuda Nov 4, 2021
2afba62
More pattern visitor comments.
csvtuda Nov 4, 2021
320b737
Add score model comments.
csvtuda Nov 6, 2021
9ed37be
Add seeding comments.
csvtuda Nov 9, 2021
aec0d43
Add SLP graph comments.
csvtuda Nov 9, 2021
4c251fa
Add SLP graph builder comments.
csvtuda Nov 12, 2021
ba3c6a2
Add SLP pattern applicator comments.
csvtuda Nov 12, 2021
b503429
Add SLP vectorization pattern comments.
csvtuda Nov 12, 2021
3c684bd
Add vectorization util comments.
csvtuda Nov 12, 2021
832d336
Improve vectorization test error messages.
csvtuda Nov 12, 2021
e5e421d
Merge branch 'develop' into feature/slp-vectorization
csvtuda Nov 12, 2021
e19e431
Fix: use default keyword instead of explicit constructor.
csvtuda Nov 15, 2021
a98cc6a
Fix: make cost model unique and provide raw pointers for access.
csvtuda Nov 15, 2021
124a9e4
Fix: includes of TargetInformation.cpp.
csvtuda Nov 15, 2021
f087c57
Fix: callback loop handling.
csvtuda Nov 15, 2021
3f65e78
Fix: change total number of vectorization attempts to 1.
csvtuda Nov 15, 2021
df06782
Fix: replace test cases that took hours to complete with smaller ones.
csvtuda Nov 17, 2021
42a2557
Replace spnc-opt options with pass-specific options.
csvtuda Nov 18, 2021
c643e83
Fix: allow casting gaussian input vectors from integer to floating po…
csvtuda Nov 18, 2021
3be1bbe
Add comments to utility methods.
csvtuda Nov 18, 2021
7a88a02
Clean up vectorization patterns by combining constant and lospn const…
csvtuda Nov 18, 2021
59632c8
Fix: remove unnecessary kernel copies.
csvtuda Nov 18, 2021
5b6da11
Move SLP options from util file to individual classes.
csvtuda Nov 18, 2021
d16b1a1
Add SLP vectorization test.
csvtuda Nov 19, 2021
a7943b5
Move liveness analysis & output to helper function.
csvtuda Nov 19, 2021
f9cb4dc
Update python tests.
csvtuda Nov 19, 2021
f9a5a32
Add CMake option to control SLP vectorizer debug output;
sommerlukas Nov 19, 2021
64e3bde
Fix a few clang-tidy warnings;
sommerlukas Nov 19, 2021
2828702
Remove verbose output from Python tests;
sommerlukas Nov 19, 2021
4 changes: 4 additions & 0 deletions CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -35,6 +35,10 @@ endif (${CUDA_GPU_SUPPORT})

set(SPNC_CXX_WARNING_LEVEL "-Wall")

option(SPNC_SLP_DEBUG
"Enable additional debug output for the SLP vectorizer"
OFF)

#
# clang-tidy setup
#
43 changes: 43 additions & 0 deletions README_SLP.md
@@ -0,0 +1,43 @@
# SLP-Vectorization #

The SLP directory contains 11 files, each dealing with a different SLP topic.

Please note that in this project, a _superword_ denotes an SLP vector together with the elements it contains. The term was chosen because _vector_ already carries several meanings in C++.

* CostModel.h
* Contains the cost model, which assigns cost to scalar operations, superwords and entire patterns using a visitor pattern.

* GraphConversion.h
* Contains the ConversionManager class. The conversion manager keeps track of created vector operations and extractions, and maintains a ConversionState, which remembers which scalar/superword values have already been computed. The conversion manager is also responsible for gracefully resetting the function state in case an SLP graph is not deemed profitable.

* PatternVisitors.h
* Contains the visitor template and the LeafPatternVisitor, which can determine the scalar values that need to be computed for every leaf pattern (e.g. a BroadcastInsertPattern needs a scalar broadcast value and scalar insert values).

* ScoreModel.h
* Contains the look-ahead score model from the original Look-Ahead SLP publication [[1]](https://dl.acm.org/doi/10.1145/3168807), as well as the XOR chain model.

* Seeding.h
* Contains the classes used for top-down and bottom-up seeding.

* SLPGraph.h
* Contains the superword logic and the logic for actual SLP graphs (nodes and multinodes). Note: there is no explicit SLPGraphEdge class or something similar.

* SLPGraphBuilder.h
* Contains a graph builder that constructs SLP graphs as described in Porpodas et al. [[1]](https://dl.acm.org/doi/10.1145/3168807).

* SLPPatternMatch.h
* Responsible for selecting the best patterns based on the cost model and the current conversion state.

* SLPVectorizationPatterns.h
* The individual patterns that can be applied to superwords, together with their match and rewrite logic. They are designed similarly to MLIR's pattern rewrite framework.

* Util.h
* Some utility functions, such as _vectorizable(...)_ or _commutative(...)_.

### Known Issues ###
* ShufflePattern: With shuffle patterns enabled, the output of the kernels sometimes does not match the expected output. This might be caused by the reordering changing semantics, with the shuffle pattern then accidentally accessing elements whose semantics have changed.
* The SPN compiler options are replicated inside the util class. This is a little bit annoying.

References
-----
[[1] Vasileios Porpodas, Rodrigo C. O. Rocha, and Luís F. W. Góes. 2018. Look-ahead SLP: auto-vectorization in the presence of commutative operations. In Proceedings of the 2018 International Symposium on Code Generation and Optimization (CGO 2018). Association for Computing Machinery, New York, NY, USA, 163–174.](https://dl.acm.org/doi/10.1145/3168807)
20 changes: 20 additions & 0 deletions compiler/src/option/GlobalOptions.cpp
@@ -42,6 +42,26 @@ EnumOpt spnc::option::vectorLibrary{"vector-library", NONE,

Option<bool> spnc::option::replaceGatherWithShuffle{"use-shuffle", false};

Option<unsigned> spnc::option::slpMaxAttempts{"slp-max-attempts", 1, {depends(spnc::option::cpuVectorize, true)}};

Option<unsigned> spnc::option::slpMaxSuccessfulIterations
{"slp-max-successful-iterations", 1, {depends(spnc::option::cpuVectorize, true)}};

Option<unsigned> spnc::option::slpMaxNodeSize{"slp-max-node-size", 10, {depends(spnc::option::cpuVectorize, true)}};

Option<unsigned> spnc::option::slpMaxLookAhead{"slp-max-look-ahead", 3, {depends(spnc::option::cpuVectorize, true)}};

Option<bool>
spnc::option::slpReorderInstructionsDFS{"slp-reorder-dfs", true, {depends(spnc::option::cpuVectorize, true)}};

Option<bool> spnc::option::slpAllowDuplicateElements
{"slp-allow-duplicate-elements", false, {depends(spnc::option::cpuVectorize, true)}};

Option<bool> spnc::option::slpAllowTopologicalMixing
{"slp-allow-topological-mixing", false, {depends(spnc::option::cpuVectorize, true)}};

Option<bool> spnc::option::slpUseXorChains{"slp-use-xor-chains", false, {depends(spnc::option::cpuVectorize, true)}};

Option<bool> spnc::option::logSpace{"use-log-space", false};

Option<bool> spnc::option::gpuSharedMem{"gpu-shared-mem", true};
38 changes: 38 additions & 0 deletions compiler/src/option/GlobalOptions.h
@@ -84,6 +84,44 @@ namespace spnc {
/// with a combination of regular vector loads and shuffles.
extern Option<bool> replaceGatherWithShuffle;

// SLP vectorization options.

///
/// Maximum number of SLP vectorization attempts.
extern Option<unsigned> slpMaxAttempts;

///
/// Maximum number of successful SLP vectorization runs to be applied to a function.
extern Option<unsigned> slpMaxSuccessfulIterations;

///
/// Maximum multinode size during SLP vectorization in terms of the number of vectors they may contain.
extern Option<unsigned> slpMaxNodeSize;

///
/// Maximum look-ahead depth when reordering multinode operands during SLP vectorization.
extern Option<unsigned> slpMaxLookAhead;

///
/// Flag to indicate the order in which SLP-vectorized instructions should be arranged.
/// True to reorder them based on a depth-first search, false to reorder them based on a breadth-first search.
extern Option<bool> slpReorderInstructionsDFS;

///
/// Flag to indicate whether duplicate elements are allowed in vectors during SLP graph building.
/// True to keep growing the graph when duplicates are encountered in a vector, false to stop growing.
extern Option<bool> slpAllowDuplicateElements;

///
/// Flag to indicate if elements with different topological depths are allowed in vectors during SLP graph building.
/// True to allow mixing, false to stop growing the graph at that vector if mixed topological depths occur.
extern Option<bool> slpAllowTopologicalMixing;

///
/// Flag to indicate if XOR chains should be used to compute look-ahead scores instead of Porpodas's algorithm.
/// True to use XOR chains, false to use Porpodas's algorithm.
extern Option<bool> slpUseXorChains;

///
/// Flag to indicate whether log-space computation should be used.
extern Option<bool> logSpace;
13 changes: 11 additions & 2 deletions compiler/src/option/Options.cpp
@@ -16,12 +16,12 @@ std::unordered_map<std::string, Opt*>& Options::options() {
return *_options;
};

std::vector<std::unique_ptr<OptModifier>>& Options::allModifiers(){
std::vector<std::unique_ptr<OptModifier>>& Options::allModifiers() {
static auto* _modifiers = new std::vector<std::unique_ptr<OptModifier>>();
return *_modifiers;
};

std::vector<OptModifier*>& Options::activeModifiers(){
std::vector<OptModifier*>& Options::activeModifiers() {
static auto* _modifiers = new std::vector<OptModifier*>();
return *_modifiers;
};
@@ -45,6 +45,15 @@ int detail::OptionParsers::parse(const std::string& value) {
return std::stoi(value);
}

/// Specialization to parse unsigned integer options,
/// using standard library facilities to parse the unsigned int from the string.
/// \param value String.
/// \return Unsigned integer value.
template<>
unsigned detail::OptionParsers::parse(const std::string& value) {
return static_cast<unsigned>(std::stoul(value));
}

/// Specialization to parse floating-point options,
/// using standard library facilities to parse the double from the string.
/// \param value String.
Expand Up @@ -18,7 +18,17 @@
void spnc::LoSPNtoCPUConversion::initializePassPipeline(mlir::PassManager* pm, mlir::MLIRContext* ctx) {
auto* config = getContext()->get<Configuration>();
bool vectorize = spnc::option::cpuVectorize.get(*config);
pm->addPass(mlir::spn::createLoSPNtoCPUStructureConversionPass(vectorize));
pm->addPass(std::make_unique<mlir::spn::LoSPNtoCPUStructureConversionPass>(
vectorize,
spnc::option::slpMaxAttempts.get(*config),
spnc::option::slpMaxSuccessfulIterations.get(*config),
spnc::option::slpMaxNodeSize.get(*config),
spnc::option::slpMaxLookAhead.get(*config),
spnc::option::slpReorderInstructionsDFS.get(*config),
spnc::option::slpAllowDuplicateElements.get(*config),
spnc::option::slpAllowTopologicalMixing.get(*config),
spnc::option::slpUseXorChains.get(*config)
));
if (vectorize) {
auto useShuffle = spnc::option::replaceGatherWithShuffle.get(*config);
if (useShuffle) {
@@ -50,4 +60,4 @@ void spnc::LoSPNtoCPUConversion::initializePassPipeline(mlir::PassManager* pm, m
pm->nest<mlir::FuncOp>().addPass(mlir::createTensorBufferizePass());
pm->nest<mlir::FuncOp>().addPass(mlir::createFinalizingBufferizePass());
pm->nest<mlir::FuncOp>().addPass(mlir::createBufferDeallocationPass());
}
}
@@ -136,7 +136,7 @@ namespace mlir {
/// \param isSum Boolean If set to true will treat operands as addends, otherwise as multiplicands.
/// \return SmallVector<ErrorEstimationValue> with a single element, representing the whole Sum or Product.
llvm::SmallVector<spn::detail::ErrorEstimationValue>
estimateErrorBinaryOperation(SmallVector<spn::detail::ErrorEstimationValue> operands, bool isSum);
estimateErrorBinaryOperation(SmallVector<spn::detail::ErrorEstimationValue> operands, bool isSum);

/// Estimate the error introduced by the given (weighted) addition operation w.r.t. the current format.
/// \param op Pointer to the defining operation, representing a SPN node.
2 changes: 1 addition & 1 deletion mlir/include/Conversion/HiSPNtoLoSPN/NodePatterns.h
@@ -119,7 +119,7 @@ namespace mlir {
};

static inline void populateHiSPNtoLoSPNNodePatterns(OwningRewritePatternList& patterns, MLIRContext* context,
TypeConverter& typeConverter) {
TypeConverter& typeConverter) {
patterns.insert<ProductNodeLowering, SumNodeLowering>(typeConverter, context);
patterns.insert<HistogramNodeLowering, CategoricalNodeLowering, GaussianNodeLowering>(typeConverter, context);
patterns.insert<RootNodeLowering>(typeConverter, context);
2 changes: 1 addition & 1 deletion mlir/include/Conversion/HiSPNtoLoSPN/QueryPatterns.h
@@ -28,7 +28,7 @@ namespace mlir {
};

static inline void populateHiSPNtoLoSPNQueryPatterns(OwningRewritePatternList& patterns, MLIRContext* context,
TypeConverter& typeConverter) {
TypeConverter& typeConverter) {
patterns.insert<JointQueryLowering>(typeConverter, context);
}

79 changes: 60 additions & 19 deletions mlir/include/Conversion/LoSPNtoCPU/LoSPNtoCPUConversionPasses.h
@@ -10,51 +10,92 @@
#define SPNC_MLIR_INCLUDE_CONVERSION_LOSPNTOCPU_LOSPNTOCPUCONVERSIONPASSES_H

#include "mlir/Pass/Pass.h"
#include "mlir/Pass/PassOptions.h"
#include "LoSPNtoCPU/Vectorization/SLP/Util.h"

namespace mlir {
namespace spn {

struct LoSPNtoCPUStructureConversionPass :
public PassWrapper<LoSPNtoCPUStructureConversionPass, OperationPass<ModuleOp>> {

struct LoSPNtoCPUStructureConversionPass : public PassWrapper<LoSPNtoCPUStructureConversionPass,
OperationPass<ModuleOp>> {
public:

explicit LoSPNtoCPUStructureConversionPass(bool enableVectorization) : vectorize{enableVectorization} {}
LoSPNtoCPUStructureConversionPass() = default;
/// Constructor for accepting arguments from the driver instead of spnc-opt.
LoSPNtoCPUStructureConversionPass(bool vectorize,
unsigned maxAttempts,
unsigned maxSuccessfulIterations,
unsigned maxNodeSize,
unsigned maxLookAhead,
bool reorderInstructionsDFS,
bool allowDuplicateElements,
bool allowTopologicalMixing,
bool useXorChains) {
this->vectorize.setValue(vectorize);
this->maxAttempts.setValue(maxAttempts);
this->maxSuccessfulIterations.setValue(maxSuccessfulIterations);
this->maxNodeSize.setValue(maxNodeSize);
this->maxLookAhead.setValue(maxLookAhead);
this->reorderInstructionsDFS.setValue(reorderInstructionsDFS);
this->allowDuplicateElements.setValue(allowDuplicateElements);
this->allowTopologicalMixing.setValue(allowTopologicalMixing);
this->useXorChains.setValue(useXorChains);
}
LoSPNtoCPUStructureConversionPass(LoSPNtoCPUStructureConversionPass const& pass) : PassWrapper<
LoSPNtoCPUStructureConversionPass,
OperationPass<ModuleOp>>(pass) {}

protected:
void runOnOperation() override;

public:
void getDependentDialects(DialectRegistry& registry) const override;

private:
Option<bool> vectorize
{*this, "cpu-vectorize", llvm::cl::desc("Vectorize code generated for CPU targets"), llvm::cl::init(false)};
Option<unsigned> maxAttempts
{*this, "slp-max-attempts", llvm::cl::desc("Maximum number of SLP vectorization attempts"),
llvm::cl::init(1)};
Option<unsigned> maxSuccessfulIterations{*this, "slp-max-successful-iterations", llvm::cl::desc(
"Maximum number of successful SLP vectorization runs to be applied to a function"), llvm::cl::init(1)};
Option<unsigned> maxNodeSize{*this, "slp-max-node-size", llvm::cl::desc(
"Maximum multinode size during SLP vectorization in terms of the number of vectors they may contain"),
llvm::cl::init(10)};
Option<unsigned> maxLookAhead{*this, "slp-max-look-ahead", llvm::cl::desc(
"Maximum look-ahead depth when reordering multinode operands during SLP vectorization"), llvm::cl::init(3)};
Option<bool> reorderInstructionsDFS{*this, "slp-reorder-instructions-dfs", llvm::cl::desc(
"Flag to indicate if SLP-vectorized instructions should be arranged in DFS order (true) or in BFS order (false)"),
llvm::cl::init(true)};
Option<bool> allowDuplicateElements{*this, "slp-allow-duplicate-elements", llvm::cl::desc(
"Flag to indicate whether duplicate elements are allowed in vectors during SLP graph building"),
llvm::cl::init(false)};
Option<bool> allowTopologicalMixing{*this, "slp-allow-topological-mixing", llvm::cl::desc(
"Flag to indicate if elements with different topological depths are allowed in vectors during SLP graph building"),
llvm::cl::init(false)};
Option<bool> useXorChains{*this, "slp-use-xor-chains", llvm::cl::desc(
"Flag to indicate if XOR chains should be used to compute look-ahead scores instead of Porpodas's algorithm"),
llvm::cl::init(true)};

bool vectorize;
protected:
void runOnOperation() override;

};

std::unique_ptr<Pass> createLoSPNtoCPUStructureConversionPass(bool enableVectorization);
struct LoSPNtoCPUNodeConversionPass : public PassWrapper<LoSPNtoCPUNodeConversionPass, OperationPass<ModuleOp>> {

struct LoSPNtoCPUNodeConversionPass :
public PassWrapper<LoSPNtoCPUNodeConversionPass, OperationPass<ModuleOp>> {
public:
void getDependentDialects(DialectRegistry& registry) const override;

protected:
void runOnOperation() override;

public:
void getDependentDialects(DialectRegistry& registry) const override;

};

std::unique_ptr<Pass> createLoSPNtoCPUNodeConversionPass();

struct LoSPNNodeVectorizationPass : public PassWrapper<LoSPNNodeVectorizationPass, OperationPass<ModuleOp>> {

protected:
void runOnOperation() override;

public:
void getDependentDialects(DialectRegistry& registry) const override;

protected:
void runOnOperation() override;
};

std::unique_ptr<Pass> createLoSPNNodeVectorizationPass();
10 changes: 8 additions & 2 deletions mlir/include/Conversion/LoSPNtoCPU/StructurePatterns.h
@@ -54,10 +54,16 @@ namespace mlir {
};

static inline void populateLoSPNtoCPUStructurePatterns(OwningRewritePatternList& patterns, MLIRContext* context,
TypeConverter& typeConverter) {
patterns.insert<KernelLowering, BatchTaskLowering, SingleTaskLowering>(typeConverter, context);
TypeConverter& typeConverter) {
patterns.insert<KernelLowering>(typeConverter, context);
patterns.insert<BodyLowering>(typeConverter, context);
}

static inline void populateLoSPNtoCPUTaskPatterns(OwningRewritePatternList& patterns,
MLIRContext* context,
TypeConverter& typeConverter) {
patterns.insert<BatchTaskLowering, SingleTaskLowering>(typeConverter, context, 1);
}
}
}
