-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ARMv8 support #32
ARMv8 support #32
Conversation
mkatliar
commented
Sep 19, 2024
•
edited
Loading
edited
- The code compiles for ARMv8 (aarch64).
- Not all of the tests pass yes.
- Added CI workflow for aarch64.
2751cd6
to
8adaf32
Compare
…amax(). Make potrf() compile on fma3<avx2> Smarter way of calculating register matrix number of columns in potrf() Make the code compile for ARM again Fixed calculation of RegisterMatrix columns number in potrf()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor comments. Please address here or in a follow-up MR
#if XSIMD_WITH_NEON64 | ||
# include <blast/math/algorithm/arch/neon64/Tile.hpp> | ||
#endif | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe use #if, #elif, #else and #error for more robust error messages.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. But on the other hand, this #if
-#elif
thingy is already in xsimd
. And if it happens that different architecture constants are defined at the same time, then you just have different specializations of tile()
declared. The architecture is a parameter of tile()
, so they will not conflict. And if you require to instantiate blast::tile()
with Arch
template parameter for which detail::tile<...>()
does not exist, the compiler should give a pretty clear message IMO.
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok 👍
@@ -46,7 +47,7 @@ namespace blast :: detail | |||
BLAST_ALWAYS_INLINE void tile(xsimd::avx2 const& arch, StorageOrder traversal_order, std::size_t m, std::size_t n, FF&& f_full, FP&& f_partial) | |||
{ | |||
size_t constexpr SS = SimdSize_v<ET>; | |||
size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be ppoperly determined | |||
size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be properly determined |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is the step in 'columns' right? It depends on the architecture as well. E.g. the optimal tiles are different for AVX and AVX2, AVX512
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, true. Also depends on data type (float
or double
, and potentially std::complex<float>
and std::complex<double>
). This is a reminder to do it in a more clever way. It would be also good to avoid duplicating the tile()
function code between architectures.
/** | ||
* @brief Default size of a panel (in a panel matrix) for a given architecture and data type | ||
* | ||
* TODO: Is it always equal to SIMD size? Deprecate? | ||
* | ||
* @tparam Arch architecture | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Please deprecate
{ | ||
std::size_t constexpr registerCapacity(xsimd::avx2) | ||
{ | ||
return 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In bytes? Couldn't find the documentation on this function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, no, it's num_of_vector_registers
. Better name?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more precise, yes.