Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARMv8 support #32

Merged
merged 5 commits into from
Sep 20, 2024
Merged

ARMv8 support #32

merged 5 commits into from
Sep 20, 2024

Conversation

mkatliar
Copy link
Owner

@mkatliar mkatliar commented Sep 19, 2024

  • The code compiles for ARMv8 (aarch64).
  • Not all of the tests pass yes.
  • Added CI workflow for aarch64.

@mkatliar mkatliar force-pushed the armv8 branch 4 times, most recently from 2751cd6 to 8adaf32 Compare September 20, 2024 10:22
mkatliar and others added 5 commits September 20, 2024 12:38
…amax().

Make potrf() compile on fma3<avx2>

Smarter way of calculating register matrix number of columns in potrf()

Make the code compile for ARM again

Fixed calculation of RegisterMatrix columns number in potrf()
@mkatliar mkatliar changed the title [WIP] ARMv8 support ARMv8 support Sep 20, 2024
@mkatliar mkatliar requested a review from roversch September 20, 2024 10:44
Copy link
Collaborator

@roversch roversch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments. Please address here or in a follow-up MR

Comment on lines +11 to +14
#if XSIMD_WITH_NEON64
# include <blast/math/algorithm/arch/neon64/Tile.hpp>
#endif

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe use #if, #elif, #else and #error for more robust error messages.

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. But on the other hand, this #if-#elif thingy is already in xsimd. And if it happens that different architecture constants are defined at the same time, then you just have different specializations of tile() declared. The architecture is a parameter of tile(), so they will not conflict. And if you require to instantiate blast::tile() with Arch template parameter for which detail::tile<...>() does not exist, the compiler should give a pretty clear message IMO.

What do you think?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok 👍

@@ -46,7 +47,7 @@ namespace blast :: detail
BLAST_ALWAYS_INLINE void tile(xsimd::avx2 const& arch, StorageOrder traversal_order, std::size_t m, std::size_t n, FF&& f_full, FP&& f_partial)
{
size_t constexpr SS = SimdSize_v<ET>;
size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be ppoperly determined
size_t constexpr TILE_STEP = 4; // TODO: this is almost arbitrary and needs to be properly determined
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the step in 'columns' right? It depends on the architecture as well. E.g. the optimal tiles are different for AVX and AVX2, AVX512

Copy link
Owner Author

@mkatliar mkatliar Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, true. Also depends on data type (float or double, and potentially std::complex<float> and std::complex<double>). This is a reminder to do it in a more clever way. It would be also good to avoid duplicating the tile() function code between architectures.

Comment on lines +17 to +23
/**
* @brief Default size of a panel (in a panel matrix) for a given architecture and data type
*
* TODO: Is it always equal to SIMD size? Deprecate?
*
* @tparam Arch architecture
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Please deprecate

{
std::size_t constexpr registerCapacity(xsimd::avx2)
{
return 16;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In bytes? Couldn't find the documentation on this function

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, no, it's num_of_vector_registers. Better name?

Copy link
Owner Author

@mkatliar mkatliar Sep 20, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is more precise, yes.

This was referenced Sep 20, 2024
@mkatliar mkatliar merged commit d174478 into master Sep 20, 2024
2 checks passed
@mkatliar mkatliar deleted the armv8 branch September 24, 2024 13:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants