Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

tile() function part 0.5 #8

Merged
merged 7 commits into from
Mar 8, 2024
Merged

tile() function part 0.5 #8

merged 7 commits into from
Mar 8, 2024

Conversation

mkatliar
Copy link
Owner

@mkatliar mkatliar commented Feb 1, 2024

Partial commit for #5

@mkatliar mkatliar changed the title Tile 0.5 tile() function part 0.5 Feb 1, 2024
@mkatliar mkatliar requested a review from petlist February 1, 2024 16:27
README.md Outdated Show resolved Hide resolved
Copy link
Collaborator

@petlist petlist left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great but I lack background on the tiles and register matrices, so nothing really deep from me this time. Mostly questions out of curiosity :)

template <typename ET, StorageOrder SO, typename FF, typename FP>
BLAST_ALWAYS_INLINE void tile(std::size_t m, std::size_t n, FF&& f_full, FP&& f_partial)
{
size_t constexpr TILE_SIZE = TileSize_v<ET>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we have matrix of integers, booleans or complex numbers?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. I think it will not compile, because we don't have TileSize_v<ET> defined for those types.

{
size_t i = 0;

// i + 4 * TILE_SIZE != M is to improve performance in case when the remaining number of rows is 4 * TILE_SIZE:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a bit magic for me, can't really understand why is it more efficient :/

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where magic 3 and 4 come from?

Copy link
Owner Author

@mkatliar mkatliar Feb 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic nombers come from a specific case when you have 16 AVX registers, each storing 4 doubles, and TILE_SIZE == 4. When you have 16 rows left, it is more efficient (based on performance test) to apply a 8-row kernel 2 times than a 12-row kernel and then a 4-row kernel.

This code is tied to a specific architecture and is actually very old. It should be re-written in more general way.

// it is more efficient to apply 2 * TILE_SIZE kernel 2 times than 3 * TILE_SIZE + 1 * TILE_SIZE kernel.
for (; i + 3 * TILE_SIZE <= m && i + 4 * TILE_SIZE != m; i += 3 * TILE_SIZE)
{
RegisterMatrix<ET, 3 * TILE_SIZE, TILE_SIZE, columnMajor> ker;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why columnMajor here and not SO?

for (; i + 2 * TILE_SIZE <= m; i += 2 * TILE_SIZE)
{
RegisterMatrix<ET, 2 * TILE_SIZE, TILE_SIZE, columnMajor> ker;
f_full(ker, i, j);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Мне тут пояснительная бригада нужна, зачем мы применяем функтор к пустой матрице? А, я понял, это чтобы этими черипичками покрыть в случае четного и нечетного количества строк? Или нет?

if (i < m)
{
RegisterMatrix<ET, TILE_SIZE, TILE_SIZE, columnMajor> ker;
f_partial(ker, i, j, m - i, ker.columns());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TILE_SIZE != ker.columns() ?

@@ -51,83 +54,25 @@ namespace blast
MatrixPointer<MPC> && StorageOrder_v<MPC> == columnMajor &&
MatrixPointer<MPD> && StorageOrder_v<MPD> == columnMajor
)
BLAZE_ALWAYS_INLINE void gemm(size_t M, size_t N, size_t K, ST1 alpha, MPA A, MPB B, ST2 beta, MPC C, MPD D)
BLAST_ALWAYS_INLINE void gemm(size_t M, size_t N, size_t K, ST1 alpha, MPA A, MPB B, ST2 beta, MPC C, MPD D)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question, do we need to specify M, N, K? Are not they part of MPA, MPB, MPC?

{
using ET = std::remove_cv_t<ElementType_t<MPA>>;
using ET = std::remove_cv_t<ElementType_t<MPD>>;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make a difference?

{
using ET = std::remove_cv_t<ElementType_t<MPA>>;
using ET = std::remove_cv_t<ElementType_t<MPD>>;
size_t constexpr TILE_SIZE = TileSize_v<ET>;

BLAZE_CONSTRAINT_MUST_BE_SAME_TYPE(std::remove_cv_t<ElementType_t<MPB>>, ET);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does it make sense to put in require above?

@mkatliar
Copy link
Owner Author

Continuous integration broken because of a bug in Blaze: https://bitbucket.org/blaze-lib/blaze/commits/6058199308a8c741279373b4d81debbba5e5e850#comment-14858242

@mkatliar
Copy link
Owner Author

mkatliar commented Mar 8, 2024

@mkatliar mkatliar merged commit 153025e into master Mar 8, 2024
1 check passed
@mkatliar mkatliar deleted the tile-0.5 branch March 8, 2024 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants