
Questions about warp group memory layout #77

Open
Devil-SX opened this issue Jan 1, 2025 · 0 comments

I really like this work. However, I'm having some difficulty understanding the shared memory layouts described in Appendix C.

Why does the Naive Swizzled Layout lack hardware support for the HGMMA and UTMA instructions, while the 32-byte and 64-byte swizzling configurations do support them? What memory layout do the warp group instructions require?

Here is my understanding:

  1. ThunderKittens' memory format abstraction is designed to support both row-major and column-major 16x16 tiles, allowing programmers to avoid worrying about the specific layouts of matrices A and B.

  2. Shared memory comprises 32 banks, each capable of providing 32 bits of bandwidth per cycle. If an access to a row or column of a 16x16 tile touches more than two 16-bit elements in the same bank, it causes bank conflicts.

  3. The 64x32 16-bit example in Appendix C corresponds to the maximum data required by a warp group, which includes:

    • 4 warps, 1 Tensor Core per warp
    • 2 16x16 input matrices per Tensor Core
    • 8 total 16x16 input matrices -> one 64x32 16-bit tile
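Regarding point 2, here is a minimal sketch of why a naive row-major layout of that 64x32 tile causes bank conflicts on column accesses. This assumes the standard model of 32 four-byte shared-memory banks; the exact hardware behavior may differ:

```python
# Minimal bank-conflict model (assumption: 32 banks, 4 bytes each,
# as on recent NVIDIA GPUs).
def bank(byte_addr: int) -> int:
    return (byte_addr // 4) % 32

ROW_BYTES = 32 * 2  # one row of a 64x32 tile of 16-bit elements

# 16 threads reading one element per row (a column access) generate
# 16 addresses spaced ROW_BYTES apart:
col_banks = {bank(r * ROW_BYTES) for r in range(16)}
print(sorted(col_banks))  # -> [0, 16]: only 2 distinct banks, an 8-way conflict
```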
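For the swizzled configurations, here is a hedged illustration of how an XOR-based swizzle spreads the same column access over more banks. I am assuming the CUTLASS-style pattern that XORs the row index into the 16-byte chunk address; whether ThunderKittens/HGMMA uses exactly these parameters is my assumption, not something confirmed by the paper:

```python
# XOR-swizzle sketch (assumption: CUTLASS-style pattern operating on
# 16-byte chunks; the exact HGMMA parameters are not confirmed here).
def bank(byte_addr: int) -> int:
    return (byte_addr // 4) % 32  # 32 banks, 4 bytes each

ROW_BYTES = 32 * 2  # one row of a 64x32 tile of 16-bit elements
CHUNK = 16          # swizzle permutes 16-byte chunks within a row

def swizzle(addr: int, swizzle_bytes: int) -> int:
    chunks = swizzle_bytes // CHUNK
    row = addr // ROW_BYTES
    # XOR the row index into the chunk-selection bits of the address:
    return addr ^ ((row % chunks) * CHUNK)

# Banks touched when 16 threads read one column (one element per row):
naive = {bank(r * ROW_BYTES) for r in range(16)}
sw64  = {bank(swizzle(r * ROW_BYTES, 64)) for r in range(16)}
print(len(naive), len(sw64))  # 64-byte swizzling hits more distinct banks
```

In this toy model the naive layout lands all column accesses in 2 banks, while the 64-byte swizzle spreads them over 4, which is the direction of the improvement the paper describes even if the real instruction-level layout constraints are stricter.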