Commit
FA Kernel Update for Accuracy and Performance (#45)
Major changes:
1. Fix numerical errors from scaling input tensors with log_2(e) as preprocessing. Fudge factors are adjusted accordingly.
2. Adopt techniques from the forward kernel to specialize the inner loops of the bwd kernel as well.
3. Update the tuning database for MI200/300 accordingly.

Minor changes:
1. `pyaotriton` now includes `$ORIGIN` in its `DT_RUNPATH`.
2. The `install` target now installs `pyaotriton` to `$CMAKE_INSTALL_PREFIX/lib`.
3. `mptune` now stores the batch size of each testing result, making the timing results more informative.
4. `performance_*.py` scripts now read the `USE_TFLOPS`, `D_HEADS`, and `N_CTX` env vars, allowing the testing size to be changed without editing the code.
5. `test/test_backward.py` now displays target fudge factors for fudge factor adjustment.
6. `tune_flash.py` now shrinks the batch size to 2 when both sequence lengths are > 4096, so as not to exceed the VRAM limit.
7. Fix a problem in `sancheck_lut_tensor` in `class FlashKernel(KernelDescription)`, which did not handle single-element LUT tensors correctly.
8. `v2python/table_tool.py` now ignores the `inputs$BATCH` column.

Notes:
1. The fudge factors in use assume PyTorch <= 2.4. See pytorch/pytorch#135590 for a detailed discussion of why PyTorch 2.5 cannot be used for testing. PyTorch 2.6 will include a new interface to fix the problem.
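To illustrate the kind of error the first major change refers to, here is a minimal, standard-library-only sketch. It is not the kernel code: it assumes the error comes from folding `sm_scale * log_2(e)` into a half-precision input and rounding before the dot product, and it uses Python doubles as a stand-in for the wider accumulator precision. All function names (`to_fp16`, `softmax_base2`, `compare`) are hypothetical and exist only for this example.

```python
import math
import random
import struct

LOG2_E = math.log2(math.e)  # exp(x) == 2 ** (x * LOG2_E)


def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def softmax_ref(scores):
    """Reference softmax using the natural exponential."""
    m = max(scores)
    p = [math.exp(s - m) for s in scores]
    total = sum(p)
    return [v / total for v in p]


def softmax_base2(scores_log2):
    """Softmax for scores that were already multiplied by log2(e)."""
    m = max(scores_log2)
    p = [2.0 ** (s - m) for s in scores_log2]
    total = sum(p)
    return [v / total for v in p]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def max_abs_err(a, b):
    return max(abs(x - y) for x, y in zip(a, b))


def compare(seed=0, d_head=64, n_keys=128):
    """Return (error with pre-scaled fp16 input, error with post-scaled scores)."""
    rng = random.Random(seed)
    # Inputs are stored in half precision, as an fp16 attention kernel would see them.
    q = [to_fp16(rng.gauss(0, 1)) for _ in range(d_head)]
    keys = [[to_fp16(rng.gauss(0, 1)) for _ in range(d_head)] for _ in range(n_keys)]
    sm_scale = 1.0 / math.sqrt(d_head)

    ref = softmax_ref([dot(q, k) * sm_scale for k in keys])

    # Assumed problematic variant: scale the input first and round it back to fp16.
    q_pre = [to_fp16(x * sm_scale * LOG2_E) for x in q]
    pre = softmax_base2([dot(q_pre, k) for k in keys])

    # Alternative: scale the dot products afterwards, in the wider precision.
    post = softmax_base2([dot(q, k) * sm_scale * LOG2_E for k in keys])

    return max_abs_err(pre, ref), max_abs_err(post, ref)


if __name__ == "__main__":
    err_pre, err_post = compare()
    print(f"pre-scaled fp16 input: {err_pre:.3e}")
    print(f"post-scaled scores:    {err_post:.3e}")
```

In this sketch the pre-scaled variant picks up an extra half-precision rounding step on every input element, while scaling after the dot product does not. How the actual kernels apply the factor is not shown here.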
1 parent e43acd9, commit f6b28a9
Showing 19 changed files with 469 additions and 420 deletions.