Reduce block fragmentation (HabanaAI#426)
Change `NaiveBlockAllocator` to use a priority queue so that we always allocate the lowest block id first. This further increases the performance of contiguous paged attention.

- [ ] Add an option or env variable to enable/disable this behavior. (Not sure if this is necessary)

---------

Co-authored-by: Yang Wang <[email protected]>
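
The idea of allocating the lowest block id first can be illustrated with a minimal sketch built on Python's `heapq`. This is not the actual `NaiveBlockAllocator` code from this commit; the class name `LowestIdFirstAllocator` and its methods are hypothetical and exist only to show how a min-heap free list tends to keep allocated block ids clustered at the low end.

```python
import heapq


class LowestIdFirstAllocator:
    """Hypothetical free list that always hands out the smallest free block id.

    Illustration only: not the NaiveBlockAllocator implementation.
    """

    def __init__(self, num_blocks: int) -> None:
        # All block ids start out free; heapify arranges them as a min-heap.
        self._free_ids = list(range(num_blocks))
        heapq.heapify(self._free_ids)

    def allocate(self) -> int:
        if not self._free_ids:
            raise RuntimeError("no free blocks")
        # heappop returns the smallest id currently free.
        return heapq.heappop(self._free_ids)

    def free(self, block_id: int) -> None:
        # No double-free check in this sketch; callers must free each id once.
        heapq.heappush(self._free_ids, block_id)


allocator = LowestIdFirstAllocator(num_blocks=8)
first = [allocator.allocate() for _ in range(4)]   # ids 0, 1, 2, 3
allocator.free(1)
allocator.free(0)
reused = [allocator.allocate() for _ in range(3)]  # ids 0, 1, then 4
print(first, reused)
```

With a plain FIFO or LIFO free list, the order in which blocks are returned decides which ids get reused next, so live blocks can end up scattered across the id range. With the heap, freed low ids are reused before any higher id is touched, at a cost of O(log n) per allocate and free.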