REEF for NVIDIA GPUs #7
Hi! Really interesting work :) Would it be possible to have access to the version of REEF for NVIDIA GPUs that you mention in the paper? Do you plan to make the NVIDIA GPU version open source, or would it be possible for researchers to get access to a separate repository with that version of REEF?
Thank you!
Hi @anakli, I have to clarify that the NVIDIA version of REEF (REEF-N) only implements the task preemption mechanism based on queue cleaning and does not include all of the techniques in REEF. As such, it is not currently fully functional, and we do not plan to make it open source or provide access to a separate repository. That being said, we will soon be open-sourcing a preemption library extracted from REEF-N, which works on NVIDIA GPUs with CUDA. This library will help developers add preemption capabilities similar to REEF-N's to other inference systems.
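For readers unfamiliar with the idea, queue-cleaning preemption can be sketched roughly as follows on CUDA. This is only a minimal illustration under assumptions of mine, not the REEF-N API; `PreemptibleQueue`, `submit`, `launch_next`, and `preempt` are invented names:

```cpp
#include <cuda_runtime.h>
#include <deque>
#include <functional>
#include <mutex>

// A pending kernel launch, captured as a callable so any kernel and
// argument set fits in the queue.
using KernelLaunch = std::function<void(cudaStream_t)>;

class PreemptibleQueue {
public:
    explicit PreemptibleQueue(cudaStream_t stream) : stream_(stream) {}

    // Best-effort tasks buffer their launches on the host instead of
    // pushing everything into the CUDA stream at once.
    void submit(KernelLaunch launch) {
        std::lock_guard<std::mutex> lock(mu_);
        pending_.push_back(std::move(launch));
    }

    // Hand the next buffered kernel to the GPU runtime (called by a
    // scheduler thread that keeps only a few kernels in flight).
    bool launch_next() {
        std::lock_guard<std::mutex> lock(mu_);
        if (pending_.empty()) return false;
        pending_.front()(stream_);
        pending_.pop_front();
        return true;
    }

    // Preemption by queue cleaning: discard every launch that has not yet
    // been handed to CUDA, then drain the few kernels already in flight.
    void preempt() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_.clear();
        }
        cudaStreamSynchronize(stream_);
    }

private:
    cudaStream_t stream_;
    std::deque<KernelLaunch> pending_;
    std::mutex mu_;
};
```

The key property is that preemption latency is bounded by the kernels already handed to the CUDA stream, which is what motivates the buffering window discussed below.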
Thank you for the quick response! Do you have an expected timeline for when you plan to release the preemption library extracted from REEF-N? In the meantime, we can also prototype the approach described in Section 4.4 of the paper. I'm wondering about the following two parameters:

1. Is there a limit on the size of the vHQ, i.e., on the number of kernel slots it can hold?
2. How many GPU kernels should be kept within the GPU runtime at a time?

Thanks!
We plan to release it within the next two months. We are currently finalizing some additional features and organizing the code and documentation.
1. The vHQ is indeed implemented as a linked list, so there is no specific limit on its size; you can add as many kernel slots as you need.
2. The number of GPU kernels kept within the GPU runtime should depend on the workload's characteristics. There is typically a trade-off between execution latency and preemption latency, so we recommend keeping the number of buffered GPU kernels in the range of 4 to 16 as a reasonable compromise.
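To make the two answers concrete, here is a hedged C++/CUDA sketch of an unbounded, linked-list vHQ that keeps at most a fixed number of kernels inside the GPU runtime. All identifiers (`VHQ`, `KernelSlot`, `max_inflight`) are invented for illustration and are not REEF-N's actual interface:

```cpp
#include <cuda_runtime.h>
#include <functional>
#include <list>

struct KernelSlot {
    std::function<void(cudaStream_t)> launch;  // captured kernel launch
    bool submitted = false;                    // already in the GPU runtime?
};

class VHQ {
public:
    // 4-16 in-flight kernels is the trade-off range suggested above:
    // fewer => faster preemption, more => better execution latency.
    explicit VHQ(cudaStream_t stream, int max_inflight = 8)
        : stream_(stream), max_inflight_(max_inflight) {}

    // The linked list imposes no capacity limit: push as many slots as needed.
    void push(std::function<void(cudaStream_t)> launch) {
        slots_.push_back({std::move(launch), false});
        refill();
    }

    // Called when a submitted kernel completes (e.g. via a callback
    // registered with cudaLaunchHostFunc); a single stream completes in
    // FIFO order, so the front slot is the one that finished.
    void on_kernel_complete() {
        --inflight_;
        slots_.pop_front();
        refill();
    }

private:
    void refill() {
        for (auto& slot : slots_) {
            if (inflight_ >= max_inflight_) break;
            if (!slot.submitted) {
                slot.launch(stream_);
                slot.submitted = true;
                ++inflight_;
            }
        }
    }

    cudaStream_t stream_;
    int max_inflight_;
    int inflight_ = 0;
    std::list<KernelSlot> slots_;  // linked list => no fixed size limit
};
```

With a small `max_inflight`, a preemption like the one sketched earlier only has to wait for a handful of in-flight kernels; with a larger value, the stream stays busier between host round trips.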
Thank you!
@francis0407 I would like to ask whether the REEF-N mentioned above implements DKP. If not, is that because DKP is not implementable on NVIDIA GPUs? (I read through the paper and tried to implement DKP on NVIDIA, but I cannot judge the feasibility of the approach there myself, and most of the work in the paper is based on AMD GPUs, hence my doubts.) If DKP can be implemented on NVIDIA, I will try to implement it; if not, I would like to know what problems you encountered during the implementation.
I had a quick look at the LLVM User Guide for AMDGPU and the User Guide for the NVPTX Back-end. From my limited understanding, I suspect it won't work on NVIDIA GPUs.
Hi @ujay-zheng, we didn't implement DKP in REEF-N on NVIDIA GPUs. This is mainly because many optimizations in DKP need to modify the binary or assembly code of the GPU kernel. For example, when "calling" the candidate kernel inside the proxy kernel, we use a "jump" instruction instead of "call" to avoid register spilling. DKP can in principle be implemented on NVIDIA GPUs, but with a lot of engineering effort (i.e., hacking the CUDA SASS binary).
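For intuition, the dispatch structure of such a proxy kernel can be sketched in plain CUDA with device function pointers. This is only an illustration of the "call the candidate from a proxy" idea and deliberately omits the assembly-level jump rewrite that real DKP requires; all names below are invented:

```cpp
// Hedged sketch of proxy-kernel dispatch, as in dynamic kernel padding (DKP).
// A real implementation rewrites the kernel binary (jump, not call) to avoid
// register spilling; this only shows the dispatch structure.
#include <cuda_runtime.h>

typedef void (*CandidateFn)(float*, int);

// A "candidate kernel" compiled as a __device__ function so the proxy
// kernel can invoke it through a function pointer.
__device__ void vec_scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Device-side pointer to the candidate; copied to the host at startup.
__device__ CandidateFn d_vec_scale_ptr = vec_scale;

// Proxy kernel: launched with padded (maximum) resources, it reads which
// candidate to run from device memory, so a newly arrived real-time kernel
// can be selected without a fresh launch.
__global__ void proxy_kernel(CandidateFn* table, int* choice,
                             float* data, int n) {
    table[*choice](data, n);  // an ordinary indirect call; this is where
                              // register spilling can occur without the
                              // jump-based binary rewrite
}

int main() {
    const int n = 1024;
    float* data;
    cudaMalloc(&data, n * sizeof(float));

    // Fetch the device function pointer and build a one-entry candidate table.
    CandidateFn h_fn;
    cudaMemcpyFromSymbol(&h_fn, d_vec_scale_ptr, sizeof(h_fn));
    CandidateFn* table;
    cudaMalloc(&table, sizeof(CandidateFn));
    cudaMemcpy(table, &h_fn, sizeof(h_fn), cudaMemcpyHostToDevice);

    int* choice;
    cudaMalloc(&choice, sizeof(int));
    cudaMemset(choice, 0, sizeof(int));  // select candidate 0

    proxy_kernel<<<(n + 255) / 256, 256>>>(table, choice, data, n);
    cudaDeviceSynchronize();
    return 0;
}
```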
OK, I got it, thank you!
@francis0407 How is REEF-N going? Has it been published yet?
Bump on this. I am interested in playing around with the device queue capacity restriction feature for NVIDIA GPUs. @francis0407