
REEF for NVIDIA GPUs #7

Open

anakli opened this issue Apr 15, 2023 · 11 comments

@anakli
anakli commented Apr 15, 2023

Really interesting work :) Would it be possible to have access to the version of REEF for NVIDIA GPUs that you mention in the paper? Do you plan to make the NVIDIA GPU version open-source, or is it possible for researchers to get access to a separate repository with that version of REEF?

Thank you!

@francis0407
Contributor

Hi @anakli,
Thank you for your interest in our work.

However, I have to clarify that the NVIDIA version of REEF only implements the task preemption mechanism based on queue cleaning; it does not include all of the techniques in REEF. As such, it is not currently fully functional, and we do not plan to open-source it or provide access to a separate repository.

That being said, we will soon open-source a preemption library extracted from REEF-N that works on NVIDIA GPUs with CUDA. Once it is available, developers will be able to add REEF-N-style preemption capabilities to other inference systems.
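For context, "preemption based on queue cleaning" means that pending kernel launches are buffered on the host rather than handed to the GPU runtime immediately, so preempting a task amounts to discarding its not-yet-submitted launches and letting the few kernels already on the GPU drain. A minimal CUDA C++ sketch of that idea (all names here are illustrative, not REEF-N's actual API):

```cpp
#include <cuda_runtime.h>

#include <deque>
#include <functional>

// Hypothetical queue-cleaning layer; none of these names come from REEF-N.
// Kernel launches are buffered host-side instead of going straight to the stream.
class PreemptibleQueue {
public:
    void enqueue(std::function<void(cudaStream_t)> launch) {
        pending_.push_back(std::move(launch));
    }

    // Preemption = "cleaning" the queue: buffered launches never reach the
    // GPU; we only wait for the few kernels already submitted to finish.
    void preempt(cudaStream_t stream) {
        pending_.clear();
        cudaStreamSynchronize(stream);
    }

private:
    std::deque<std::function<void(cudaStream_t)>> pending_;
};
```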

@anakli
Author

anakli commented Apr 16, 2023

Thank you for the quick response!

Do you have an expected timeline for when you plan to release the preemption library extracted from REEF-N?

In the meantime, we can also prototype the approach described in Section 4.4 of the paper. I'm wondering about the following two parameters:

  • what size do you assume for the vHQ (how many kernel slots?)
  • what is the "fixed number" of kernels you submit at a time from the vHQ to the GPU runtime?

Thanks!

@francis0407
Contributor

Do you have an expected timeline for when you plan to release the preemption library extracted from REEF-N?

We plan to release it within the next two months or so. We are currently finalizing some additional features and organizing the code and documentation.

In the meantime, we can also prototype the approach described in Section 4.4 of the paper. I'm wondering about the following two parameters:

  • what size do you assume for the vHQ (how many kernel slots?)

The vHQ is implemented as a linked list, so there is no specific limit on its size; you can add as many kernel slots as you need.

  • what is the "fixed number" of kernels you submit at a time from the vHQ to the GPU runtime?

The number of kernels kept in the GPU runtime should depend on the workload's characteristics: there is a trade-off between execution latency and preemption latency. We recommend keeping it within the range of 4 to 16, which strikes a reasonable balance between the two.
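Putting these answers together, here is a minimal sketch of the Section 4.4 prototype, assuming a host-side vHQ implemented as a linked list and a fixed submission window; all names are my own illustration, not REEF's code:

```cpp
#include <cuda_runtime.h>

#include <functional>
#include <list>

using KernelLaunch = std::function<void(cudaStream_t)>;

class VirtualHostQueue {
public:
    // The vHQ is a linked list, so there is no fixed number of kernel slots.
    void push(KernelLaunch k) { vhq_.push_back(std::move(k)); }

    // Submit at most `window` kernels from the vHQ to the GPU runtime, wait
    // for them, then refill. A small window (toward 4) shortens preemption
    // latency; a large window (toward 16) keeps the GPU busier and improves
    // execution latency.
    void pump(cudaStream_t stream, int window = 8) {
        while (!vhq_.empty()) {
            for (int i = 0; i < window && !vhq_.empty(); ++i) {
                vhq_.front()(stream);  // the actual kernel launch happens here
                vhq_.pop_front();
            }
            cudaStreamSynchronize(stream);  // bounds how many kernels are in flight
        }
    }

    // Queue cleaning: everything still in the vHQ is discarded; only the
    // at-most-`window` kernels already submitted must run to completion.
    void clean() { vhq_.clear(); }

private:
    std::list<KernelLaunch> vhq_;  // unbounded list of "kernel slots"
};
```

With this structure, worst-case preemption latency is bounded by the runtime of the at-most-`window` kernels already on the GPU, which is why the window size trades directly against throughput.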

@anakli
Author

anakli commented Apr 19, 2023

Thank you!

@ujay-zheng

@francis0407 I would like to ask whether the REEF-N mentioned above implements DKP. If not, is that because it is not implementable on NVIDIA GPUs? (I read through the paper and tried to implement DKP on NVIDIA, but I cannot judge the feasibility of this approach there, and most of the work in the paper targets AMD GPUs, hence my doubts.) If DKP can be implemented on NVIDIA, I will work on implementing it; if not, I would like to know what problems you encountered during the implementation.

@ujay-zheng

I had a very rough look at the LLVM User Guide for AMDGPU and the User Guide for the NVPTX Back-end. With my limited knowledge, I suspect it won't work on NVIDIA GPUs.

@francis0407
Contributor

Hi @ujay-zheng ,

We didn't implement DKP in REEF-N on NVIDIA GPUs, mainly because many optimizations in DKP need to modify the binary or assembly code of the GPU kernel. For example, when "calling" a candidate kernel inside the proxy kernel, we use a "jump" instruction instead of a "call" to avoid register spilling.

DKP can in principle be implemented on NVIDIA GPUs, but it would take a lot of engineering effort (i.e., hacking the CUDA SASS binary).
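To make the register-spilling point concrete, here is a toy CUDA sketch of the proxy-kernel pattern (my own illustration, not REEF's code). At the source level, the proxy can only *call* the candidate as a device function, which forces the compiler to reserve registers for the callee; DKP avoids exactly this by rewriting the compiled assembly to *jump* into the candidate's code:

```cpp
#include <cuda_runtime.h>

// Hypothetical candidate kernel body, compiled as a device function so the
// proxy can invoke it. __noinline__ models the real setting, where candidates
// are selected dynamically and thus cannot be inlined into the proxy.
__device__ __noinline__ void candidate(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// The proxy kernel that dynamic kernel padding launches. Because this is a
// device-function call, the compiler sizes the proxy's register usage for
// the callee (spilling when the candidate is register-hungry). REEF's DKP
// replaces the call with a jump in the GPU assembly; on NVIDIA that would
// mean patching the SASS binary, as noted above.
__global__ void proxy_kernel(float* data, int n) {
    candidate(data, n);
}
```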

@ujay-zheng

OK, I got it. Thank you!

@pokerfaceSad

@francis0407 How is REEF-N going? Already published?


@atomicapple0

Bump on this. I am interested in playing around with the device queue capacity restriction feature for NVIDIA GPUs. @francis0407

@Alex4210987

Hi!
It's very interesting work, and I wonder whether it can run on NVIDIA GPUs, or on AMD GPUs other than the AMD RADEON INSTINCT™ MI50?
