Why CUDA Performance Falls Short of CPU in LightGBM: Training and Inference Analysis #6697
Thanks for using LightGBM. LightGBM does not currently have GPU-accelerated inference. You can see #5854 (comment) for some other options to try using GPUs to generate predictions with a LightGBM model.
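For illustration, one commonly used route of that kind (a sketch only, not part of LightGBM itself) is to export the trained model to ONNX and run it with onnxruntime's CUDA execution provider. This assumes `onnxmltools` and `onnxruntime-gpu` are installed and a CUDA-capable runtime is available:

```python
import numpy as np
import lightgbm as lgb
import onnxmltools
from onnxmltools.convert.common.data_types import FloatTensorType
import onnxruntime as ort

# Train a small LightGBM classifier on synthetic data.
X = np.random.rand(1000, 32).astype(np.float32)
y = (X[:, 0] > 0.5).astype(int)
clf = lgb.LGBMClassifier(n_estimators=100).fit(X, y)

# Convert the fitted model to ONNX.
onnx_model = onnxmltools.convert_lightgbm(
    clf, initial_types=[("input", FloatTensorType([None, X.shape[1]]))]
)

# Run prediction on the GPU through the CUDA execution provider.
sess = ort.InferenceSession(
    onnx_model.SerializeToString(), providers=["CUDAExecutionProvider"]
)
preds = sess.run(None, {"input": X})
```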
What does "comprehensive explanation" mean to you? Is there another library that has something like what you're looking for, and if so can you link to that?
If that is true, you should try reducing your benchmarking code to just …
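A minimal sketch of that kind of reduced benchmark, timing only the `predict()` call and nothing else (illustrative only; the data sizes and parameters here are assumptions, not the snippet originally referenced):

```python
import time
import numpy as np
import lightgbm as lgb

# Synthetic binary-classification data and a small model.
X = np.random.rand(100_000, 32)
y = (X[:, 0] > 0.5).astype(int)
booster = lgb.train(
    {"objective": "binary", "verbose": -1},
    lgb.Dataset(X, label=y),
    num_boost_round=100,
)

# Time only the prediction call itself.
start = time.perf_counter()
booster.predict(X)
print(f"predict() took {time.perf_counter() - start:.3f} s")
```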
Two more points on claims like this: …
@jameslamb, thank you very much for your detailed response. I have gained valuable insights from your explanation.
Thank you once again for your time and expertise.
There is just far more work to be done in this repo than people around to do it. @shiyu1994 has done most of the CUDA development in this project, maybe he can explain why training was a higher priority. I have some ideas about this but I'm not confident in them, and I don't want to misinform you.
Description
During source-code reading and testing of LightGBM with a binary classifier, I observed that GPU training performance is notably lower than CPU performance, roughly one tenth of it. Moreover, during inference there is no option to use any backend other than the CPU. Here the GPU backend refers to `device=cuda`, which uses `CUDATree`, while the CPU backend refers to `device=cpu`, which uses `Tree` or its derivatives. This raises the following questions:

1. Has this behavior been observed by others? If so, why is the CUDA backend (`CUDATree`) not used during inference? Is it because the operator characteristics favor the CPU?
2. In which inference cases would the CUDA backend clearly surpass the CPU backend?
3. Is there any documentation that offers a comprehensive explanation of CUDA acceleration for LightGBM?
(Benchmark figures in the original issue: infer, train, onnx_runtime_c++ (infer).)
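A minimal sketch of the kind of training comparison described above, assuming a LightGBM build with CUDA support so that `device=cuda` is accepted; the data shape and parameters are illustrative assumptions:

```python
import time
import numpy as np
import lightgbm as lgb

# Synthetic binary-classification data.
X = np.random.rand(1_000_000, 64).astype(np.float32)
y = (X.sum(axis=1) > X.shape[1] / 2).astype(int)

for device in ("cpu", "cuda"):
    params = {"objective": "binary", "device": device, "verbose": -1}
    # Rebuild the Dataset per run so each backend constructs its own bins.
    train_set = lgb.Dataset(X, label=y)
    start = time.perf_counter()
    lgb.train(params, train_set, num_boost_round=200)
    print(f"device={device}: training took {time.perf_counter() - start:.2f} s")
```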
Reproducible example
Environment info
Command(s) I used to install LightGBM
Additional Comments
I am particularly interested in understanding the performance trade-offs between CPU and GPU backends in both training and inference stages. Any insights or documentation on this topic would be greatly appreciated.