As GPUs offer more and more compute power in lower precision (down to four bits!) compared to double precision, we have to consider how to make efficient use of future machines with pySDC.
Reducing the precision in the implementation is simple, but the questions are:

- Where can we reduce the floating-point precision with minimal impact on the accuracy of the solution?
- Where can we actually gain something from reduced floating-point precision?
Note that the cost of individual arithmetic operations is bounded from below by the kernel launch cost; see, for instance, the discussion in this article. Therefore, I doubt that we gain much by simply switching everything to lower precision.
A good starting point would be to choose a problem with an iterative linear solver that is launched as a single kernel. Then we can start by doing only the implicit solves in single, half, or even four-bit precision and see what we gain. Possibly we have to increase the precision between SDC iterations, and probably we have to choose quite a large problem resolution to see a difference.
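To make the idea concrete, here is a minimal sketch of that mixed-precision pattern, assuming CuPy is available. This is not pySDC code; the problem setup, sizes, and names are illustrative assumptions. A backward-Euler step for the 1D heat equation is solved with CG in float32 on the GPU, while the state is kept in float64 outside the solve.

```python
# Minimal sketch (not pySDC code; setup and names are illustrative assumptions).
# Solve one backward-Euler step (I - dt*A) u_new = u_old for the 1D heat equation,
# doing the inner CG solve in float32 while the state stays in float64.
import cupy as cp
import cupyx.scipy.sparse as csp
import cupyx.scipy.sparse.linalg as cspla

n, dt = 1 << 14, 1e-3
dx = 1.0 / (n + 1)

# Double-precision state, as it would normally be carried through the SDC sweep
x = cp.linspace(dx, 1.0 - dx, n, dtype=cp.float64)
u = cp.sin(cp.pi * x)

# 1D Laplacian with Dirichlet boundaries, assembled in double precision
main = cp.full(n, -2.0) / dx**2
off = cp.full(n - 1, 1.0) / dx**2
A = csp.diags([off, main, off], offsets=[-1, 0, 1], format='csr')
M64 = csp.identity(n, format='csr') - dt * A   # symmetric positive definite

# Cast the system and right-hand side down for the inner solve ...
M32 = M64.astype(cp.float32)
rhs32 = u.astype(cp.float32)
u32, info = cspla.cg(M32, rhs32)

# ... and promote the result back to float64 for the rest of the SDC iteration
u_new = u32.astype(cp.float64)
print(u_new.dtype, float(cp.abs(u_new - u).max()))
```

Whether this pays off will likely only show up at large problem sizes, where the float32 sparse matrix-vector products and dot products inside the solver dominate over kernel launch overhead.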
The scope of this project appears well suited to a bachelor thesis, an internship, or the like. If you or anyone you know would find this interesting, please get in touch!
There is no need for a deep understanding of SDC or Python. Basic proficiency with the latter and a low level of fear of maths are sufficient. This would be a nice opportunity to get to know GPU programming in Python.
Can we also have the same considerations for CPUs? My guess is that low-precision computation is not fast only on GPUs, and considering inexact SDC, that would indeed make a lot of sense.

Another question: how easy is it to switch from single to double precision on the GPU during a computation? Is it so expensive that the low-precision computation would have to run on one GPU and the high-precision computation on another?
> Can we also have the same considerations for CPUs? My guess is that low-precision computation is not fast only on GPUs, and considering inexact SDC, that would indeed make a lot of sense.
I am not sure how much faster CPUs are with low precision; if there is a significant gain, sure, we can try. On GPUs, single-precision throughput is often quoted as twice the flops you get with double precision.
> Another question: how easy is it to switch from single to double precision on the GPU during a computation? Is it so expensive that the low-precision computation would have to run on one GPU and the high-precision computation on another?
I don't know how expensive the casting is. I would be surprised if it were more expensive than communicating between distinct GPUs, and I cannot think of a use case for distributing the work like this off the top of my head.
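The cast can be timed directly on the device to get a feeling for its cost. The snippet below is a rough sketch, not a rigorous benchmark; the array size and the comparison against a plain device-to-device copy are my own choices, not taken from this thread.

```python
# Rough sketch (illustrative only): time an on-device cast from float64 to
# float32 with CUDA events, next to a same-size device-to-device copy for scale.
import cupy as cp

x = cp.random.random(1 << 24)            # ~128 MiB of float64 data on the GPU
_ = x.astype(cp.float32); _ = x.copy()   # warm up the kernels once before timing

def timed(op):
    start, stop = cp.cuda.Event(), cp.cuda.Event()
    start.record()
    out = op(x)
    stop.record()
    stop.synchronize()
    return cp.cuda.get_elapsed_time(start, stop), out

ms_cast, _ = timed(lambda a: a.astype(cp.float32))   # dtype conversion kernel
ms_copy, _ = timed(lambda a: a.copy())               # plain device copy
print(f"cast: {ms_cast:.3f} ms, copy: {ms_copy:.3f} ms")
```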