CUDA pipeline for computing APR #185
base: master
…adding), still float number differences between CPU and GPU
…PU gives same results.
…PU gives same results.
…PU gives same results.
…for LinearAccessCuda
…PU computations different
👋🏻 Hey! We will check this out. We should catch up. @joeljonsson @krzysg
Hi, currently this works and gives exactly the same results as the CPU implementation, and in the end it generates the LinearAccess structure on the GPU (so it does not support the old random or sparse data structures). I have also added a lot of unit tests comparing CPU and GPU to make sure the results stay the same. What is not implemented:
For now I will fix one test and maybe clean up the CUDA code a little, mostly by moving things to better places so there is no mess there.
Cool, thanks! I think develop is stale, and maybe we should point this at main? Then the PR diff will be a bit more rational?
Can't wait to give this a go, amazing!
Good decision, those are all that are needed / suited for the GPU anyhow.
Sure, as we discussed recently, I have changed the target branch to 'master'.
Hi, there are two parts of LIS that probably require some explanation:
At some point the CPU implementation was changed to pad/unpad pixels before running LIS (according to the reflect_bc_lis parameter). At first I tried to avoid this on the GPU, since I wanted to avoid the additional memory allocation for the padded pixels, and I managed to do that for 1D. Unfortunately it does not work for the 2D or 3D cases. This is obvious in hindsight: in the CPU implementation, calc_sat_mean_* also modifies the padded pixels when it runs, so running it in the Y direction influences the later run in the X direction, and so on.
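To make the pad / filter / unpad sequence concrete, here is a minimal 1D sketch in plain C++. It is not the LibAPR code: the helper names (`reflect_index`, `pad_reflect`, `box_mean`, `unpad`, `mean_with_reflect_padding`) are hypothetical, the reflection is assumed not to repeat the edge sample, and a simple box mean stands in for the real calc_sat_mean_* routines.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: map index i onto [0, n) by reflection without
// repeating the edge sample (..., 2, 1, 0, 1, 2, ...).
// Assumes n >= 2 and that i overshoots the range by less than n.
static std::size_t reflect_index(long i, long n) {
    if (i < 0) i = -i;
    if (i >= n) i = 2 * (n - 1) - i;
    return static_cast<std::size_t>(i);
}

// Copy a row into a new buffer with `pad` reflected samples on each side.
// This is the extra allocation the comment above wanted to avoid.
std::vector<float> pad_reflect(const std::vector<float>& row, long pad) {
    const long n = static_cast<long>(row.size());
    std::vector<float> out(static_cast<std::size_t>(n + 2 * pad));
    for (long i = -pad; i < n + pad; ++i) {
        out[static_cast<std::size_t>(i + pad)] = row[reflect_index(i, n)];
    }
    return out;
}

// Box mean of radius r. The window is truncated at the buffer ends, so the
// padded samples are rewritten too (with different values than the interior).
std::vector<float> box_mean(const std::vector<float>& in, long r) {
    const long n = static_cast<long>(in.size());
    std::vector<float> out(in.size());
    for (long i = 0; i < n; ++i) {
        const long lo = std::max(0L, i - r);
        const long hi = std::min(n - 1, i + r);
        float sum = 0.0f;
        for (long j = lo; j <= hi; ++j) sum += in[static_cast<std::size_t>(j)];
        out[static_cast<std::size_t>(i)] = sum / static_cast<float>(hi - lo + 1);
    }
    return out;
}

// Drop `pad` samples from each side again.
std::vector<float> unpad(const std::vector<float>& padded, long pad) {
    return std::vector<float>(padded.begin() + pad, padded.end() - pad);
}

// Pad, filter, unpad: the whole 1D sequence in one call.
std::vector<float> mean_with_reflect_padding(const std::vector<float>& row, long r) {
    return unpad(box_mean(pad_reflect(row, r), r), r);
}
```

For example, calling `mean_with_reflect_padding({1, 2, 3, 4}, 1)` filters the padded row `{2, 1, 2, 3, 4, 3}` and returns four values for the original pixels. In 2D or 3D the same padded buffer would be filtered once per direction, which is where the modified padded pixels described above come into play.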
No description provided.