CUDA pipeline for computing APR #185

Open · krzysg wants to merge 65 commits into master
Conversation

@krzysg (Member) commented Aug 23, 2024

No description provided.

…adding), still float number differences between CPU and GPU
@cheesema (Member)
👋🏻 Hey! We will check this out. We should catch up. @joeljonsson @krzysg

@krzysg (Member Author) commented Aug 23, 2024

Hi, currently this is something that works and gives exactly the same results as the CPU implementation, and in the end generates the LinearAccess structure on the GPU (so it does not support the old random or sparse data structures).
As you may expect, the most problematic part was achieving this "exactly the same" level, which means all floating-point computations had to behave identically on CPU and GPU (so sometimes I needed to change the CPU implementation a little).

Anyway, I have added a lot of unit tests comparing CPU and GPU to make sure everything stays the same.
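Not the actual test code from this PR, just a minimal sketch of the comparison idea; the `exactly_equal` helper here is hypothetical:

```cpp
#include <cstddef>
#include <vector>

// Bitwise comparison of CPU and GPU outputs. Since the goal is *exactly*
// the same floating-point results, no epsilon tolerance is used.
template <typename T>
bool exactly_equal(const std::vector<T>& cpu, const std::vector<T>& gpu) {
    if (cpu.size() != gpu.size()) return false;
    for (std::size_t i = 0; i < cpu.size(); ++i) {
        if (cpu[i] != gpu[i]) return false;  // exact match, not approximate
    }
    return true;
}
```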

What is not implemented:

  • auto_parameters
  • get_apr_cuda currently performs the same steps as get_apr_cpu in APRConverter. I have not touched other places like APRConverterBatch (not sure if this is working?) or other parts of APRConverter like the get_lrf method (I am not even sure what it does...).

Next I will fix one test and maybe clean up the CUDA code a little, mostly just by moving things around to better places so it does not get messy.

@cheesema (Member)
Cool, thanks! I think develop is stale; maybe we should point this at main? Then the PR diff will be a bit more rational.

@cheesema (Member)
Can't wait to give this a go, amazing!

> LinearAccess structure on GPU (so it does not support old random or sparse data structures).

Good decision; those are all that are needed / suited for the GPU anyhow.

@krzysg krzysg changed the base branch from develop to master October 30, 2024 09:42
@krzysg (Member Author) commented Oct 30, 2024

> Cool thanks! I think develop is stale, and maybe we should point this at main?

Sure, as we discussed recently, I have changed the target branch to 'master'.

@krzysg (Member Author) commented Nov 5, 2024

Hi, there are two parts of LIS that probably require some explanation:

  1. The additional 'bool boundaryReflect' parameter to all calc_sat_mean_*

At some point the CPU implementation was changed to pad/unpad pixels before running LIS (according to the reflect_bc_lis parameter). At first I tried to avoid this on the GPU, since I wanted to avoid an additional memory allocation for the padded pixels, and I managed to do that for 1D. Unfortunately it does not work for the 2D or 3D cases (which is obvious, since in the CPU implementation calc_sat_mean_* also changes the padded pixels when it runs, so running it in the Y-direction influences the later X-direction run, and so on).
Anyway, I decided to leave it as an option for super fast processing of 1D data (without allocating additional memory or doing double memory copies for pad and unpad), so I upgraded the CPU implementation with that 'feature' as well; a sketch of the indexing idea is shown after this list.

  2. And now the second part: as you know, I really pay attention (whether it is important or not) to correct math and to getting the same results for the same data, even when differences would be caused only by minor floating-point effects (for example, a different order of execution). So if I put some data in the Y-direction and run calc_sat_mean_y, I want the same results as when I put the same data transposed (in the X-direction) and run calc_sat_mean_x. Unfortunately, those functions in the CPU implementation were giving slightly different results (+/- some float precision on the 6th or 7th digit after the decimal point, enough to make me angry ;-) ); the small example after this list shows how evaluation order alone can do that.
     Since I first managed to get the proper behavior in the GPU code, I 'equalized' the CPU code to match the GPU side. The good things resulting from that are:
     (a) the code still does the same thing as the old code (+/- the mentioned float precision) but behaves the same way in all directions (it has already been removed from the code, but during development I wrote tests comparing the outputs of the old and new LIS to check that everything is correct);
     (b) the variable naming is the same on the GPU and CPU sides, so in case of any changes it is easy to fix/update both sides, since they look similar.
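For point 1, the no-padding option boils down to folding out-of-range indices back into the image instead of materializing a padded copy. A minimal sketch, assuming the usual mirror convention without repeating the edge pixel (the actual convention used by calc_sat_mean_* may differ):

```cpp
// Map an out-of-range index back into [0, n) by mirroring at the borders,
// so a 1D smoothing window can read "reflected" neighbours without any
// padded allocation or pad/unpad memory copies.
inline int reflect_index(int i, int n) {
    if (i < 0)  return -i;              // e.g. -1 -> 1, -2 -> 2
    if (i >= n) return 2 * n - 2 - i;   // e.g. n -> n-2, n+1 -> n-3
    return i;
}
```

As the comment above explains, this only works for a single 1D pass: once a second direction is processed, the CPU implementation's writes into the padded region feed the next pass, which on-the-fly indexing cannot reproduce.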
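And for point 2, a tiny self-contained illustration of why evaluation order matters: the same mathematical sum computed in a different order can give a different float result, which is exactly the kind of last-digits discrepancy described above.

```cpp
#include <cstdio>

int main() {
    float a = 1e8f, b = -1e8f, c = 1.0f;
    float s1 = (a + b) + c;  // (0) + 1 = 1
    float s2 = a + (b + c);  // c is absorbed when added to b first, so 0
    std::printf("%.1f vs %.1f\n", s1, s2);  // prints "1.0 vs 0.0"
    return 0;
}
```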
