Parallel #6

Open
UgoBruzadin opened this issue Jul 26, 2021 · 2 comments

Comments

@UgoBruzadin

UgoBruzadin commented Jul 26, 2021

Hi Yun hui,

I recently got a new computer and tested running cudaica in parallel. Although cudaica is faster than binica when processing a single file, it slows down dramatically when I run several instances in parallel (say, 12 files at a time), to the point that the CPU (binica) finishes 3 to 4 times faster than the GPU.

Do you have any suggestions for improving performance? I know cudaica isn't using 100% of my GPU, so I imagine there is some headroom, and I'd like to hear your thoughts. I remember reading somewhere that there was a way to modify the parameters cudaica uses on the GPU, but I can't find it. Ideally one would combine the GPU and CPU to reach full utilization, but I can't imagine that would be easy (right now my script randomly chooses the GPU or the CPU for each file, picking the CPU two-thirds of the time).
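A random two-thirds/one-third split can leave one device idle while the other still has a backlog. A minimal sketch of an alternative, assuming each file can be processed independently: put all files in a shared queue and let one GPU worker and several CPU workers pull the next file whenever they become free, so the faster device automatically takes more files. Keeping a single GPU worker is an assumption motivated by the time-slicing of concurrent CUDA contexts discussed in this thread. `dispatch`, `run_gpu` and `run_cpu` are hypothetical names; in practice the two runners would launch cudaica or binica on one file (for example via `subprocess.run`) and return its result.

```python
import queue
import threading
from typing import Callable, Dict, List


def dispatch(files: List[str],
             run_gpu: Callable[[str], str],
             run_cpu: Callable[[str], str],
             n_cpu_workers: int = 4) -> Dict[str, str]:
    """Process every file once by pulling from a shared work queue.

    Hypothetical helper: run_gpu/run_cpu stand in for whatever launches
    cudaica or binica on a single file.
    """
    work: "queue.Queue[str]" = queue.Queue()
    for f in files:
        work.put(f)

    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker(run: Callable[[str], str]) -> None:
        # Keep taking the next file until the queue is empty; a faster
        # device returns sooner and therefore processes more files.
        while True:
            try:
                f = work.get_nowait()
            except queue.Empty:
                return
            out = run(f)
            with lock:
                results[f] = out

    # One GPU worker (assumption: extra CUDA processes would mostly wait
    # for their time slice) plus n_cpu_workers CPU workers. Threads are
    # enough when the runners launch external processes, because the GIL
    # is released while waiting on them.
    threads = [threading.Thread(target=worker, args=(run_gpu,))]
    threads += [threading.Thread(target=worker, args=(run_cpu,))
                for _ in range(n_cpu_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    files = [f"subject{i:02d}.set" for i in range(12)]
    done = dispatch(files, run_gpu=lambda f: "gpu", run_cpu=lambda f: "cpu",
                    n_cpu_workers=3)
    print(sorted(done) == sorted(files))  # True
```

Because workers pull work instead of being assigned it up front, the CPU/GPU split adapts to the actual speed of each device on a given machine. `n_cpu_workers` would still need tuning, since cudaica itself also consumes some CPU time.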

Side note: I'm running the cudaica.exe you built on an RTX 3070, and it runs flawlessly.

@CloudyDory
Owner

CloudyDory commented Oct 1, 2021

Hi, sorry for the late reply. I have never tried running CUDAICA in parallel, because I read a StackExchange post saying that running multiple CUDA applications at the same time is not truly parallel:

"At the very same slice of time, only kernels from a single CUDA context may be executed on a GPU. This may cause a GPU underutilization if kernels do not occupy the entire GPU resources (memory + compute), and some of the resources may be left unused."

There may be ways to work around this, but right now I don't have time to learn and implement them.

Best,

Yunhui

@winndsd
Contributor

winndsd commented Mar 2, 2022

The author of EEGLAB recommends a plugin named RELICA, which uses a parallel pool (PT) and the GPU at the same time (without CUDA).
ICA seems to run faster with PT than with the standard 'runica' algorithm, using 6 cores and about 80% of the CPU and GPU.
However, cudaica is faster than both, while using only 30-40% of the CPU and 30-60% of the GPU (CUDA) on my computer.
I believe there is still some potential in combining PT with cudaica.
Can anyone optimize it further?

RELICA can be downloaded from ‘https://github.com/sccn/relica’.
