Spurious failures in cudacopy! with "invalid argument" error #18
Comments
I can't replicate this: I ran
Things to try:
Before the failure it looks like this (I printed cuda_ptrs and the copy parameters in cudacopy!). I find it reliably reproducible, but it disappears when I use copy! instead of to_host.
If I wrap to_host in gc_disable and gc_enable, there is also no crash, so this probably isn't to do with finalizers.
Very strange. I get this:
because this code should remove all the entries from the dict. Why isn't that happening in your case?
The message gets printed before the error in to_host, so that's cuda_ptrs just before the crash.
Ah, I misunderstood when you were running it. That's informative too (but also worth checking: you should have an empty dict right after you successfully complete
I can't see anything that looks wrong with that output. I'm pretty baffled overall. Does
I should point out that I don't have direct control over the machine here; it's more like a departmental server that I can use. I notice the runtime version is 6.0, while the wrappers were generated for 6.5. I think this is probably not the reason, because the actual API is probably just the same. At first I thought it might be a pointer alignment issue (the host pointer is not divisible by 128 = 0x80), but the other host pointers are also not aligned and do not cause errors.
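For anyone wanting to repeat the alignment check on the printed copy parameters, here is a minimal sketch. The addresses are made up for illustration and are not taken from the output in this thread, and `is_aligned` is a hypothetical helper, not part of CUDArt.

```python
def is_aligned(address: int, boundary: int = 128) -> bool:
    """Return True if address is a multiple of boundary (a power of two)."""
    if boundary <= 0 or boundary & (boundary - 1):
        raise ValueError("boundary must be a positive power of two")
    # For a power-of-two boundary, the low bits must all be zero.
    return address & (boundary - 1) == 0


# Hypothetical host addresses, invented for illustration only.
host_pointers = [0x7F3A1C000080, 0x7F3A1C0000A8, 0x7F3A1C000100]

for ptr in host_pointers:
    offset = ptr % 128  # distance past the previous 128-byte boundary
    print(f"{ptr:#x}: offset {offset} aligned={is_aligned(ptr)}")
```

If unaligned pointers show up both in copies that succeed and in the one that fails, as described above, alignment alone is unlikely to explain the error.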
In that example, the device is still initialized (i.e. cudaDeviceReset was not called, which is a real problem):
This is probably related to #17 and how finalizers work.
The following function:
produces the following error on the second time it is run. If I run
gc()
in between the two runs, there is no error.
This is with the git head version of CUDArt, and probably has something to do with a garbage collection pass trying to collect a CUDA pointer that came from a previous device context (before device_reset called cudaDeviceReset), so that the pointer is invalid in the new device context.
This is very irritating when testing CUDA code in the REPL, where the same function is run over and over again, sometimes not even correctly, so resetting everything correctly is a must.
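The suspected mechanism can be illustrated with a toy model. This is only a sketch of the hypothesis, written in Python so it runs without a GPU; `FakeDevice`, `Buffer`, and the "generation" counter are all hypothetical names and do not reflect how CUDArt or the CUDA runtime is implemented. The error string simply mirrors the one reported here.

```python
class FakeDevice:
    """Toy stand-in for a GPU runtime; not CUDArt or the CUDA API."""

    def __init__(self):
        self.generation = 0   # bumped on every reset
        self.live = set()     # pointers valid in the current context
        self._next = 0x1000

    def malloc(self):
        ptr = self._next
        self._next += 0x100
        self.live.add(ptr)
        return ptr

    def free(self, ptr):
        # Toy behaviour: reject pointers unknown to the current context.
        if ptr not in self.live:
            raise RuntimeError("invalid argument")
        self.live.remove(ptr)

    def reset(self):
        # Models a device reset: all outstanding allocations become invalid.
        self.live.clear()
        self.generation += 1


class Buffer:
    """Handle that remembers which context generation created it."""

    def __init__(self, device):
        self.device = device
        self.ptr = device.malloc()
        self.generation = device.generation

    def naive_finalize(self):
        # Fails if a reset happened after this buffer was allocated.
        self.device.free(self.ptr)

    def guarded_finalize(self):
        # Skip the free when the owning context no longer exists.
        if self.generation == self.device.generation:
            self.device.free(self.ptr)


dev = FakeDevice()
stale = Buffer(dev)   # allocated in generation 0
dev.reset()           # context destroyed before the finalizer ran
fresh = Buffer(dev)   # allocated in generation 1

try:
    stale.naive_finalize()
except RuntimeError as err:
    print("naive finalizer:", err)

stale.guarded_finalize()  # skipped: generations differ
fresh.guarded_finalize()  # frees normally
```

In this model, running the finalizers before the reset (the analogue of calling gc() between runs) also avoids the error, which matches the behaviour described above. Tagging each pointer with the context it came from is one possible mitigation, not necessarily the one the package should adopt.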