PyCudaHandler: Illegal memory access (cuMemcpyDtoD failed) #84

Open
Qwlouse opened this issue Nov 2, 2015 · 1 comment

Qwlouse commented Nov 2, 2015

I encountered a weird crash. Not really sure what happened, because it ran fine for 29 epochs before that. I can't reproduce it, but I wanted to keep it here for future reference.

 - - - - - - - - - - - -  Epoch 30  - - - - - - - - - - - -
[====1====2====3=validation
  total_loss                            : 122.73372146606445
  Accuracy                              : 0.6446704
===4===PyCUDA WARNING: a clean-up operation failed (dead context maybe?)

cuMemFree failed: an illegal memory access was encountered
ERROR - diag_hutter - Failed after 2 days, 7:11:11!
/home/greff/Programming/sacred/sacred/observers/mongo.py:110: DeprecationWarning: save is deprecated. Use insert_one or replace_one instead
  self.runs.save(self.run_entry)
Traceback (most recent calls WITHOUT Sacred internals):
  File "diag_hutter.py", line 87, in run
    trainer.train(network, getter_tr, valid_getter=getter_va)
  File "/home/greff/Programming/brainstorm/brainstorm/training/trainer.py", line 99, in train
    self.stepper.run()
  File "/home/greff/Programming/brainstorm/brainstorm/training/steppers.py", line 103, in run
    self.net.forward_pass(training_pass=True)
  File "/home/greff/Programming/brainstorm/brainstorm/structure/network.py", line 431, in forward_pass
    layer.forward_pass(self.buffer[layer_name], training_pass)
  File "/home/greff/Programming/brainstorm/brainstorm/layers/loss_layer.py", line 46, in forward_pass
    buffers.outputs.loss.reshape(tuple()))
  File "/home/greff/Programming/brainstorm/brainstorm/handlers/pycuda_handler.py", line 378, in sum_t
    self.copy_to(cumisc.sum(a), out)
  File "/home/greff/Programming/brainstorm/brainstorm/handlers/pycuda_handler.py", line 84, in copy_to
    pycuda.driver.memcpy_dtod(dest.gpudata, src.gpudata, dest.nbytes)
Traceback (most recent call last):
  File "/home/greff/Programming/sacred/sacred/experiment.py", line 165, in run_commandline
    args)
  File "/home/greff/Programming/sacred/sacred/experiment.py", line 138, in run_command
    run()
  File "/home/greff/Programming/sacred/sacred/run.py", line 144, in __call__
    self.result = self.main_function(*args)
  File "/home/greff/Programming/sacred/sacred/config/captured_function.py", line 47, in captured_function
    result = wrapped(*args, **kwargs)
  File "diag_hutter.py", line 87, in run
    trainer.train(network, getter_tr, valid_getter=getter_va)
  File "/home/greff/Programming/brainstorm/brainstorm/training/trainer.py", line 99, in train
    self.stepper.run()
  File "/home/greff/Programming/brainstorm/brainstorm/training/steppers.py", line 103, in run
    self.net.forward_pass(training_pass=True)
  File "/home/greff/Programming/brainstorm/brainstorm/structure/network.py", line 431, in forward_pass
    layer.forward_pass(self.buffer[layer_name], training_pass)
  File "/home/greff/Programming/brainstorm/brainstorm/layers/loss_layer.py", line 46, in forward_pass
    buffers.outputs.loss.reshape(tuple()))
  File "/home/greff/Programming/brainstorm/brainstorm/handlers/pycuda_handler.py", line 378, in sum_t
    self.copy_to(cumisc.sum(a), out)
  File "/home/greff/Programming/brainstorm/brainstorm/handlers/pycuda_handler.py", line 84, in copy_to
    pycuda.driver.memcpy_dtod(dest.gpudata, src.gpudata, dest.nbytes)
pycuda._driver.LogicError: cuMemcpyDtoD failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: an illegal memory access was encountered
sys:1: ResourceWarning: unclosed <socket.socket fd=23, family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('127.0.0.1', 35871), raddr=('127.0.0.1', 5006)>
/home/greff/venv/py3/lib/python3.4/importlib/_bootstrap.py:2150: ImportWarning: sys.meta_path is empty
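
A note for anyone trying to reproduce this: CUDA reports illegal memory accesses asynchronously, so the call that raises (here cuMemcpyDtoD inside copy_to) is not necessarily the operation that performed the bad access. A minimal sketch, assuming a re-run is feasible, is to request synchronous kernel launches through the CUDA_LAUNCH_BLOCKING environment variable so that the error surfaces closer to the offending call. The helper name enable_blocking_launches is hypothetical, and the variable is assumed to be set before CUDA is initialized.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Request synchronous CUDA kernel launches for debugging.

    Assumption: this runs before CUDA is initialized, for example
    before `import pycuda.autoinit`. Setting it later is not expected
    to affect an already running context.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env


enable_blocking_launches()
# import pycuda.autoinit  # create the context only after the variable is set
```

This slows training down noticeably, so it is probably only worth enabling for a dedicated debugging run.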

flukeskywalker commented

Our usage of skcuda.misc.sum() (the cumisc.sum call in sum_t) was rather odd, owing to a bug in scikit-cuda. The fix above will hopefully avoid such crashes. Let's keep this open for a while to see if the bug reappears.
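
Independently of that fix, the traceback shows copy_to copying dest.nbytes bytes out of src without comparing the two buffers. A hedged sketch of a guard is given below: it refuses the copy when dtype or byte size differ, so a mismatch raises a Python exception instead of reaching the driver. Whether a size mismatch was actually involved in this crash is not known. The name checked_copy_to and the injected memcpy_dtod parameter are hypothetical and not part of brainstorm; in real use one would pass pycuda.driver.memcpy_dtod and GPUArray arguments.

```python
def checked_copy_to(src, dest, memcpy_dtod):
    """Copy src into dest on the device after validating both buffers.

    src and dest are expected to behave like pycuda.gpuarray.GPUArray,
    i.e. to provide .gpudata, .nbytes and .dtype. memcpy_dtod is the
    device-to-device copy function (e.g. pycuda.driver.memcpy_dtod);
    it is passed in so the checks can be exercised without a GPU.
    """
    if src.dtype != dest.dtype:
        raise ValueError(
            "dtype mismatch: src is %s, dest is %s" % (src.dtype, dest.dtype))
    if src.nbytes != dest.nbytes:
        raise ValueError(
            "size mismatch: src has %d bytes, dest has %d bytes"
            % (src.nbytes, dest.nbytes))
    # Same argument order as the call in the traceback: dest, src, byte count.
    memcpy_dtod(dest.gpudata, src.gpudata, dest.nbytes)
```

The check costs two attribute comparisons per copy, which should be negligible next to the copy itself.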

flukeskywalker changed the title from "PyCudaHandler: Illegal memory access" to "PyCudaHandler: Illegal memory access (cuMemcpyDtoD failed)" on Nov 23, 2015