It used to work on my laptop, but it no longer does. I suspect it is due to some interaction with the GPU also being used as the actual graphics card, and thus Xorg consuming too much memory (although the ~1.3GB requested is less than the ~2GB reported free), or something along those lines.
nvidia-smi
$> nvidia-smi
Mon Nov 11 09:55:21 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 430.50       Driver Version: 430.50       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Quadro T2000        Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   43C    P8     3W /  N/A |   2297MiB /  3911MiB |     19%      Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     21824      G   /usr/lib/xorg/Xorg                           141MiB |
|    0     25467      G   /usr/lib/xorg/Xorg                          1670MiB |
|    0     25596      G   /usr/bin/gnome-shell                         180MiB |
|    0     27333      G   ...uest-channel-token=14439694130078186709   232MiB |
|    0     28802      G   /usr/lib/xorg/Xorg                             6MiB |
|    0     28899      G   /usr/bin/gnome-shell                           5MiB |
+-----------------------------------------------------------------------------+
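As a side note, the free-memory figure I am reasoning about can also be checked programmatically; a minimal diagnostic sketch follows, assuming the third-party pynvml package (nvidia-ml-py3) is installed on the host (it is not part of the kwyk container):

# Diagnostic sketch, assuming pynvml (nvidia-ml-py3) is installed on the host.
# Prints the total/used/free numbers behind the nvidia-smi table above.
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # GPU 0 (the Quadro T2000)
info = pynvml.nvmlDeviceGetMemoryInfo(handle)
print("total: %5d MiB" % (info.total // 1024**2))
print("used:  %5d MiB" % (info.used // 1024**2))
print("free:  %5d MiB" % (info.free // 1024**2))
pynvml.nvmlShutdown()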
the actual run via singularity
$> singularity run -e -B /usr/lib/x86_64-linux-gnu/libnvidia-fatbinaryloader.so.430.50 -B /usr/lib/x86_64-linux-gnu/libcuda.so.1 neuronets-kwyk--version-0.4-gpu.sing raiders/sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz out
Bayesian dropout functions have been loaded.
Your version: v0.4 Latest version: 0.4
++ Conforming volume to 1mm^3 voxels and size 256x256x256.
/opt/kwyk/freesurfer/bin/mri_convert: line 2: /opt/kwyk/freesurfer/sources.sh: No such file or directory
mri_convert.bin --conform raiders/sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz /tmp/tmpwtickiw9.nii.gz
$Id: mri_convert.c,v 1.226 2016/02/26 16:15:24 mreuter Exp $
reading from raiders/sub-rid000005/anat/sub-rid000005_run-01_T1w.nii.gz...
TR=10.00, TE=0.00, TI=0.00, flip angle=0.00
i_ras = (0, -1, 0)
j_ras = (0, 0, 1)
k_ras = (1, 0, 0)
changing data type from float to uchar (noscale = 0)...
MRIchangeType: Building histogram
Reslicing using trilinear interpolation
writing to /tmp/tmpwtickiw9.nii.gz...
++ Running forward pass of model.
2019-11-11 14:57:43.820728: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 FMA
2019-11-11 14:57:43.916219: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:964] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-11 14:57:43.916394: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1432] Found device 0 with properties:
name: Quadro T2000 major: 7 minor: 5 memoryClockRate(GHz): 1.5
pciBusID: 0000:01:00.0
totalMemory: 3.82GiB freeMemory: 1.41GiB
2019-11-11 14:57:43.916409: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1511] Adding visible gpu devices: 0
2019-11-11 14:57:44.267550: I tensorflow/core/common_runtime/gpu/gpu_device.cc:982] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-11 14:57:44.267570: I tensorflow/core/common_runtime/gpu/gpu_device.cc:988] 0
2019-11-11 14:57:44.267575: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1001] 0: N
2019-11-11 14:57:44.267684: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 1246 MB memory) -> physical GPU (device: 0, name: Quadro T2000, pci bus id: 0000:01:00.0, compute capability: 7.5)
Normalizer being used <function zscore at 0x7fe98eac4ea0>
-5.8382284e-08
1.0000015
0/64 [..............................] - ETA: 0s
2019-11-11 14:57:46.303925: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-11-11 14:57:46.314172: E tensorflow/stream_executor/cuda/cuda_dnn.cc:373] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
Traceback (most recent call last):
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1334, in _do_call
return fn(*args)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1319, in _run_fn
options, feed_dict, fetch_list, target_list, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1407, in _call_tf_sessionrun
run_metadata)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[{{node layer_1/conv3d/Conv3D}} = Conv3D[T=DT_FLOAT, data_format="NDHWC", dilations=[1, 1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_85, layer_1/conv3d/kernel_m)]]
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/usr/local/bin/kwyk", line 11, in<module>
load_entry_point('kwyk', 'console_scripts', 'kwyk')()
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/opt/kwyk/kwyk/cli.py", line 92, in predict
normalizer=zscore)
File "/usr/local/lib/python3.5/dist-packages/nobrainer/predict.py", line 348, in predict_from_filepath
batch_size=batch_size)
File "/usr/local/lib/python3.5/dist-packages/nobrainer/predict.py", line 275, in predict_from_img
batch_size=batch_size)
File "/usr/local/lib/python3.5/dist-packages/nobrainer/predict.py", line 186, in predict_from_array
new_prediction = predictor( {'volume': features[j:j + batch_size]})
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/predictor/predictor.py", line 77, in __call__
return self._session.run(fetches=self.fetch_tensors, feed_dict=feed_dict)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 929, in run
run_metadata_ptr)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1152, in _run
feed_dict_tensor, options, run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1328, in _do_run
run_metadata)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/client/session.py", line 1348, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node layer_1/conv3d/Conv3D (defined at /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/predictor/saved_model_predictor.py:153) = Conv3D[T=DT_FLOAT, data_format="NDHWC", dilations=[1, 1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_85, layer_1/conv3d/kernel_m)]]
Caused by op 'layer_1/conv3d/Conv3D', defined at:
File "/usr/local/bin/kwyk", line 11, in<module>
load_entry_point('kwyk', 'console_scripts', 'kwyk')()
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 764, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 717, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 956, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.5/dist-packages/click/core.py", line 555, in invoke
return callback(*args, **kwargs)
File "/opt/kwyk/kwyk/cli.py", line 83, in predict
predictor = _get_predictor(savedmodel_path)
File "/usr/local/lib/python3.5/dist-packages/nobrainer/predict.py", line 406, in _get_predictor
predictor = tf.contrib.predictor.from_saved_model(str(path))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/predictor/predictor_factories.py", line 153, in from_saved_model
config=config)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/contrib/predictor/saved_model_predictor.py", line 153, in __init__
loader.load(self._session, tags.split(','), export_dir)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 197, in load
return loader.load(sess, tags, import_scope, **saver_kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 350, in load
**saver_kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/saved_model/loader_impl.py", line 278, in load_graph
meta_graph_def, import_scope=import_scope, **saver_kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/training/saver.py", line 1696, in _import_meta_graph_with_return_elements
**kwargs))
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/meta_graph.py", line 806, in import_scoped_meta_graph_with_return_elements
return_elements=return_elements)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/util/deprecation.py", line 488, in new_func
return func(*args, **kwargs)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 442, in import_graph_def
_ProcessNewOps(graph)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/importer.py", line 234, in _ProcessNewOps
for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3440, in _add_new_tf_operations
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3440, in <listcomp>
for c_op in c_api_util.new_tf_operations(self)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 3299, in _create_op_from_tf_operation
ret = Operation(c_op, self)
File "/usr/local/lib/python3.5/dist-packages/tensorflow/python/framework/ops.py", line 1770, in __init__
self._traceback = tf_stack.extract_stack()
UnknownError (see above for traceback): Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node layer_1/conv3d/Conv3D (defined at /usr/local/lib/python3.5/dist-packages/tensorflow/contrib/predictor/saved_model_predictor.py:153) = Conv3D[T=DT_FLOAT, data_format="NDHWC", dilations=[1, 1, 1, 1, 1], padding="SAME", strides=[1, 1, 1, 1, 1], _device="/job:localhost/replica:0/task:0/device:GPU:0"](_arg_Placeholder_0_0/_85, layer_1/conv3d/kernel_m)]]
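The traceback shows nobrainer creating the predictor with tf.contrib.predictor.from_saved_model(str(path)) and no session config (predict.py, line 406), so TensorFlow reserves its default share of GPU memory up front. The usual workaround for this kind of CUDNN_STATUS_INTERNAL_ERROR on a memory-constrained GPU is to enable allow_growth; a minimal sketch of what that could look like (not the current kwyk/nobrainer code path, and the saved-model path is a placeholder):

# Sketch only: enable on-demand GPU memory allocation for the saved-model predictor.
# This is not what kwyk/nobrainer does today (from_saved_model is called without a
# config in the traceback above); "path/to/savedmodel" is a placeholder.
import tensorflow as tf

config = tf.ConfigProto()
config.gpu_options.allow_growth = True  # grab GPU memory as needed instead of up front

predictor = tf.contrib.predictor.from_saved_model("path/to/savedmodel", config=config)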