Unable to load any of {libcudnn_ops.so.9.1.0, libcudnn_ops.so.9.1, libcudnn_ops.so.9, libcudnn_ops.so} Invalid handle. Cannot load symbol cudnnCreateTensorDescriptor #259
Please state what versions of torch and ctranslate2 you are using.
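For reference, one way to print both versions is the one-liner below (a sketch; it assumes both packages are importable in the environment where the error occurs):
# print the installed torch and ctranslate2 versions
python3 -c "import torch, ctranslate2; print('torch', torch.__version__); print('ctranslate2', ctranslate2.__version__)"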
Same issue here.
torch version:
ctranslate2 version:
If you are using Colab, downgrade ctranslate2 to 4.4.0.
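A minimal sketch of that downgrade (restart the runtime afterwards so the new version is picked up):
# pin ctranslate2 to the last version known to work with Colab's bundled cuDNN
pip install "ctranslate2==4.4.0"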
This is because you didn't add the directory containing libcudnn_ops.so.9.1.0 to your library path; you can use a command like the one below.
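The exact command isn't shown above; a minimal sketch, assuming cuDNN was installed by pip under the nvidia package of a Python 3.10 environment (adjust the path to wherever libcudnn_ops lives on your system), would be:
# point the dynamic linker at the pip-installed cuDNN libraries
export LD_LIBRARY_PATH=/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib:$LD_LIBRARY_PATH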
It was working perfectly, but this error suddenly appeared on Tuesday. By the way, I am using it on a RunPod instance.
Same as me. By the way, the library path is inside your anaconda Python environment; you can use find to locate where it is.
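A sketch of that search, assuming an active conda environment (fall back to a system-wide search if nothing turns up):
# locate libcudnn_ops inside the active conda environment
find "$CONDA_PREFIX" -name 'libcudnn_ops*' 2>/dev/null
# or search the whole filesystem
find / -name 'libcudnn_ops*' 2>/dev/null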
Thanks @roboatLee @MahmoudAshraf97 for the help. I found the solution.
When running the code block below, I am getting the error:

# Initialize NeMo MSDD diarization model
msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")
msdd_model.diarize()
del msdd_model
torch.cuda.empty_cache()

Here are my Python library versions:
ctranslate2==4.5.0
torch==2.5.0
CUDA 12.4
nvidia-cuda-cupti-cu12 12.4.127
nvidia-cuda-nvrtc-cu12 12.4.127
nvidia-cuda-runtime-cu12 12.4.127

This is the libcudnn_ops location:
root@53a5b6645406:/container/work/whisper-diarization2/whisper-diarization# find / | grep libcudnn_ops
/usr/local/lib/python3.10/dist-packages/nvidia/cudnn/lib/libcudnn_ops.so.9
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9.5.1
/usr/lib/x86_64-linux-gnu/libcudnn_ops.so.9

I ran it and added the export path. Here is the full output:
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
Cell In[15], line 2
1 # Initialize NeMo MSDD diarization model
----> 2 msdd_model = NeuralDiarizer(cfg=create_config(temp_path)).to("cuda")
3 msdd_model.diarize()
5 del msdd_model
File /usr/local/lib/python3.10/dist-packages/nemo/collections/asr/models/msdd_models.py:994, in NeuralDiarizer.__init__(self, cfg)
989 self.max_pred_length = cfg.diarizer.msdd_model.parameters.get('max_pred_length', 0)
990 self.diar_eval_settings = cfg.diarizer.msdd_model.parameters.get(
991 'diar_eval_settings', [(0.25, True), (0.25, False), (0.0, False)]
992 )
--> 994 self._init_msdd_model(cfg)
995 self.diar_window_length = cfg.diarizer.msdd_model.parameters.diar_window_length
996 self.msdd_model.cfg = self.transfer_diar_params_to_model_params(self.msdd_model, cfg)
File /usr/local/lib/python3.10/dist-packages/nemo/collections/asr/models/msdd_models.py:1096, in NeuralDiarizer._init_msdd_model(self, cfg)
1094 logging.warning(f"requested {model_path} model name not available in pretrained models, instead")
1095 logging.info("Loading pretrained {} model from NGC".format(model_path))
-> 1096 self.msdd_model = EncDecDiarLabelModel.from_pretrained(model_name=model_path, map_location=cfg.device)
1097 # Load speaker embedding model state_dict which is loaded from the MSDD checkpoint.
1098 if self.use_speaker_model_from_ckpt:
File /usr/local/lib/python3.10/dist-packages/nemo/core/classes/common.py:754, in Model.from_pretrained(cls, model_name, refresh_cache, override_config_path, map_location, strict, return_config, trainer, save_restore_connector)
748 else:
749 # NGC source
750 class_, nemo_model_file_in_cache = cls._get_ngc_pretrained_model_info(
751 model_name=model_name, refresh_cache=refresh_cache
752 )
--> 754 instance = class_.restore_from(
755 restore_path=nemo_model_file_in_cache,
756 override_config_path=override_config_path,
757 map_location=map_location,
758 strict=strict,
759 return_config=return_config,
760 trainer=trainer,
761 save_restore_connector=save_restore_connector,
762 )
763 return instance
File /usr/local/lib/python3.10/dist-packages/nemo/core/classes/modelPT.py:464, in ModelPT.restore_from(cls, restore_path, override_config_path, map_location, strict, return_config, save_restore_connector, trainer)
461 app_state.model_restore_path = restore_path
463 cls.update_save_restore_connector(save_restore_connector)
--> 464 instance = cls._save_restore_connector.restore_from(
465 cls, restore_path, override_config_path, map_location, strict, return_config, trainer
466 )
467 if isinstance(instance, ModelPT):
468 instance._save_restore_connector = save_restore_connector
File /usr/local/lib/python3.10/dist-packages/nemo/core/connectors/save_restore_connector.py:255, in SaveRestoreConnector.restore_from(self, calling_cls, restore_path, override_config_path, map_location, strict, return_config, trainer)
230 """
231 Restores model instance (weights and configuration) into .nemo file
232
(...)
251 An instance of type cls or its underlying config (if return_config is set).
252 """
253 # Get path where the command is executed - the artifacts will be "retrieved" there
254 # (original .nemo behavior)
--> 255 loaded_params = self.load_config_and_state_dict(
256 calling_cls, restore_path, override_config_path, map_location, strict, return_config, trainer,
257 )
258 if not isinstance(loaded_params, tuple) or return_config is True:
259 return loaded_params
File /usr/local/lib/python3.10/dist-packages/nemo/core/connectors/save_restore_connector.py:179, in SaveRestoreConnector.load_config_and_state_dict(self, calling_cls, restore_path, override_config_path, map_location, strict, return_config, trainer)
177 calling_cls._set_model_restore_state(is_being_restored=True, folder=tmpdir)
178 instance = calling_cls.from_config_dict(config=conf, trainer=trainer)
--> 179 instance = instance.to(map_location)
180 # add load_state_dict override
181 if app_state.model_parallel_size is not None and app_state.model_parallel_size > 1:
File /usr/local/lib/python3.10/dist-packages/lightning_fabric/utilities/device_dtype_mixin.py:55, in _DeviceDtypeModuleMixin.to(self, *args, **kwargs)
53 device, dtype = torch._C._nn._parse_to(*args, **kwargs)[:2]
54 _update_properties(self, device=device, dtype=dtype)
---> 55 return super().to(*args, **kwargs)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:1340, in Module.to(self, *args, **kwargs)
1337 else:
1338 raise
-> 1340 return self._apply(convert)
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:900, in Module._apply(self, fn, recurse)
898 if recurse:
899 for module in self.children():
--> 900 module._apply(fn)
902 def compute_should_use_set_data(tensor, tensor_applied):
903 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
904 # If the new tensor has compatible tensor type as the existing tensor,
905 # the current behavior is to change the tensor in-place using `.data =`,
(...)
910 # global flag to let the user control whether they want the future
911 # behavior of overwriting the existing tensor or not.
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py:900, in Module._apply(self, fn, recurse)
898 if recurse:
899 for module in self.children():
--> 900 module._apply(fn)
902 def compute_should_use_set_data(tensor, tensor_applied):
903 if torch._has_compatible_shallow_copy_type(tensor, tensor_applied):
904 # If the new tensor has compatible tensor type as the existing tensor,
905 # the current behavior is to change the tensor in-place using `.data =`,
(...)
910 # global flag to let the user control whether they want the future
911 # behavior of overwriting the existing tensor or not.
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py:288, in RNNBase._apply(self, fn, recurse)
283 ret = super()._apply(fn, recurse)
285 # Resets _flat_weights
286 # Note: be v. careful before removing this, as 3rd party device types
287 # likely rely on this behavior to properly .to() modules like LSTM.
--> 288 self._init_flat_weights()
290 return ret
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py:215, in RNNBase._init_flat_weights(self)
208 self._flat_weights = [
209 getattr(self, wn) if hasattr(self, wn) else None
210 for wn in self._flat_weights_names
211 ]
212 self._flat_weight_refs = [
213 weakref.ref(w) if w is not None else None for w in self._flat_weights
214 ]
--> 215 self.flatten_parameters()
File /usr/local/lib/python3.10/dist-packages/torch/nn/modules/rnn.py:269, in RNNBase.flatten_parameters(self)
267 if self.proj_size > 0:
268 num_weights += 1
--> 269 torch._cudnn_rnn_flatten_weight(
270 self._flat_weights,
271 num_weights,
272 self.input_size,
273 rnn.get_cudnn_mode(self.mode),
274 self.hidden_size,
275 self.proj_size,
276 self.num_layers,
277 self.batch_first,
278 bool(self.bidirectional),
279 )
RuntimeError: cuDNN error: CUDNN_STATUS_SUBLIBRARY_VERSION_MISMATCH

To get to this point, I installed the requirements on a fresh Ubuntu Docker container. I pulled the Dockerfile from here: https://github.com/SYSTRAN/faster-whisper/blob/master/docker/Dockerfile
How do I fix this error?
You need to have matching cuDNN versions.
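One way to compare the versions in play (a sketch; the exact paths and libraries on your system may differ) is to print the cuDNN version PyTorch expects and list the libcudnn copies visible to the loader:
# cuDNN version torch was built against / will load
python3 -c "import torch; print(torch.backends.cudnn.version())"
# libcudnn libraries registered with the dynamic linker
ldconfig -p | grep libcudnn
# libcudnn copies installed by pip alongside torch
find /usr/local/lib/python3.10/dist-packages/nvidia -name 'libcudnn*' 2>/dev/null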
I've been touched by god. For anyone who was stuck on this error and using Docker, here is the solution. This solution uses whisperx only because my notebook still uses whisperx.

# pull the docker image below from nvidia's website
# https://hub.docker.com/r/nvidia/cuda/tags?page=2&name=12.1
sudo docker pull nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
########################################
# standard installs for all containers #
########################################
apt update
apt install -y python3.10
apt install -y python3.10-dev
python3.10 --version
# install sudo
apt update
apt install sudo -y
# install pip and pip3
sudo apt update
sudo apt install python3-pip -y
# install git
sudo apt update
sudo apt install git -y
# install nano
sudo apt update
sudo apt install nano -y
# install wget
sudo apt update
sudo apt install wget -y
pip install --upgrade pip
pip install ipykernel
pip install jupyterlab
pip install numpy
pip install ipywidgets
#python -m ipykernel install --user --name $ENV_NAME
# mahmoud whisper commands
sudo apt update && sudo apt install cython3 --yes
sudo apt update && sudo apt install ffmpeg --yes
###############################################
# end of standard installs for all containers #
###############################################
# commenting out requirements.txt command because we will
# install the libraries based from the pytorch website.
# This is based on what google colab is using.
# pip install -c constraints.txt -r requirements.txt
# torch installation packages based on google colab as of 10/31/24
# https://download.pytorch.org/whl/torch/
pip install https://download.pytorch.org/whl/cu121_full/torch-2.5.0%2Bcu121-cp310-cp310-linux_x86_64.whl
pip install https://download.pytorch.org/whl/cu121_full/torchaudio-2.5.0%2Bcu121-cp310-cp310-linux_x86_64.whl
pip install torchsummary==1.5.1
pip install https://download.pytorch.org/whl/cu121_full/torchvision-0.20.0%2Bcu121-cp310-cp310-linux_x86_64.whl
# Since we are using a docker container with cudnn from nvidia
# we can skip the commands below
# add to bashrc
# whereis cuda
# nano ~/.bashrc
# export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}
# export LD_LIBRARY_PATH=/usr/local/cuda/lib64:${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}
# source ~/.bashrc
# check cuda version
# nvcc --version
# These pip installs come directly from Mahmoud's google
# colab notebook
# https://colab.research.google.com/github/MahmoudAshraf97/whisper-diarization/blob/main/Whisper_Transcription_%2B_NeMo_Diarization.ipynb#scrollTo=ye1FJVFRO30B
# we are running these commands to our docker container to try and emulate
# google colab's environment for the purpose of using the
# whisper-diarization repo + the original whisperx
pip install git+https://github.com/SYSTRAN/faster-whisper.git ctranslate2==4.4.0
pip install "nemo-toolkit[asr]>=2.dev"
pip install git+https://github.com/MahmoudAshraf97/demucs.git
pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git
pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git
# we still need to install whisperx for our original environment
pip install git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
# launch jupyter lab
jupyter lab

Additional Notes:

# if you are installing a newly hosted ubuntu image then run
# these commands so that docker can access the GPU
sudo apt update
sudo apt install -y nvidia-container-toolkit
sudo nano /etc/docker/daemon.json
# add this to daemon.json
{
"runtimes": {
"nvidia": {
"path": "nvidia-container-runtime",
"runtimeArgs": []
}
}
}
# then reboot docker
sudo systemctl restart docker
# to access GPU from the docker container run
sudo docker run --gpus all -it -p 8889:8889 \
-v /home/khalid/work:/container/work \
-v /home/khalid/whisper_data:/container/whisper_data \
jupyter:latest /bin/bash
jupyter lab --ip=0.0.0.0 --port=8889 --allow-root --no-browser

Update:

# Start with NVIDIA's CUDA base image
FROM nvidia/cuda:12.1.0-cudnn8-devel-ubuntu22.04
# Install Python 3.10 and required packages
RUN apt update && \
apt install -y python3.10 python3.10-dev sudo && \
python3.10 --version
# Install pip, git, nano, wget, and upgrade pip
RUN apt update && \
apt install -y python3-pip git nano wget && \
pip install --upgrade pip
# Install standard Python libraries and Jupyter tools
RUN pip install ipykernel jupyterlab numpy ipywidgets
# Install Cython and FFmpeg
RUN apt update && \
apt install -y cython3 ffmpeg
# Install PyTorch and related packages for CUDA 12.1
RUN pip install https://download.pytorch.org/whl/cu121_full/torch-2.5.0%2Bcu121-cp310-cp310-linux_x86_64.whl && \
pip install https://download.pytorch.org/whl/cu121_full/torchaudio-2.5.0%2Bcu121-cp310-cp310-linux_x86_64.whl && \
pip install torchsummary==1.5.1 && \
pip install https://download.pytorch.org/whl/cu121_full/torchvision-0.20.0%2Bcu121-cp310-cp310-linux_x86_64.whl
# Install Whisper and related libraries from GitHub
RUN pip install git+https://github.com/SYSTRAN/faster-whisper.git ctranslate2==4.4.0 && \
pip install "nemo-toolkit[asr]>=2.dev" && \
pip install git+https://github.com/MahmoudAshraf97/demucs.git && \
pip install git+https://github.com/oliverguhr/deepmultilingualpunctuation.git && \
pip install git+https://github.com/MahmoudAshraf97/ctc-forced-aligner.git && \
pip install git+https://github.com/m-bain/whisperX.git@78dcfaab51005aa703ee21375f81ed31bc248560
# Set the working directory (optional)
WORKDIR /workspace
# Set the default command to bash (optional)
CMD ["/bin/bash"]
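To use the Dockerfile above, a typical build-and-run sequence might look like this (the image tag and port are just examples; adjust mounts and ports to your setup):
# build the image from the directory containing the Dockerfile
sudo docker build -t whisper-diarization-env .
# start it with GPU access and the port used by jupyter lab
sudo docker run --gpus all -it -p 8889:8889 whisper-diarization-env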
I'm getting the same error in Google Colab. To be honest, I don't quite understand the solution to get it working again. What should I do exactly?
@JokanaanR refer to this post: OpenNMT/CTranslate2#1806 (comment)
Please help.