It looks like the run was killed due to issues with GPU memory usage when model 2 is used, but the same input sequence runs fine with model 1. Do you have any clues?
nvidia-smi
Thu Feb 23 16:38:21 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA TITAN RTX    On   | 00000000:1A:00.0 Off |                  N/A |
| 41%   39C    P2    71W / 280W |   7148MiB / 24220MiB |      2%      Default |
|                               |                      |                  N/A |
omegafold --num_cycle 1 --model 1 gene_5088_NI907.fasta test4
INFO:root:Loading weights from /global/scratch/users/skyungyong/omegafold/omegafold_ckpt/model.pt
INFO:root:Constructing OmegaFold
INFO:root:Reading gene_5088_NI907.fasta
INFO:root:Predicting 1th chain in gene_5088_NI907.fasta
INFO:root:365 residues in this chain.
INFO:root:Finished prediction in 23.76 seconds.
INFO:root:Saving prediction to test4/gene_5088_NI907.pdb
INFO:root:Saved
INFO:root:Done!
omegafold --num_cycle 1 --model 2 gene_5088_NI907.fasta test5
INFO:root:Loading weights from /global/scratch/users/skyungyong/omegafold/omegafold_ckpt/model2.pt
INFO:root:Constructing OmegaFold
Killed
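Note: this first run died with a bare "Killed" and no Python traceback, so it may have been the Linux OOM killer reclaiming host RAM during checkpoint loading rather than a CUDA allocation failure; I have not confirmed this. If the kernel log is accessible (it may require root on a shared cluster), something like this should show it:
dmesg -T | grep -iE "out of memory|killed process"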
Using a GPU with more memory (an NVIDIA A40) didn't help.
nvidia-smi
Thu Feb 23 16:39:10 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 470.141.03   Driver Version: 470.141.03   CUDA Version: 11.4     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          Off  | 00000000:41:00.0 Off |                    0 |
|  0%   32C    P8    30W / 300W |     23MiB / 45634MiB |      0%      Default |
|                               |                      |                  N/A |
omegafold --num_cycle 1 --model 1 gene_5088_NI907.fasta test10
INFO:root:Loading weights from /global/scratch/users/skyungyong/omegafold/omegafold_ckpt/model.pt
INFO:root:Constructing OmegaFold
INFO:root:Reading gene_5088_NI907.fasta
INFO:root:Predicting 1th chain in gene_5088_NI907.fasta
INFO:root:365 residues in this chain.
INFO:root:Finished prediction in 12.72 seconds.
INFO:root:Saving prediction to test10/gene_5088_NI907.pdb
INFO:root:Saved
INFO:root:Done!
omegafold --num_cycle 1 --model 2 gene_5088_NI907.fasta test11
INFO:root:Loading weights from /global/scratch/users/skyungyong/omegafold/omegafold_ckpt/model2.pt
INFO:root:Constructing OmegaFold
INFO:root:Reading gene_5088_NI907.fasta
INFO:root:Predicting 1th chain in gene_5088_NI907.fasta
INFO:root:365 residues in this chain.
INFO:root:Failed to generate test11/gene_5088_NI907.pdb due to CUDA out of memory. Tried to allocate 10.67 GiB (GPU 0; 44.56 GiB total capacity; 32.65 GiB already allocated; 9.25 GiB free; 33.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
INFO:root:Skipping...
INFO:root:Done!
Using --subbatch_size also didn't help.
omegafold --subbatch_size 1 --num_cycle 1 --model 2 gene_5088_NI907.fasta test11
INFO:root:Loading weights from /global/scratch/users/skyungyong/omegafold/omegafold_ckpt/model2.pt
INFO:root:Constructing OmegaFold
INFO:root:Reading gene_5088_NI907.fasta
INFO:root:Predicting 1th chain in gene_5088_NI907.fasta
INFO:root:365 residues in this chain.
INFO:root:Failed to generate test11/gene_5088_NI907.pdb due to CUDA out of memory. Tried to allocate 10.67 GiB (GPU 0; 44.56 GiB total capacity; 32.65 GiB already allocated; 9.25 GiB free; 33.13 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
INFO:root:Skipping...
INFO:root:Done!
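The error message itself suggests setting max_split_size_mb via PYTORCH_CUDA_ALLOC_CONF. Here reserved memory (33.13 GiB) is only slightly above allocated (32.65 GiB), so fragmentation is probably not the main cause, but for completeness this is what that would look like (the 128 MiB split size is an arbitrary starting value, not something OmegaFold documents):
export PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128
omegafold --subbatch_size 1 --num_cycle 1 --model 2 gene_5088_NI907.fasta test11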
Thanks!
INFO:root:Failed to generate my_output4/ranked_0.pdb due to CUDA out of memory. Tried to allocate 7.80 GiB (GPU 0; 31.75 GiB total capacity; 24.66 GiB already allocated; 5.82 GiB free; 24.92 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
My command is: omegafold a.fa my_output4 --model 2 --subbatch_size 1 --num_cycle 1
Model 1 works just fine, and the length of my sequence is 311. Any suggestions? Thanks!
Edit: In my case the OOM message was printed from within RecycleEmbedder; however, I couldn't find where this class uses subbatch_size.
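If RecycleEmbedder really does build its full pairwise tensors regardless of --subbatch_size, that would explain why the flag has no effect here; I have not verified this in the source. One way to see which stage triggers the spike, without touching the code, is to log GPU memory once a second in another terminal while the job runs:
nvidia-smi --query-gpu=timestamp,memory.used,memory.total --format=csv -l 1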
I had the same problem running model 2; switching to model 1 worked fine. Here is the error message for model 2:
INFO:root:379 residues in this chain.
INFO:root:Failed to generate xxx/xxx.pdb due to CUDA out of memory. Tried to allocate 6.71 GiB (GPU 0; 23.70 GiB total capacity; 18.67 GiB already allocated; 3.43 GiB free; 19.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
INFO:root:Skipping...