Hi @rusty1s,
Thanks for the awesome work of putting together and maintaining pytorch_scatter. I'm facing an issue with scatter. When I run the following code:
    from torch_scatter import scatter
    import torch
    x_j = torch.randn((12143200, 192), dtype=torch.float32).to('cuda:0')
    edge_index = torch.randint(low=0, high=73727, size=(12143200,)).to('cuda:0')
    out = scatter(src=x_j.to(torch.float32), index=edge_index, dim=0, dim_size=73728, reduce='max')
    print(out)
I'm setting export CUDA_LAUNCH_BLOCKING=1 before running this code.
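For anyone reproducing this without a shell export, the same variable can be set from inside the script. This is a minimal sketch, assuming the assignment runs before torch initializes CUDA; the helper name `enable_blocking_launches` is hypothetical and not part of torch or torch_scatter.

```python
import os


def enable_blocking_launches(env=os.environ):
    """Request synchronous CUDA kernel launches.

    Assumption: this runs before torch initializes CUDA, because the
    variable is read when the CUDA runtime starts up.
    """
    env["CUDA_LAUNCH_BLOCKING"] = "1"
    return env["CUDA_LAUNCH_BLOCKING"]


enable_blocking_launches()
# import torch  # import torch only after the variable has been set
```

With blocking launches enabled, the Python traceback should point at the call that actually failed rather than at a later, unrelated CUDA call.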
I'm using one V100 GPU with 32 GB of memory to run this code. Here's my nvidia-smi output:
    Sat Aug 10 13:21:38 2024
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla V100-SXM2...  Off  | 00000000:06:00.0 Off |                    0 |
    | N/A   33C    P0    42W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla V100-SXM2...  Off  | 00000000:07:00.0 Off |                    0 |
    | N/A   34C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   2  Tesla V100-SXM2...  Off  | 00000000:0A:00.0 Off |                    0 |
    | N/A   34C    P0    46W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   3  Tesla V100-SXM2...  Off  | 00000000:0B:00.0 Off |                    0 |
    | N/A   33C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   4  Tesla V100-SXM2...  Off  | 00000000:85:00.0 Off |                    0 |
    | N/A   33C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   5  Tesla V100-SXM2...  Off  | 00000000:86:00.0 Off |                    0 |
    | N/A   34C    P0    44W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   6  Tesla V100-SXM2...  Off  | 00000000:89:00.0 Off |                    0 |
    | N/A   36C    P0    44W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+
    |   7  Tesla V100-SXM2...  Off  | 00000000:8A:00.0 Off |                    0 |
    | N/A   33C    P0    43W / 300W |      3MiB / 32768MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |  No running processes found                                                 |
    +-----------------------------------------------------------------------------+
Here's my conda environment:
    name: MyEnv
    channels:
      - pytorch
      - conda-forge
      - defaults
    dependencies:
      - _libgcc_mutex=0.1=main
      - _openmp_mutex=5.1=1_gnu
      - blas=1.0=mkl
      - brotli-python=1.0.9=py37hd23a5d3_7
      - bzip2=1.0.8=h7f98852_4
      - ca-certificates=2023.08.22=h06a4308_0
      - certifi=2022.12.7=py37h06a4308_0
      - charset-normalizer=3.3.0=pyhd8ed1ab_0
      - cudatoolkit=10.2.89=hfd86e86_1
      - ffmpeg=4.3.2=hca11adc_0
      - flit-core=3.6.0=pyhd3eb1b0_0
      - freetype=2.12.1=h4a9f257_0
      - giflib=5.2.1=h5eee18b_3
      - gmp=6.2.1=h58526e2_0
      - gnutls=3.6.13=h85f3911_1
      - idna=3.4=pyhd8ed1ab_0
      - intel-openmp=2023.1.0=hdb19cb5_46305
      - jpeg=9b=h024ee3a_2
      - lame=3.100=h7f98852_1001
      - lcms2=2.12=h3be6417_0
      - ld_impl_linux-64=2.38=h1181459_1
      - libffi=3.4.4=h6a678d5_0
      - libgcc-ng=11.2.0=h1234567_1
      - libgomp=11.2.0=h1234567_1
      - libpng=1.6.39=h5eee18b_0
      - libstdcxx-ng=11.2.0=h1234567_1
      - libtiff=4.2.0=h85742a9_0
      - libuv=1.44.2=h5eee18b_0
      - libwebp=1.2.0=h89dd481_0
      - libwebp-base=1.2.0=h27cfd23_0
      - lz4-c=1.9.4=h6a678d5_0
      - mkl=2020.2=256
      - mkl-service=2.3.0=py37he8ac12f_0
      - mkl_fft=1.3.0=py37h54f3939_0
      - mkl_random=1.1.1=py37h0573a6f_0
      - ncurses=6.4=h6a678d5_0
      - nettle=3.6=he412f7d_0
      - ninja=1.10.2=h06a4308_5
      - ninja-base=1.10.2=hd09550d_5
      - openh264=2.1.1=h780b84a_0
      - openssl=1.1.1w=h7f8727e_0
      - pillow=9.3.0=py37hace64e9_1
      - pip=22.3.1=py37h06a4308_0
      - pysocks=1.7.1=py37h89c1867_5
      - python=3.7.16=h7a1cb2a_0
      - python_abi=3.7=2_cp37m
      - pytorch-mutex=1.0=cuda
      - pyyaml=6.0=py37h5eee18b_1
      - readline=8.2=h5eee18b_0
      - requests=2.31.0=pyhd8ed1ab_0
      - setuptools=65.6.3=py37h06a4308_0
      - six=1.16.0=pyhd3eb1b0_1
      - sqlite=3.41.2=h5eee18b_0
      - tbb=2021.8.0=hdb19cb5_0
      - timm=0.3.2=pyhd8ed1ab_0
      - tk=8.6.12=h1ccaba5_0
      - typing_extensions=4.4.0=py37h06a4308_0
      - urllib3=2.0.6=pyhd8ed1ab_0
      - wheel=0.38.4=py37h06a4308_0
      - x264=1!161.3030=h7f98852_1
      - xz=5.4.2=h5eee18b_0
      - yaml=0.2.5=h7b6447c_0
      - zlib=1.2.13=h5eee18b_0
      - zstd=1.4.9=haebb681_0
      - pip:
        - cffi==1.15.1
        - cryptography==42.0.5
        - cupy-cuda102==11.6.0
        - cycler==0.11.0
        - fastrlock==0.8.2
        - fonttools==4.38.0
        - jinja2==3.1.3
        - joblib==1.3.2
        - kiwisolver==1.4.5
        - markupsafe==2.1.5
        - matplotlib==3.5.3
        - numpy==1.21.6
        - nvidia-cublas-cu11==11.10.3.66
        - nvidia-cuda-nvrtc-cu11==11.7.99
        - nvidia-cuda-runtime-cu11==11.7.99
        - nvidia-cudnn-cu11==8.5.0.96
        - packaging==23.2
        - pandas==1.3.5
        - psutil==5.9.8
        - pycparser==2.21
        - pydeprecate==0.3.2
        - pyopenssl==24.1.0
        - pyparsing==3.1.1
        - python-dateutil==2.8.2
        - pytz==2023.3.post1
        - scikit-learn==1.0.2
        - scipy==1.7.3
        - threadpoolctl==3.1.0
        - torch==1.7.1+cu110
        - torch-geometric==2.3.1
        - torch-scatter==2.0.7
        - torchaudio==0.7.2
        - torcheval==0.0.7
        - torchmetrics==0.7.2
        - torchprofile==0.0.4
        - torchvision==0.8.2+cu110
        - tqdm==4.66.2
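The environment above mentions more than one CUDA version (for example `+cu110` on torch and `cudatoolkit=10.2.89`). The sketch below just pulls those hints out of a few of the listed specs so they are easy to compare. `cuda_tag` is a hypothetical helper that only understands the three spellings shown in its comments, and whether the mix is related to the error is not established here.

```python
import re


def cuda_tag(spec):
    """Return a CUDA version hint such as '11.0' for a package spec, else None.

    Hypothetical helper for illustration; it recognizes only:
      torch==1.7.1+cu110    (local version suffix)
      cupy-cuda102==11.6.0  (suffix in the package name)
      cudatoolkit=10.2.89   (the conda toolkit package itself)
    """
    name, _, version = spec.partition("=")
    version = version.lstrip("=")
    match = re.search(r"\+cu(\d+)$", version) or re.search(r"cuda(\d+)$", name)
    if match:
        digits = match.group(1)
        return f"{digits[:-1]}.{digits[-1]}"
    if name == "cudatoolkit":
        return ".".join(version.split(".")[:2])
    return None


specs = [
    "torch==1.7.1+cu110",
    "torchvision==0.8.2+cu110",
    "cupy-cuda102==11.6.0",
    "cudatoolkit=10.2.89",
    "numpy==1.21.6",
]
tags = {spec: cuda_tag(spec) for spec in specs}
distinct = sorted({tag for tag in tags.values() if tag is not None})
print(distinct)  # ['10.2', '11.0']
```

The listing for torch-scatter==2.0.7 carries no CUDA suffix, so which CUDA version it was built against cannot be read from the environment file alone.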
This is the error I face:
    Traceback (most recent call last):
      File "playground.py", line 5, in <module>
        out = scatter(src=x_j.to(torch.float32), index=edge_index, dim=0, dim_size=73728, reduce='max')
      File "/raid/ismail2/miniconda3/envs/MyEnv/lib/python3.7/site-packages/torch_scatter/scatter.py", line 161, in scatter
        return scatter_max(src, index, dim, out, dim_size)[0]
      File "/raid/ismail2/miniconda3/envs/MyEnv/lib/python3.7/site-packages/torch_scatter/scatter.py", line 73, in scatter_max
        return torch.ops.torch_scatter.scatter_max(src, index, dim, out, dim_size)
    RuntimeError: CUDA error: an illegal memory access was encountered
I've been stuck here for a while and would really appreciate any help on this. Thanks.
PS: AFAIU, the illegal memory error is different from the out-of-memory error.
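A back-of-the-envelope check on the shapes in the snippet seems consistent with this not being an out-of-memory problem. The sketch below is a sanity check under stated assumptions, not a confirmed diagnosis: it computes the size of x_j and compares its element count with the largest signed 32-bit integer. Whether the scatter kernel in torch-scatter 2.0.7 indexes with 32-bit integers is an assumption that has not been verified here, and the helper names are hypothetical.

```python
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit index can hold


def tensor_stats(rows, cols, bytes_per_elem=4):
    """Return (element count, size in GiB) for a dense rows x cols tensor."""
    numel = rows * cols
    gib = numel * bytes_per_elem / 2**30
    return numel, gib


def max_cols_per_chunk(rows, limit=INT32_MAX):
    """Largest column count for which a chunk has at most `limit` elements."""
    return limit // rows


numel, gib = tensor_stats(12143200, 192)  # shape of x_j, float32
print(numel, round(gib, 2))               # 2331494400 8.69
print(numel > INT32_MAX)                  # True
print(max_cols_per_chunk(12143200))       # 176
```

If the 32-bit indexing assumption holds, one thing to try would be splitting x_j along dim=1 into chunks of at most 176 columns (for instance two chunks of 96), calling scatter on each chunk, and concatenating the results along dim=1. With a 1-D index and dim=0, each column is reduced independently, so the chunked result should match the unchunked one.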
I've recreated the issue in a Kaggle notebook: https://www.kaggle.com/code/ismaelelsharkawi/torch-scatter-gives-an-illegal-memory-access-456