Error while training Video model with Vimeo-90k Dataset #325

gopi77 · 2024-12-16T20:04:04Z

Bug

I get an error when training a video model with the Vimeo-90k dataset.
Training data: http://data.csail.mit.edu/tofu/dataset/vimeo_triplet.zip

To Reproduce

Steps to reproduce the behavior:
python ../compressai/examples/train_video.py --cuda -m ssf2020 --save -d vimeo_triplet/ --checkpoint . --seed 17122024
/usr/local/lib/python3.11/dist-packages/compressai/models/video/google.py:353: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@amp.autocast(enabled=False)
Traceback (most recent call last):
File "/workspace/../compressai/examples/train_video.py", line 475, in <module>
main(sys.argv[1:])
File "/workspace/../compressai/examples/train_video.py", line 392, in main
train_dataset = VideoFolder(
^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/datasets/video.py", line 93, in __init__
raise RuntimeError(f'Missing file "{splitfile}"')
RuntimeError: Missing file "vimeo_triplet/train.list"
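
The traceback above shows VideoFolder looking for a train.list file in the dataset root, which the vimeo_triplet archive does not appear to ship. A minimal sketch of one possible workaround follows. It assumes the archive provides tri_trainlist.txt and tri_testlist.txt, that VideoFolder also wants a test.list, and that each line of those files names a folder under sequences/. None of this is confirmed in this thread, and make_split_lists is a hypothetical helper, not part of CompressAI.

```python
from pathlib import Path
import shutil

# Assumed mapping (not confirmed by the issue): Vimeo-90k triplet list file
# on the left, file name VideoFolder is assumed to look for on the right.
LIST_FILES = {
    "tri_trainlist.txt": "train.list",
    "tri_testlist.txt": "test.list",
}


def make_split_lists(root):
    """Copy the triplet list files to the names VideoFolder asks for.

    Returns the paths that were newly created. Existing targets are left
    untouched so the function is safe to call more than once.
    """
    root = Path(root)
    created = []
    for src_name, dst_name in LIST_FILES.items():
        src = root / src_name
        dst = root / dst_name
        if not src.is_file():
            raise FileNotFoundError(f'Missing file "{src}"')
        if not dst.exists():
            shutil.copyfile(src, dst)
            created.append(dst)
    return created
```

Calling make_split_lists("vimeo_triplet/") once before launching train_video.py would be enough under these assumptions. If the list format turns out to differ from what VideoFolder parses, the copied files would need to be rewritten rather than copied.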

Expected behavior

Environment

Please copy and paste the output from python3 -m torch.utils.collect_env
python3 -m torch.utils.collect_env
<frozen runpy>:128: RuntimeWarning: 'torch.utils.collect_env' found in sys.modules after import of package 'torch.utils', but prior to execution of 'torch.utils.collect_env'; this may result in unpredictable behaviour
Collecting environment information...
PyTorch version: 2.4.1+cu124
Is debug build: False
CUDA used to build PyTorch: 12.4
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.5 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: Could not collect
Libc version: glibc-2.35
Python version: 3.11.10 (main, Sep 7 2024, 18:35:41) [GCC 11.4.0] (64-bit runtime)
Python platform: Linux-6.8.0-47-generic-x86_64-with-glibc2.35
Is CUDA available: True
CUDA runtime version: 12.4.131
CUDA_MODULE_LOADING set to: LAZY
GPU models and configuration: GPU 0: NVIDIA GeForce RTX 4090
Nvidia driver version: 550.127.05
cuDNN version: Could not collect
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True

CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 48 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: AuthenticAMD
Model name: AMD Ryzen 9 7950X 16-Core Processor
CPU family: 25
Model: 97
Thread(s) per core: 2
Core(s) per socket: 16
Socket(s): 1
Stepping: 2
CPU max MHz: 5881.0000
CPU min MHz: 400.0000
BogoMIPS: 8983.04
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb rdtscp lm constant_tsc rep_good amd_lbr_v2 nopl nonstop_tsc cpuid extd_apicid aperfmperf rapl pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 movbe popcnt aes xsave avx f16c rdrand lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch osvw ibs skinit wdt tce topoext perfctr_core perfctr_nb bpext perfctr_llc mwaitx cpb cat_l3 cdp_l3 hw_pstate ssbd mba perfmon_v2 ibrs ibpb stibp ibrs_enhanced vmmcall fsgsbase bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local user_shstk avx512_bf16 clzero irperf xsaveerptr rdpru wbnoinvd cppc arat npt lbrv svm_lock nrip_save tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthresholdavic v_vmsave_vmload vgif x2avic v_spec_ctrl vnmi avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg avx512_vpopcntdq rdpid overflow_recov succor smca fsrm flush_l1d
Virtualization: AMD-V
L1d cache: 512 KiB (16 instances)
L1i cache: 512 KiB (16 instances)
L2 cache: 16 MiB (16 instances)
L3 cache: 64 MiB (2 instances)
NUMA node(s): 1
NUMA node0 CPU(s): 0-31
Vulnerability Gather data sampling: Not affected
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Not affected
Vulnerability Reg file data sampling: Not affected
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Mitigation; Safe RET
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS; IBPB conditional; STIBP always-on; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected

Versions of relevant libraries:
[pip3] numpy==1.26.3
[pip3] pytorch-msssim==1.0.0
[pip3] torch==2.4.1+cu124
[pip3] torch-geometric==2.6.1
[pip3] torchaudio==2.4.1+cu124
[pip3] torchvision==0.19.1+cu124
[pip3] triton==3.0.0
[conda] Could not collect


Additional context


gopi77 · 2024-12-17T08:35:12Z

I modified train_video.py as shown below (saved as a new file named train_video_vimeo.py), which resolved the issue above. (Ref: #105)

from compressai.datasets import Vimeo90kDataset

train_dataset = Vimeo90kDataset(
    args.dataset, split="train", transform=train_transforms
)
test_dataset = Vimeo90kDataset(
    args.dataset, split="valid", transform=test_transforms
)


However, this produced another error, copied below.

python ../../compressai/examples/train_video_vimeo.py --cuda -m ssf2020 --save -d ../dataset/vimeo_triplet/
/usr/local/lib/python3.11/dist-packages/compressai/models/video/google.py:353: FutureWarning: torch.cuda.amp.autocast(args...) is deprecated. Please use torch.amp.autocast('cuda', args...) instead.
@amp.autocast(enabled=False)
Learning rate: 0.0001
Traceback (most recent call last):
File "/workspace/train/../../compressai/examples/train_video_vimeo.py", line 469, in <module>
main(sys.argv[1:])
File "/workspace/train/../../compressai/examples/train_video_vimeo.py", line 439, in main
train_one_epoch(
File "/workspace/train/../../compressai/examples/train_video_vimeo.py", line 239, in train_one_epoch
out_net = model(d)
^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/models/video/google.py", line 217, in forward
x_hat, likelihoods = self.forward_keyframe(frames[0])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/models/video/google.py", line 235, in forward_keyframe
y_hat, likelihoods = self.img_hyperprior(y)
^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/models/video/google.py", line 158, in forward
z_hat, z_likelihoods = self.entropy_bottleneck(z)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/nn/modules/module.py", line 1562, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/entropy_models/entropy_models.py", line 493, in forward
likelihood, _, _ = self._likelihood(outputs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/entropy_models/entropy_models.py", line 458, in _likelihood
lower = self._logits_cumulative(inputs - half, stop_gradient=stop_gradient)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/compressai/entropy_models/entropy_models.py", line 439, in _logits_cumulative
logits = torch.matmul(F.softplus(matrix), logits)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: The size of tensor a (192) must match the size of tensor b (2) at non-singleton dimension 0
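
The traceback above reaches forward_keyframe(frames[0]), so the model indexes its input as a sequence of frames. One possible cause of the size mismatch, not confirmed in this thread, is that each element passed to the model is a single 3-D image (C, H, W) rather than a 4-D batch of frames (B, C, H, W). That would happen if the dataset yields one image per sample and the training loop then iterates over the batch tensor. The sketch below is a hypothetical check (check_video_batch is not a CompressAI function) that relies only on each frame exposing a .shape attribute, as PyTorch tensors do.

```python
def check_video_batch(frames):
    """Return the shape of every frame, or raise if a frame is not 4-D.

    `frames` is the sequence handed to the video model; each element is
    assumed to be a batch of frames shaped (B, C, H, W).
    """
    shapes = [tuple(frame.shape) for frame in frames]
    for index, shape in enumerate(shapes):
        if len(shape) != 4:
            raise ValueError(
                f"frame {index} has shape {shape}; expected a 4-D batch "
                "(B, C, H, W). The dataset may be returning single images "
                "instead of frame sequences."
            )
    return shapes
```

Calling check_video_batch(d) just before out_net = model(d) in train_one_epoch would show whether the input has the layout the model is assumed to expect. If it raises, the dataset class used for the video model is the first thing to revisit.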
