I can not train swinTransformer on ADE datasets with multi-gpus #82

Open
youdutaidi opened this issue May 23, 2022 · 0 comments

I launch training on two GPUs with:
CUDA_VISIBLE_DEVICES=2,3 python -m torch.distributed.launch --nproc_per_node=2 tools/train.py configs/swin/upernet_swin_small_patch4_window7_512x512_160k_ade20k.py --launcher pytorch

but both processes fail with the same error:

2022-05-23 11:06:19,676 - mmseg - INFO - workflow: [('train', 1)], max: 160000 iters
Traceback (most recent call last):
  File "tools/train.py", line 163, in <module>
    main()
  File "tools/train.py", line 152, in main
    train_segmentor(
  File "/home/cailingling/code/SwinTransformer3/mmseg/apis/train.py", line 116, in train_segmentor
    runner.run(data_loaders, cfg.workflow)
  File "/home/cailingling/.local/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 130, in run
    iter_runner(iter_loaders[i], **kwargs)
  File "/home/cailingling/.local/lib/python3.8/site-packages/mmcv/runner/iter_based_runner.py", line 60, in train
    outputs = self.model.train_step(data_batch, self.optimizer, **kwargs)
  File "/home/cailingling/.local/lib/python3.8/site-packages/mmcv/parallel/distributed.py", line 36, in train_step
    output = self.module.train_step(*inputs[0], **kwargs[0])
  File "/home/cailingling/code/SwinTransformer3/mmseg/models/segmentors/base.py", line 152, in train_step
    losses = self(**data_batch)
  File "/home/cailingling/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/cailingling/.local/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 84, in new_func
    return old_func(*args, **kwargs)
  File "/home/cailingling/code/SwinTransformer3/mmseg/models/segmentors/base.py", line 122, in forward
    return self.forward_train(img, img_metas, **kwargs)
  File "/home/cailingling/code/SwinTransformer3/mmseg/models/segmentors/encoder_decoder.py", line 157, in forward_train
    loss_decode = self._decode_head_forward_train(x, img_metas,
  File "/home/cailingling/code/SwinTransformer3/mmseg/models/segmentors/encoder_decoder.py", line 100, in _decode_head_forward_train
    loss_decode = self.decode_head.forward_train(x, img_metas,
  File "/home/cailingling/code/SwinTransformer3/mmseg/models/decode_heads/decode_head.py", line 187, in forward_train
    losses = self.losses(seg_logits, gt_semantic_seg)
  File "/home/cailingling/.local/lib/python3.8/site-packages/mmcv/runner/fp16_utils.py", line 164, in new_func
    return old_func(*args, **kwargs)
  File "/home/cailingling/code/SwinTransformer3/mmseg/models/decode_heads/decode_head.py", line 218, in losses
    seg_logit = resize(
  File "/home/cailingling/code/SwinTransformer3/mmseg/ops/wrappers.py", line 29, in resize
    return F.interpolate(input, size, scale_factor, mode, align_corners)
  File "/home/cailingling/anaconda3/envs/pytorch/lib/python3.8/site-packages/torch/nn/functional.py", line 3012, in interpolate
    return torch._C._nn.upsample_bilinear2d(input, _interp_output_size(2, closed_over_args), align_corners,
RuntimeError: It is expected output_size equals to 2, but got size 3
Traceback (most recent call last):
  File "/home/cailingling/anaconda3/envs/pytorch/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/cailingling/anaconda3/envs/pytorch/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)

However, training on a single GPU works fine.
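To narrow this down, it may help to check the shapes that reach the `resize` call in `decode_head.py`. The sketch below is plain Python and does not call mmseg or PyTorch. It assumes, as in upstream mmsegmentation, that the target size passed to `resize` is taken from `seg_label.shape[2:]`; the helper name and all shapes are hypothetical and only illustrate how a label tensor with one extra dimension would yield a size of length 3 where bilinear resizing of a 4-D `(N, C, H, W)` input expects length 2.

```python
def spatial_size_mismatch(input_shape, size):
    """Return None if `size` has one entry per spatial dim of `input_shape`,
    otherwise return a short description of the mismatch.

    Hypothetical helper for illustration; not part of mmseg or PyTorch.
    """
    expected = len(input_shape) - 2  # drop the batch and channel dimensions
    if len(size) != expected:
        return (f"expected output_size of length {expected}, "
                f"but got length {len(size)}")
    return None


# Hypothetical shapes, chosen only to illustrate the two cases.
seg_logit_shape = (2, 150, 128, 128)    # (N, C, H, W) logits from the head
good_label_shape = (2, 1, 512, 512)     # (N, 1, H, W) label map
bad_label_shape = (2, 1, 512, 512, 3)   # label with an extra trailing dim

print(spatial_size_mismatch(seg_logit_shape, good_label_shape[2:]))
print(spatial_size_mismatch(seg_logit_shape, bad_label_shape[2:]))
```

If a check like this fired on the real tensors, printing `gt_semantic_seg.shape` just before the loss call in both the single-GPU and the two-GPU run would show whether the labels differ between the two setups.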
