
Multi-GPU training causes RuntimeError #2

Open
Hsintien-Ng opened this issue Dec 3, 2022 · 0 comments
Hi, I tried to run train.py on the FFHQ dataset in a multi-GPU setting, but I hit the following RuntimeError:

```
    training_process(rank, world_size, opt, device)
  File "/home/xintian/workspace/GRAM-main/training_loop.py", line 217, in training_process
    d_loss = process.train_D(real_imgs, real_poses, generator_ddp, discriminator_ddp, optimizer_D, scaler, config, device)
  File "/home/xintian/workspace/GRAM-main/processes/processes.py", line 38, in train_D
    g_imgs, g_pos = generator_ddp(subset_z, **config['camera'])
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xintian/workspace/GRAM-main/generators/generators.py", line 112, in forward
    img, _ = self.renderer.render(self._intersections, self._volume(z, truncation_psi), img_size, camera_origin, camera_pos, fov, ray_start, ray_end, z.device)
  File "/home/xintian/workspace/GRAM-main/generators/renderers/manifold_renderer.py", line 194, in render
    coarse_output = volume(transformed_points, transformed_ray_directions_expanded).reshape(batchsize, img_size * img_size, self.num_manifolds, 4)
  File "/home/xintian/workspace/GRAM-main/generators/generators.py", line 76, in <lambda>
    return lambda points, ray_directions: self.representation.get_radiance(z, points, ray_directions, truncation_psi)
  File "/home/xintian/workspace/GRAM-main/generators/representations/gram.py", line 317, in get_radiance
    return self.rf_network(x, z, ray_directions, truncation_psi)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xintian/workspace/GRAM-main/generators/representations/gram.py", line 260, in forward
    frequencies_2, phase_shifts_2 = self.mapping_network(z2)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xintian/workspace/GRAM-main/generators/representations/gram.py", line 93, in forward
    frequencies_offsets = self.network(z)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/container.py", line 119, in forward
    input = module(input)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/modules/linear.py", line 94, in forward
    return F.linear(input, self.weight, self.bias)
  File "/home/xintian/anaconda3/envs/torch18/lib/python3.6/site-packages/torch/nn/functional.py", line 1753, in linear
    return torch._C._nn.linear(input, weight, bias)
RuntimeError: Expected tensor for 'out' to have the same device as tensor for argument #2 'mat1'; but device 1 does not equal 0 (while checking arguments for addmm)
```

How can I fix this? By the way, train.py works fine when I run it on a single GPU.
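For context, the final frame (`F.linear` / `addmm`) raises this error whenever the input tensor and the layer's weight live on different GPUs, which suggests a tensor (perhaps the latent `z` passed into the mapping network) was created on `cuda:0` while the rank-1 replica's parameters sit on `cuda:1`. Below is a minimal, CPU-runnable sketch of the general device-alignment idea only; `forward_on_module_device` is a hypothetical helper, not part of GRAM, and the actual fix in this repo may differ:

```python
import torch
import torch.nn as nn

def forward_on_module_device(module: nn.Module, x: torch.Tensor) -> torch.Tensor:
    # Look up where the module's parameters live and move the input there
    # before calling forward. If the devices differ, F.linear raises the
    # same addmm RuntimeError shown in the traceback.
    device = next(module.parameters()).device
    return module(x.to(device))

# Tiny demonstration with a single linear layer on the default device.
layer = nn.Linear(4, 2)
out = forward_on_module_device(layer, torch.randn(3, 4))
```

In a DDP training loop, the equivalent habit is to create per-rank tensors with an explicit `device` argument (e.g. `torch.randn(..., device=device)` using the rank's own device) instead of relying on the default CUDA device, which stays pinned to `cuda:0` unless each process sets it.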
