Hello, please tell me how to use generated samples to train #8

Open
liuxiuxuhaodong opened this issue Jun 25, 2018 · 13 comments

Comments

@liuxiuxuhaodong

I used the Market-1501 dataset to train a DCGAN and generated a number of images, but I don't know how to use the generated samples to train the baseline, e.g. how the generated images should be named and which class they should be assigned to. Please tell me, thank you.

@qiaoguan
Owner

Hi, try reading the source code of train_baseline.py and prepare.py. You don't need to understand every detail of the code, just how the model reads the dataset (the paths). It's not difficult. Best wishes!
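As a rough illustration of the kind of change involved: a common layout for PyTorch Market-1501 baselines is one subfolder per identity (train_root/<class>/<image>), which torchvision's ImageFolder reads directly. The sketch below assumes that layout and copies the GAN outputs into one extra class folder. The function name, the folder names, and the idea of putting all generated images into a single class are assumptions for illustration; check prepare.py and train_baseline.py for what the scripts actually expect.

```python
import os
import shutil


def add_generated_samples(gen_dir, train_root, class_name="gen_0000"):
    """Copy GAN-generated images into one extra class folder.

    Hypothetical helper: gen_dir, train_root and class_name are
    placeholders, not paths taken from this repository. It assumes an
    ImageFolder-style tree: train_root/<class_name>/<image file>.
    Returns the number of image files copied.
    """
    dst = os.path.join(train_root, class_name)
    os.makedirs(dst, exist_ok=True)
    copied = 0
    for name in sorted(os.listdir(gen_dir)):
        # Only copy common image formats; skip anything else in gen_dir.
        if name.lower().endswith((".jpg", ".jpeg", ".png")):
            shutil.copy(os.path.join(gen_dir, name), os.path.join(dst, name))
            copied += 1
    return copied
```

Note that adding a folder changes the number of classes ImageFolder reports, so the classifier's output size has to stay consistent with it.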

@liuxiuxuhaodong
Author

Well, I did that, but when I ran train_baseline.py I got this error:
Traceback (most recent call last):
File "/home/dl/Person-reid-GAN/train_baseline.py", line 339, in <module>
os.mkdir(dir_name)
FileNotFoundError: [Errno 2] No such file or directory: './model/ft_DesNet121'

@qiaoguan
Owner

Just create a new folder named model.
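Creating the folder by hand works. Alternatively, the os.mkdir call around line 339 of train_baseline.py could be made to create missing parent folders itself. A minimal sketch, where ensure_dir is a hypothetical helper name, not something in the repository:

```python
import os


def ensure_dir(dir_name):
    """Create dir_name together with any missing parent folders.

    os.mkdir() only creates the last path component, so it raises
    FileNotFoundError when './model' does not exist yet. os.makedirs()
    creates the whole chain, and exist_ok=True (Python 3.2+) means a
    rerun does not fail because the folder already exists.
    """
    os.makedirs(dir_name, exist_ok=True)
    return dir_name


# Usage with the path from the traceback:
# ensure_dir('./model/ft_DesNet121')
```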

@liuxiuxuhaodong
Author

Thank you very much!

@liuxiuxuhaodong
Author

Excuse me, after creating a new folder named model I ran it again, and then I hit a new error:
RuntimeError: cuda runtime error (10) : invalid device ordinal at torch/csrc/cuda/Module.cpp:84
I looked for solutions on the internet, and found a blogger who ran into a similar problem: https://blog.csdn.net/shincling/article/details/78919282. But I'm just starting to learn PyTorch. Could you help me resolve this problem?

@liuxiuxuhaodong
Author

My computer only has one GPU.

@qiaoguan
Owner

qiaoguan commented Jul 5, 2018

Just switch to single-GPU training mode:
torch.cuda.set_device(gpu_ids[0])
and delete the line: model=nn.DataParallel(model,device_ids=[0,1,2]) # multi-GPU
For more details, you can search the internet.
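The "invalid device ordinal" error typically means the code asked for a GPU index that does not exist; with device_ids=[0,1,2] on a one-GPU machine, indices 1 and 2 are invalid. A sketch of a placement helper that only uses the GPUs actually present and falls back to the CPU when there are none. place_model and gpu_ids are hypothetical names, not taken from train_baseline.py, and torch.device requires PyTorch 0.4 or newer.

```python
import torch
import torch.nn as nn


def place_model(model, gpu_ids):
    """Move model onto the requested GPUs that actually exist.

    gpu_ids: hypothetical list of requested CUDA device indices, e.g. [0, 1, 2].
    Returns (model, device); inputs should be sent to device.
    """
    available = torch.cuda.device_count()
    # Drop indices that would trigger "invalid device ordinal".
    valid = [i for i in gpu_ids if 0 <= i < available]
    if not valid:
        # No usable GPU: stay on the CPU.
        return model.cpu(), torch.device("cpu")
    torch.cuda.set_device(valid[0])
    model = model.cuda(valid[0])
    if len(valid) > 1:
        # Only wrap in DataParallel when more than one GPU is usable.
        model = nn.DataParallel(model, device_ids=valid)
    return model, torch.device("cuda", valid[0])
```

On a single-GPU machine this reduces to exactly the advice above: set_device(0) and no DataParallel wrapper.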

@liuxiuxuhaodong
Author

Thank you very much, I have solved my problem and ran it successfully.

@liuxiuxuhaodong
Author

I'm sorry to trouble you again. I ran train_baseline.py successfully a few days ago, but when I ran it again I hit a new problem. I read your demo but didn't find a solution. The output is as follows:
train Loss: 291.3983 Acc: 0.0109
/pytorch/torch/lib/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [0,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/torch/lib/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [2,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/torch/lib/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [6,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/torch/lib/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [7,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/torch/lib/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [10,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
/pytorch/torch/lib/THC/THCTensorScatterGather.cu:97: void THCudaTensor_gatherKernel(TensorInfo<Real, IndexType>, TensorInfo<Real, IndexType>, TensorInfo<long, IndexType>, int, IndexType) [with IndexType = unsigned int, Real = float, Dims = 2]: block: [0,0,0], thread: [11,0,0] Assertion indexValue >= 0 && indexValue < src.sizes[dim] failed.
THCudaCheck FAIL file=/pytorch/torch/lib/THC/generated/../THCReduceAll.cuh line=339 error=59 : device-side assert triggered
Traceback (most recent call last):
File "train_baseline.py", line 346, in <module>
num_epochs=130)
File "train_baseline.py", line 246, in train_model
loss = criterion(outputs,labels,flags)
File "/home/dl/anaconda3/lib/python3.5/site-packages/torch/nn/modules/module.py", line 357, in __call__
result = self.forward(*input, **kwargs)
File "train_baseline.py", line 173, in forward
return loss.mean()
RuntimeError: cuda runtime error (59) : device-side assert triggered at /pytorch/torch/lib/THC/generated/../THCReduceAll.cuh:339
I didn't modify your code, so I'm a little confused.
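The failing assertion in gatherKernel, indexValue >= 0 && indexValue < src.sizes[dim], generally means an index passed to gather was outside the valid range. Here the gather most likely happens inside the custom criterion, with labels as indices, so one possible cause is a label that is negative or not smaller than the number of output classes, for example when the number of class folders in the training directory no longer matches the classifier's output size. This is a hedged diagnosis, not a confirmed one. A sketch of a check that turns the opaque CUDA assert into a readable error, assuming outputs has shape [batch, num_classes] and labels holds integer class indices; check_labels is a hypothetical helper that could be called just before loss = criterion(outputs,labels,flags).

```python
import torch


def check_labels(outputs, labels):
    """Raise ValueError if any label cannot index a column of outputs.

    outputs: [batch, num_classes] scores; labels: [batch] integer class indices.
    """
    num_classes = outputs.size(1)
    bad = (labels < 0) | (labels >= num_classes)
    if bad.any():
        raise ValueError(
            "labels out of range [0, {}): {}".format(num_classes, labels[bad].tolist())
        )
```

Running once with the environment variable CUDA_LAUNCH_BLOCKING=1 also helps: CUDA kernels run asynchronously, so without it the error can surface at a later call (here loss.mean()) rather than at the gather that actually failed.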

@liuxiuxuhaodong
Author

I have solved it, thank you.

@flychen321

I ran into it too. How did you solve it?

@Vincy-L

Vincy-L commented May 1, 2019

Me too. Has anyone solved it?

@Vincy-L

Vincy-L commented May 1, 2019

Can you tell me how to solve it? Thank you.


4 participants