Getting Error when training model #6

Open
ShiinaMitsuki opened this issue Jun 7, 2018 · 2 comments

@ShiinaMitsuki

Hi there, I followed the instructions in the README but got the errors below:

(dcgan) [sobey123@localhost DCGAN-tensorflow]$ ./train.sh
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
{'batch_size': 64,
'beta1': 0.5,
'checkpoint_dir': 'checkpoint',
'crop': False,
'dataset': 'market',
'epoch': 100,
'input_fname_pattern': '.jpg',
'input_height': 128,
'input_width': None,
'learning_rate': 0.0002,
'options': 1,
'output_height': 256,
'output_path': 'duke_result',
'output_width': None,
'sample_dir': 'samples',
'sample_size': 1000,
'train': True,
'train_size': inf,
'unrolled_lstm': False,
'visualize': False}
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:84:00.0
Total memory: 7.93GiB
Free memory: 7.83GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:84:00.0)
WARNING:tensorflow:From /home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py:109 in build_model.: histogram_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.histogram. Note that tf.summary.histogram uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on their scope.
Traceback (most recent call last):
File "main.py", line 103, in
tf.app.run()
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 81, in main
sample_dir=FLAGS.sample_dir)
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py", line 89, in init
self.build_model()
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py", line 114, in build_model
self.D_, self.D_logits_ = self.discriminator(self.G, self.y, reuse=True)
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py", line 324, in discriminator
h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/ops.py", line 98, in linear
tf.random_normal_initializer(stddev=stddev))
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 637, in _get_single_variable
found_var.get_shape()))
ValueError: Trying to share variable discriminator/d_h4_lin/Matrix, but specified shape (131072, 1) and found shape (32768, 1).
(dcgan) [sobey123@localhost DCGAN-tensorflow]$ vim train.sh
(dcgan) [sobey123@localhost DCGAN-tensorflow]$ ./train.sh
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcublas.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcudnn.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcufft.so locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcuda.so.1 locally
I tensorflow/stream_executor/dso_loader.cc:128] successfully opened CUDA library libcurand.so locally
{'batch_size': 64,
'beta1': 0.5,
'checkpoint_dir': 'checkpoint',
'crop': False,
'dataset': 'market',
'epoch': 25,
'input_fname_pattern': '.jpg',
'input_height': 108,
'input_width': None,
'learning_rate': 0.0002,
'options': 1,
'output_height': 64,
'output_path': 'duke_result',
'output_width': None,
'sample_dir': 'samples',
'sample_size': 1000,
'train': False,
'train_size': inf,
'unrolled_lstm': False,
'visualize': False}
I tensorflow/core/common_runtime/gpu/gpu_device.cc:885] Found device 0 with properties:
name: GeForce GTX 1070
major: 6 minor: 1 memoryClockRate (GHz) 1.683
pciBusID 0000:84:00.0
Total memory: 7.93GiB
Free memory: 7.83GiB
I tensorflow/core/common_runtime/gpu/gpu_device.cc:906] DMA: 0
I tensorflow/core/common_runtime/gpu/gpu_device.cc:916] 0: Y
I tensorflow/core/common_runtime/gpu/gpu_device.cc:975] Creating TensorFlow device (/gpu:0) -> (device: 0, name: GeForce GTX 1070, pci bus id: 0000:84:00.0)
WARNING:tensorflow:From /home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py:109 in build_model.: histogram_summary (from tensorflow.python.ops.logging_ops) is deprecated and will be removed after 2016-11-30.
Instructions for updating:
Please switch to tf.summary.histogram. Note that tf.summary.histogram uses the node name instead of the tag. This means that TensorFlow will automatically de-duplicate summary names based on their scope.
Traceback (most recent call last):
File "main.py", line 103, in
tf.app.run()
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 43, in run
sys.exit(main(sys.argv[:1] + flags_passthrough))
File "main.py", line 81, in main
sample_dir=FLAGS.sample_dir)
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py", line 89, in init
self.build_model()
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py", line 114, in build_model
self.D_, self.D_logits_ = self.discriminator(self.G, self.y, reuse=True)
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/model.py", line 324, in discriminator
h4 = linear(tf.reshape(h3, [self.batch_size, -1]), 1, 'd_h4_lin')
File "/home/sobey123/code/project/Person-reid-GAN-pytorch/DCGAN-tensorflow/ops.py", line 98, in linear
tf.random_normal_initializer(stddev=stddev))
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 1024, in get_variable
custom_getter=custom_getter)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 850, in get_variable
custom_getter=custom_getter)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 346, in get_variable
validate_shape=validate_shape)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 331, in _true_getter
caching_device=caching_device, validate_shape=validate_shape)
File "/home/sobey123/miniconda2/envs/dcgan/lib/python3.6/site-packages/tensorflow/python/ops/variable_scope.py", line 637, in _get_single_variable
found_var.get_shape()))
ValueError: Trying to share variable discriminator/d_h4_lin/Matrix, but specified shape (8192, 1) and found shape (25088, 1).

I just ran the conda env create -f dcgan.yml command, activated the virtualenv, and then ran python main.py --dataset market --options 1

It seems this line of code causes the problem:
model.py line 114
self.D_, self.D_logits_ = self.discriminator(self.G, self.y, reuse=True)

Why are there 2 discriminators?
Many thanks in advance!

@qiaoguan
Owner

qiaoguan commented Jun 7, 2018

Hey, I altered the source code of main.py. Just change the values of input_height and output_height to 128, then run the code again and see whether that solves the problem.
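
For context on why matching the two heights helps, here is a minimal sketch of the shape arithmetic behind the ValueError. It assumes the discriminator stacks four stride-2 convolutions with SAME padding and ends with 512 channels (64 base filters doubled three times). Those layer settings are an assumption, not read from this repository's model.py, and `discriminator_flat_size` is a hypothetical helper. With crop set to False, the real images enter the discriminator at input_height while the generated images enter at output_height, so the shared d_h4_lin weight is requested with two different sizes.

```python
import math


def discriminator_flat_size(height, width=None, num_convs=4, stride=2, base_filters=64):
    """Length of the flattened feature map fed to the final linear layer.

    Assumed architecture (not taken from model.py): num_convs stride-2
    convolutions with SAME padding, channels doubling after the first layer.
    """
    width = height if width is None else width
    for _ in range(num_convs):
        # SAME padding with stride s yields ceil(size / s) per spatial axis.
        height = math.ceil(height / stride)
        width = math.ceil(width / stride)
    channels = base_filters * 2 ** (num_convs - 1)
    return height * width * channels


# (input_height, output_height) pairs: the two failing runs and the suggested fix.
for input_height, output_height in [(128, 256), (108, 64), (128, 128)]:
    real = discriminator_flat_size(input_height)   # discriminator on real images
    fake = discriminator_flat_size(output_height)  # discriminator on generated images
    status = "ok" if real == fake else "mismatch"
    print(f"input={input_height} output={output_height}: real={real} fake={fake} -> {status}")
```

Under these assumptions the sketch reproduces the sizes in both error messages (32768 vs 131072, and 25088 vs 8192), and the two sizes agree once input_height and output_height are both 128.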

@ShiinaMitsuki
Author

Problem solved, thanks for helping!!
One more question: how long did it take to train the dcgan on market1501?
I'm now on epoch 300, but the sample images are still poor; my d_loss is small and g_loss tends to grow as the epochs go on.

I'm unfamiliar with GANs, but according to the loss function proposed by the paper:

image

it seems that g_loss should be small and d_loss should be big. I suspect that 300 epochs may be far from enough.
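
As a rough reference point for reading these two numbers, the sketch below evaluates the standard sigmoid cross-entropy GAN losses for a single real/fake pair of discriminator logits. It assumes d_loss is the sum of the real and fake cross-entropy terms and g_loss is the cross-entropy of the fake logit against the label 1. Whether this repository computes its losses exactly this way is an assumption, and `bce_with_logits` and `gan_losses` are hypothetical helpers.

```python
import math


def bce_with_logits(logit, label):
    """Numerically stable sigmoid cross-entropy for one logit and one label."""
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def gan_losses(real_logit, fake_logit):
    """Return (d_loss, g_loss) under the assumed standard GAN losses."""
    # Discriminator: real images labelled 1, generated images labelled 0.
    d_loss = bce_with_logits(real_logit, 1.0) + bce_with_logits(fake_logit, 0.0)
    # Generator: wants the discriminator to label generated images as 1.
    g_loss = bce_with_logits(fake_logit, 1.0)
    return d_loss, g_loss


# Balanced case: the discriminator cannot tell real from fake (logit 0, probability 0.5).
balanced_d, balanced_g = gan_losses(0.0, 0.0)
print("balanced:", balanced_d, balanced_g)

# Discriminator winning: confident on both real and generated images.
winning_d, winning_g = gan_losses(5.0, -5.0)
print("discriminator winning:", winning_d, winning_g)
```

Under these assumptions the balanced case gives d_loss = 2 ln 2 (about 1.39) and g_loss = ln 2 (about 0.69). A small d_loss together with a growing g_loss corresponds to the second case, where the discriminator is winning, so more epochs alone may not help.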
