
Clarification on Number of Training Steps #65

Open
alberthli opened this issue Jan 2, 2025 · 6 comments

@alberthli

In the paper and in the config comment, you state that you train the model for 1M steps each for the discriminator and generator. However, the config itself uses

```yaml
max_steps: 20000000  # 20M, not 2M!
```

Could you clarify which of these numbers is a typo?

@jishengpeng
Owner

> In the paper and in the config comment, you state that you train the model for 1M steps each for the discriminator and generator. However, the config itself uses
>
> ```yaml
> max_steps: 20000000  # 20M, not 2M!
> ```
>
> Could you clarify which of these numbers is a typo?

Thank you for your attention. We set an upper limit of 2 million steps for training; in practice, the run is often terminated earlier based on observations from TensorBoard.
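
For context, a minimal sketch of what this arrangement might look like, assuming a PyTorch Lightning-style trainer with TensorBoard logging (the trainer wiring, log directory, and the `model`/`datamodule` names are assumptions for illustration, not the repo's actual code):

```python
# Hypothetical sketch: max_steps acts only as a hard ceiling on training.
# The run is typically stopped earlier by hand (e.g. Ctrl+C) once the
# TensorBoard curves plateau, rather than by an automated condition.
import pytorch_lightning as pl

trainer = pl.Trainer(
    max_steps=2_000_000,  # upper limit; often not reached in practice
    logger=pl.loggers.TensorBoardLogger("logs/"),  # assumed log directory
)
# trainer.fit(model, datamodule=datamodule)  # interrupted manually when metrics flatten
```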

@alberthli
Author

So you mean to say that the number 20000000 (20 million, not 2 million) in the config is always set, but you terminate early, usually around 2 million steps instead? And do you just manually terminate the run rather than using some automated condition?

@jishengpeng
Owner

> So you mean to say that the number 20000000 (20 million, not 2 million) in the config is always set, but you terminate early, usually around 2 million steps instead? And do you just manually terminate the run rather than using some automated condition?

2 million, not 20 million.

@alberthli
Author

I'm not sure I understand. The number in your config is not 2 million; it is 20 million (there are 7 zeros, not 6). Are you saying that the number specified in the paper and the comments is wrong, or that the config is wrong? These two numbers are not consistent with each other.

@jishengpeng
Owner

> I'm not sure I understand. The number in your config is not 2 million; it is 20 million (there are 7 zeros, not 6). Are you saying that the number specified in the paper and the comments is wrong, or that the config is wrong? These two numbers are not consistent with each other.

We have updated the config for clarity. Thank you.

@alberthli
Author

Thank you for updating the config - did you use 2M or 20M when training the models presented in the paper? This affects things like the learning rate scheduler.
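
For illustration, a minimal sketch of why the value of `max_steps` matters here, assuming a step-based cosine decay tied to the total step budget (the `cosine_lr` helper and the base learning rate are hypothetical; the repo's actual scheduler may differ):

```python
# Hypothetical sketch: a cosine LR decay whose period is set by max_steps.
import math

def cosine_lr(step, max_steps, base_lr=2e-4, min_lr=0.0):
    """Cosine decay from base_lr to min_lr over max_steps."""
    progress = min(step / max_steps, 1.0)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))

# At step 1M, the two candidate budgets give very different learning rates:
print(cosine_lr(1_000_000, max_steps=2_000_000))   # ~1.0e-4, halfway through the decay
print(cosine_lr(1_000_000, max_steps=20_000_000))  # ~1.99e-4, barely decayed
```

With a 2M cap, the learning rate at step 1M is halfway through its decay; with a 20M cap it has barely moved, so the two settings would train differently even if both runs were stopped at 2M steps.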
