Multi-GPUS support #152

Open · wants to merge 7 commits into master

Conversation

@MlWoo commented Aug 14, 2018

Many people seem to be very interested in multi-GPU support when training the model. It may be worth merging this branch into master.

@MlWoo (Author) commented Aug 14, 2018

@begeekmyfriend I have not modified the related code for that pattern.

@begeekmyfriend (Contributor) commented Aug 14, 2018

@Rayhane-mamah Yes, I agree. In multi-GPU mode we can set r=1 and expand the batch size to obtain smoother gradients. So please consider it as another branch.
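
For readers wanting to see the arithmetic behind that suggestion, here is a minimal sketch of how the global batch scales with the number of GPUs under data parallelism; the variable names are illustrative and may not match the actual hyperparameter names in this fork:

```python
# Minimal sketch (not code from this fork): with data parallelism, each GPU
# processes global_batch_size // num_gpus examples and the gradients are
# averaged across devices, so r=1 combined with a proportionally larger
# global batch still yields smooth gradient estimates.
num_gpus = 4
single_gpu_batch_size = 32
outputs_per_step = 1                                # r=1: one frame per decoder step
global_batch_size = single_gpu_batch_size * num_gpus
per_gpu_batch_size = global_batch_size // num_gpus
print(per_gpu_batch_size)                           # 32 examples on each device
```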

@Rayhane-mamah (Owner) commented
Yes, it seems like people are requesting that. :) Well, your multi-GPU attempt @MlWoo is certainly very helpful. Since the model has changed since you made this implementation, I will need to make a few updates here and there, but yes, I will probably make a new branch for both WaveNet and Tacotron multi-GPU, or add those directly on master as an optional feature or something. (I don't like 4-space indentation though, hahaha...)

In the meantime, I am leaving this PR open here so that people can quickly refer to a good multi-GPU implementation :)

Thanks for all your contributions @MlWoo and @begeekmyfriend ;)

@tomse-h commented Sep 17, 2018

When I try to use this fork as it is, I run into the following:

ValueError: Cannot feed value of shape (48, 408, 1025) for Tensor 'datafeeder/linear_targets:0', which has shape '(?, ?, 513)'

What could be the cause of this? I preprocessed LJSpeech with the given hyperparameters btw.
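
As a hedged aside (the thread does not confirm this as the cause): the last dimension of a linear spectrogram is determined by the FFT size, so a 1025-vs-513 mismatch usually points to the features on disk having been extracted with different FFT/num_freq settings than the graph expects. A small sketch of the arithmetic, with illustrative names rather than this fork's API:

```python
# A linear (magnitude) spectrogram has num_freq = n_fft // 2 + 1 frequency bins.
def num_freq(n_fft: int) -> int:
    return n_fft // 2 + 1

assert num_freq(2048) == 1025   # dimension of the features being fed
assert num_freq(1024) == 513    # dimension the placeholder expects
```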

@MlWoo (Author) commented Sep 18, 2018

@tomse-h I have not modified the related code for the linear-spectrogram path. You can complete it by applying the same solution used for the mel features.

@shaktikshri commented Aug 14, 2020

I might be a bit late to this conversation, but did you guys also see a proportional increase in sec/step when using multiple GPUs? Here are my stats on V100 GPUs with outputs_per_step = 16:
#GPUs    batch size    sec/step
1        32            ~4
2        64            ~10
3        96            ~15
4        128           ~19

@MlWoo (Author) commented Aug 17, 2020

@shaktikshri No, sec/step increases but it does not scale linearly. You had better check the data-loading time and the imbalance in sequence lengths across the devices.
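
To make that advice concrete, here is an illustrative sketch (not code from this fork) of estimating the padding imbalance when a global batch is split naively across devices; the data and helper function are hypothetical:

```python
import numpy as np

# Illustrative sketch: each device's sub-batch is padded to its own longest
# sequence, so if a global batch is split without regard to lengths, some
# GPUs spend much of their time computing on padding.
def padding_fraction(lengths, num_gpus):
    chunks = np.array_split(np.asarray(lengths), num_gpus)
    wasted = sum(c.size * c.max() - c.sum() for c in chunks)
    total = sum(c.size * c.max() for c in chunks)
    return wasted / total

lengths = np.random.randint(100, 900, size=128)    # hypothetical frame counts
print(padding_fraction(sorted(lengths), 4))        # length-sorted split: less padding
print(padding_fraction(lengths, 4))                # arbitrary split: more padding
```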
