Hyperparameter Tuning Strategies #2

Open
shrutijpalaskar opened this issue Jul 26, 2021 · 1 comment

Comments

@shrutijpalaskar

Hi Jaemin,

Thanks for the very interesting paper and releasing your codebase!

I have been working with your codebase on a different multimodal text generation task, and I observe lower performance with VL-T5 and VL-BART than with other similar models. I think this might be a hyperparameter tuning issue. Do you have any advice on which parameters would be most beneficial to tune? I am currently following the Multi30K settings for the learning rate and number of epochs from Table 14 in your paper.

@j-min
Owner

j-min commented Jul 26, 2021

Hi @shrutijpalaskar. Since I had to run all pretraining/finetuning experiments on a server with 4 x 10GB RTX 2080 Ti GPUs (much smaller than what recent works from big companies use), I couldn't run a wide hyperparameter search, so the current hyperparameters are under-tuned and might be far from optimal. I guess the VL-T5/VL-BART models could achieve higher scores on benchmarks with better hyperparameters.
In my experiments, I didn't observe much difference when tuning finetuning parameters (e.g., batch size, learning rate, epochs). I did find improvements from longer pretraining (10 epochs -> 30 epochs; I didn't have time to explore longer) and from bigger backbone architectures (e.g., t5-small -> t5-base), which are kind of obvious.
What is your target multimodal text generation task?
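
For readers who still want to sweep the finetuning parameters mentioned above (batch size, learning rate, epochs), a minimal grid-search sketch might look like the following. The search-space values and the `train_and_eval` callable are assumptions made for illustration. They are not taken from the paper or from the VL-T5 codebase, so the actual finetuning call would have to be wired in by hand.

```python
from itertools import product

# Hypothetical search space: illustrative values only, not from the paper.
SEARCH_SPACE = {
    "lr": [1e-5, 5e-5, 1e-4],
    "batch_size": [16, 32],
    "epochs": [20, 40],
}


def iter_configs(space):
    """Yield one dict per combination of values in the search space."""
    keys = list(space)
    for values in product(*(space[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(space, train_and_eval):
    """Return (best_config, best_score); higher scores are treated as better.

    `train_and_eval` is a user-supplied (hypothetical) callable that
    finetunes a model with one config and returns a validation score.
    """
    best_config, best_score = None, float("-inf")
    for config in iter_configs(space):
        score = train_and_eval(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score
```

With limited GPU budget, shrinking the grid or sampling a random subset of `iter_configs(SEARCH_SPACE)` is a reasonable compromise, given that the reply above reports little sensitivity to these particular parameters.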
