Hi, thank you for developing this useful tool.
I have a large number of genomes (>100k) along with their gbk files, and I want to annotate them with Phold. Would a large `--batch_size` (e.g. 128) help process the files faster? I ask because the documentation mentions that a batch size of 1 is usually faster.
Also, in general, should I combine all of my gbk files into a single file as input, or can I pass different gbk files to `phold predict` in parallel?
bw
In terms of `--batch_size`, I found a batch size of 1 was fastest on my hardware (RTX 4090), but it really should be more efficient with larger batch sizes. I am finalising a 'production release' of Phold now, so I will look into it.
In terms of the gbk input, I would recommend running them in chunks (of e.g. 1000-5000 genomes), which is what I have done in the past. I am not sure whether you are running this on a cluster, but chunking would also let you distribute the work across multiple GPUs. I have found this approach to be the most efficient in cluster environments and generally more robust: running 100k genomes will take hours to days, and if an error occurs partway through, you would otherwise lose the intermediate results. You can't run different gbk files in parallel on the same device, as Phold uses a single GPU per run (see the sketch below for one way to chunk the inputs).
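As a rough illustration of the chunking approach, here is a minimal Python sketch. The directory names, chunk size, and thread count are placeholders, the `-i`/`-o`/`-t` flags are assumed from the Phold documentation, and it assumes that a multi-record GenBank file (plain concatenation of per-genome .gbk files) is accepted as input; verify all of this on a small chunk before scaling up. On a cluster you would submit each chunk as a separate GPU job instead of looping serially.

```python
#!/usr/bin/env python3
"""Hypothetical helper: group many per-genome .gbk files into chunks and
run `phold predict` once per chunk (one chunk per GPU/job)."""
from pathlib import Path
import subprocess

GBK_DIR = Path("genomes")        # placeholder: directory holding the >100k .gbk files
OUT_DIR = Path("phold_chunks")   # placeholder: per-chunk outputs go here
CHUNK_SIZE = 1000                # ~1000-5000 genomes per chunk, as suggested above

gbk_files = sorted(GBK_DIR.glob("*.gbk"))
OUT_DIR.mkdir(parents=True, exist_ok=True)

for idx in range(0, len(gbk_files), CHUNK_SIZE):
    chunk = gbk_files[idx:idx + CHUNK_SIZE]
    chunk_id = idx // CHUNK_SIZE
    chunk_gbk = OUT_DIR / f"chunk_{chunk_id:04d}.gbk"

    # Concatenate the per-genome GenBank files into one multi-record file
    # (assumption: Phold accepts multi-record GenBank input).
    with open(chunk_gbk, "w") as out:
        for f in chunk:
            out.write(f.read_text())

    # One phold predict call per chunk; flags assumed from the documentation.
    # On a cluster, submit each of these as its own GPU job instead.
    subprocess.run(
        [
            "phold", "predict",
            "-i", str(chunk_gbk),
            "-o", str(OUT_DIR / f"chunk_{chunk_id:04d}_out"),
            "-t", "8",
        ],
        check=True,
    )
```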