phold memory usage #66
Comments
Hi @shiraz-shah, perhaps you'd like to try out a few different values for --batch_size? e.g. --batch_size 100? I am not sure how that will impact the swap, but it should increase the GPU utilisation. It will also depend on the length of your CDS; the shorter the CDS are, the more performance should improve.
George
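As a rough illustration of why batch size and CDS length interact (this is a sketch, not phold's actual implementation), a transformer batch is typically padded to its longest sequence, so per-batch cost scales roughly with batch_size × longest CDS in the batch:

```python
def padded_batch_cells(lengths, batch_size):
    """Rough proxy for per-batch memory: each batch is padded to its
    longest sequence, so cost is len(batch) * max(batch) per batch."""
    total = 0
    for i in range(0, len(lengths), batch_size):
        batch = lengths[i:i + batch_size]
        total += len(batch) * max(batch)
    return total

# Sorting CDS by length before batching reduces padding waste,
# which is one reason shorter/more uniform CDS run faster.
lengths = [50, 400, 60, 390, 55, 410]
unsorted_cost = padded_batch_cells(lengths, 2)
sorted_cost = padded_batch_cells(sorted(lengths), 2)
```

With the toy lengths above, length-sorted batching pads far less than arrival-order batching, so a larger `--batch_size` on short CDS can raise GPU utilisation without a proportional memory hit.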
Thanks for this, George. Will give it a shot.
Do you have any idea about what is “normal” memory usage for phold?
Best
Shiraz A. Shah, MSc, PhD
Senior Researcher
Copenhagen Prospective Studies on Asthma in Childhood
Herlev and Gentofte Hospital, University of Copenhagen
www.copsac.com
Something worked. Everything is a lot faster now, but I'm not sure it was the batch size. Something may have been odd about the input FASTA file, as we didn't have the problem with other FASTA files. We're trying to isolate the issue and will get back.
Alright, George. In our case it seems that phold's memory footprint increases with the size of the input .fna file. When we limit these files to a maximum of 500 contigs, the phold job fits fine within memory. Do you think this is a bug? Or does the user get better results running more sequences together, instead of splitting them over several phold runs?
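For the workaround described above, splitting a large multi-FASTA into chunks of at most 500 contigs can be done with the standard library alone. This is a minimal sketch (`split_fasta` is a hypothetical helper, not part of phold):

```python
import os

def split_fasta(path, max_contigs=500, outdir="."):
    """Split a multi-FASTA into chunk files of at most max_contigs
    records each; returns the paths of the chunk files written."""
    records, current = [], None
    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):
                current = [line]          # start a new record
                records.append(current)
            elif current is not None:
                current.append(line)      # sequence line of current record
    paths = []
    for i in range(0, len(records), max_contigs):
        out = os.path.join(outdir, f"chunk_{i // max_contigs:04d}.fna")
        with open(out, "w") as fh:
            for rec in records[i:i + max_contigs]:
                fh.writelines(rec)
        paths.append(out)
    return paths
```

Each chunk file can then be fed to a separate phold run; whether per-run results differ from one large run is exactly the open question in this thread.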
Hi @shiraz-shah, this is interesting. I have tested/run Phold on large GenBank files (thousands of contigs) and have not observed any issues. I haven't benchmarked the specific numbers on compute performance (e.g. running 100 × 100 contigs vs 1 × 10,000 contigs), but broadly both phold predict (ProstT5) and phold compare (Foldseek) are linear in compute in my observations. However, perhaps the memory issue is caused by using FASTA input contigs.
Your system is very similar to mine, with an RTX 4090. My questions are: is the problem in the ProstT5 stage (I would assume yes)? And can you give me some more information on the benchmarking you did? The best fix might be a change in torch to the way ProstT5 is run.
George
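One hedged way to answer "which stage is eating memory" is to record the process's peak resident set size before and after each stage using the standard-library `resource` module (Unix only; the stage shown is a stand-in, not phold code):

```python
import resource

def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB.
    Note: ru_maxrss is reported in KiB on Linux (bytes on macOS)."""
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024

before = peak_rss_mib()
data = [0] * 5_000_000   # stand-in for a memory-hungry stage (e.g. ProstT5)
after = peak_rss_mib()
# A large jump in (after - before) points at that stage as the culprit.
```

Since `ru_maxrss` is a high-water mark, it only ever grows, so comparing snapshots around each stage identifies which one first pushes the process toward swap.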
It looked like it was at the ProstT5 stage, now that you mention it. I'll check and get back when I get the chance.
Description
How much memory is phold supposed to use, and can you limit its memory use somehow?
I have a system with 8 cores, 64 gigs of RAM and an RTX 4090 with 24 gigs of video memory.
When running phold on predicted phage contigs, all GPU and system memory is used, along with 80 gigs of swap.
The system is constantly swapping: GPU utilisation is 0% and CPU use is at 3% of one core. Computation is still a lot faster than on a CPU-only system, but I imagine it would be much faster if the system weren't swapping constantly, not to mention the strain on the disks. Do you have any suggestions?
What I Did
It's not crashing. It's just using all available memory.