
phold memory usage #66

Open
shiraz-shah opened this issue Aug 10, 2024 · 6 comments
Labels
question Further information is requested

Comments

@shiraz-shah

  • phold version: current git pull
  • Python version: 3.11
  • Operating System: Ubuntu 22.04

Description

How much memory is phold supposed to use, and can you limit its memory use somehow?

I have a system with 8 cores, 64 GB of RAM, and an RTX 4090 with 24 GB of video memory.

When running phold on predicted phage contigs, all GPU and system memory is used, along with 80 GB of the swap file.

The system is constantly swapping. GPU utilisation is at 0% and CPU use sits at 3% of one CPU. Computation is still a lot faster than on a CPU-only system, but I imagine it would be much faster if the system wasn't swapping constantly, not to mention the strain on the disks. Do you have any suggestions?

What I Did

phold run -i contigs.6.fasta -o out.6 -t 12

It's not crashing. It's just using all available memory.

@gbouras13
Owner

Hi @shiraz-shah ,

Perhaps you'd like to try out a few different values for --batch_size, e.g. --batch_size 100? I am not sure how that will affect the swapping, but it should increase GPU utilisation. It will also depend on the length of your CDS; the shorter they are, the more it should improve performance.
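
For example, re-using your command from above, the only change would be adding the flag (100 is just a starting value to experiment with):

phold run -i contigs.6.fasta -o out.6 -t 12 --batch_size 100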

George

@shiraz-shah
Author

shiraz-shah commented Aug 11, 2024 via email

@shiraz-shah
Author

Something worked. Everything is a lot faster now, but I'm not sure if it was the batch size. Something might have been odd about the input fasta file, as we didn't have the problem with other fasta files. We're trying to isolate the issue and will get back.

@shiraz-shah
Author

Alright, George, in our case it seems that phold's memory footprint increases with the size of the input fna file. When we limit these files to max 500 contigs, the phold job fits fine within memory.

Do you think this is a bug? Or does the user get better results when running more sequences together, instead of splitting them over several phold runs?
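
For anyone hitting the same limit, the chunking we do is roughly along these lines (a minimal plain-Python sketch; split_fasta.py and its names are just illustrative, not part of phold):

# split_fasta.py -- illustrative helper only, not part of phold.
# Writes the input FASTA out as chunk.0.fasta, chunk.1.fasta, ...
# with at most 500 contigs per file, so each chunk can be passed to
# its own `phold run` invocation.
import sys

def split_fasta(path, chunk_size=500, prefix="chunk"):
    lines, chunk_idx, n_in_chunk = [], 0, 0

    def flush():
        nonlocal chunk_idx
        if lines:
            with open(f"{prefix}.{chunk_idx}.fasta", "w") as out:
                out.writelines(lines)
            lines.clear()
            chunk_idx += 1

    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):          # new contig header
                if n_in_chunk == chunk_size:  # current chunk is full
                    flush()
                    n_in_chunk = 0
                n_in_chunk += 1
            lines.append(line)
    flush()                                   # write the final partial chunk

if __name__ == "__main__":
    split_fasta(sys.argv[1])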

@gbouras13 gbouras13 added the question Further information is requested label Aug 21, 2024
@gbouras13
Owner

Hi @shiraz-shah

This is interesting. I have tested and run Phold on large GenBank files (thousands of contigs) and have not observed any issues. I haven't benchmarked the specific numbers for computational performance, e.g. running 100 x 100 contigs vs 1 x 10,000 contigs, but broadly, both Phold predict (ProstT5) and compare (Foldseek) are linear in compute in my observations.

However, perhaps the memory issue is caused by using FASTA input contigs.

Your system is very similar to mine with an RTX4090.

My question is: is the problem in the ProstT5 stage (I would assume yes)? And can you give me some more information about the benchmarking you did? The best fix might be a torch-level change to the way ProstT5 is run.
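
To give a sense of what I mean by a torch fix, the sketch below loads the public Rostlab/ProstT5 encoder in half precision and embeds a small batch under torch.no_grad(). It is only an illustration of the general memory-saving pattern (names and preprocessing follow the Hugging Face model card), not phold's actual code path:

# Illustrative sketch only -- NOT phold's implementation.
# Shows the general memory-saving pattern: half precision on GPU,
# no autograd graph, and freeing cached blocks between batches.
import torch
from transformers import T5EncoderModel, T5Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = T5Tokenizer.from_pretrained("Rostlab/ProstT5", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/ProstT5").to(device)
model = model.half() if device.type == "cuda" else model.float()
model.eval()

# Toy amino-acid sequences: ProstT5 expects space-separated residues
# and the "<AA2fold>" prefix for the AA -> 3Di direction.
seqs = ["<AA2fold> " + " ".join("MKTAYIAKQR"),
        "<AA2fold> " + " ".join("MSILVTRPSP")]

batch = tokenizer(seqs, add_special_tokens=True, padding="longest",
                  return_tensors="pt").to(device)

with torch.no_grad():  # no gradients -> much smaller memory footprint
    emb = model(batch.input_ids,
                attention_mask=batch.attention_mask).last_hidden_state

if device.type == "cuda":
    torch.cuda.empty_cache()  # release cached GPU blocks between batches

print(emb.shape)  # (batch, max_len, hidden_dim)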

George

@shiraz-shah
Author

It did look like it was at the ProstT5 stage, now that you mention it. I'll check and get back when I get the chance.
