
phold memory usage #66

Open
shiraz-shah opened this issue Aug 10, 2024 · 6 comments
Labels
question Further information is requested

Comments

@shiraz-shah

  • phold version: current git pull
  • Python version: 3.11
  • Operating System: Ubuntu 22.04

Description

How much memory is phold supposed to use, and can you limit its memory use somehow?

I have a system with 8 cores, 64 GB of RAM, and an RTX 4090 with 24 GB of video memory.

When running phold on predicted phage contigs, all GPU and system memory is used, along with 80 GB of the swap file.

The system is constantly swapping. GPU utilisation is at 0% and CPU use sits at 3% of one CPU. Computation is still a lot faster than on a CPU-only system, but I imagine it would be much faster if the system wasn't swapping constantly, not to mention the strain on the disks. Do you have any suggestions?

What I Did

phold run -i contigs.6.fasta -o out.6 -t 12

It's not crashing. It's just using all available memory.

@gbouras13
Owner

Hi @shiraz-shah ,

Perhaps you'd like to try out a few different values for --batch_size, e.g. --batch_size 100? I am not sure how that will affect the swapping, but it should increase GPU utilisation. It will also depend on the length of your CDS; the shorter they are, the more it should improve performance.
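
For example, re-using your command from above, the only change would be adding the flag (100 is just a starting value to experiment with):

phold run -i contigs.6.fasta -o out.6 -t 12 --batch_size 100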

George

@shiraz-shah
Author

shiraz-shah commented Aug 11, 2024 via email

@shiraz-shah
Author

Something worked. Everything is a lot faster now, but I'm not sure if it was the batch size. Something might have been odd about the input fasta file, as we didn't have the problem with other fasta files. We're trying to isolate the issue and will get back.

@shiraz-shah
Author

Alright, George, in our case it seems that phold's memory footprint increases with the size of the input fna file. When we limit these files to max 500 contigs, the phold job fits fine within memory.

Do you think this is a bug? Or does the user get better results when running more sequences together, instead of splitting them over several phold runs?
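
For anyone hitting the same limit, the chunking we do is roughly along these lines (a minimal plain-Python sketch; split_fasta.py and its names are just illustrative, not part of phold):

# split_fasta.py -- illustrative helper only, not part of phold.
# Writes the input FASTA out as chunk.0.fasta, chunk.1.fasta, ...
# with at most 500 contigs per file, so each chunk can be passed to
# its own `phold run` invocation.
import sys

def split_fasta(path, chunk_size=500, prefix="chunk"):
    lines, chunk_idx, n_in_chunk = [], 0, 0

    def flush():
        nonlocal chunk_idx
        if lines:
            with open(f"{prefix}.{chunk_idx}.fasta", "w") as out:
                out.writelines(lines)
            lines.clear()
            chunk_idx += 1

    with open(path) as fh:
        for line in fh:
            if line.startswith(">"):          # new contig header
                if n_in_chunk == chunk_size:  # current chunk is full
                    flush()
                    n_in_chunk = 0
                n_in_chunk += 1
            lines.append(line)
    flush()                                   # write the final partial chunk

if __name__ == "__main__":
    split_fasta(sys.argv[1])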

@gbouras13 gbouras13 added the question Further information is requested label Aug 21, 2024
@gbouras13
Owner

Hi @shiraz-shah

This is interesting. I have tested and run Phold on large GenBank files (thousands of contigs) and have not observed any issues. I haven't benchmarked the specific numbers for computational performance, e.g. running 100 x 100 contigs vs 1 x 10,000 contigs, but broadly, both Phold predict (ProstT5) and compare (Foldseek) are linear in compute in my observations.

However, perhaps the memory issue is caused by using FASTA input contigs.

Your system is very similar to mine with an RTX4090.

My question is: is the problem in the ProstT5 stage (I would assume yes)? And can you give me some more information about the benchmarking you did? The best fix might be a torch-level change to the way ProstT5 is run.
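
To give a sense of what I mean by a torch fix, the sketch below loads the public Rostlab/ProstT5 encoder in half precision and embeds a small batch under torch.no_grad(). It is only an illustration of the general memory-saving pattern (names and preprocessing follow the Hugging Face model card), not phold's actual code path:

# Illustrative sketch only -- NOT phold's implementation.
# Shows the general memory-saving pattern: half precision on GPU,
# no autograd graph, and freeing cached blocks between batches.
import torch
from transformers import T5EncoderModel, T5Tokenizer

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

tokenizer = T5Tokenizer.from_pretrained("Rostlab/ProstT5", do_lower_case=False)
model = T5EncoderModel.from_pretrained("Rostlab/ProstT5").to(device)
model = model.half() if device.type == "cuda" else model.float()
model.eval()

# Toy amino-acid sequences: ProstT5 expects space-separated residues
# and the "<AA2fold>" prefix for the AA -> 3Di direction.
seqs = ["<AA2fold> " + " ".join("MKTAYIAKQR"),
        "<AA2fold> " + " ".join("MSILVTRPSP")]

batch = tokenizer(seqs, add_special_tokens=True, padding="longest",
                  return_tensors="pt").to(device)

with torch.no_grad():  # no gradients -> much smaller memory footprint
    emb = model(batch.input_ids,
                attention_mask=batch.attention_mask).last_hidden_state

if device.type == "cuda":
    torch.cuda.empty_cache()  # release cached GPU blocks between batches

print(emb.shape)  # (batch, max_len, hidden_dim)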

George

@shiraz-shah
Author

It did look like it was at the ProstT5 stage, now that you mention it. I'll check and get back when I get the chance.
