Merge pull request #30 from gbouras13/dev
v0.1.3
gbouras13 authored Mar 19, 2024
2 parents cd064ba + aa7cd41 commit 1766024
Showing 11 changed files with 109 additions and 32 deletions.
6 changes: 6 additions & 0 deletions HISTORY.md
@@ -1,5 +1,11 @@
# History

0.1.3 (2024-03-19)
------------------

* Adds compatibility with Apple Silicon (M1/M2/M3) GPUs
* Fixes memory issue for `phold plot` with many contigs

0.1.2 (2024-03-06)
------------------

11 changes: 10 additions & 1 deletion README.md
@@ -60,12 +60,21 @@ mamba create -n pholdENV -c conda-forge -c bioconda phold

To utilise `phold` with a GPU, a GPU-compatible version of `pytorch` must be installed. By default, conda/mamba will install a CPU-only version.

Therefore, if you have an NVIDIA GPU, please try:
If you have an NVIDIA GPU, please try:

```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
```

If you have a Mac running an Apple Silicon chip (M1/M2/M3), `phold` should be able to use the GPU. Please try:

```bash
mamba create -n pholdENV python==3.11
conda activate pholdENV
mamba install pytorch::pytorch torchvision torchaudio -c pytorch
mamba install -c conda-forge -c bioconda phold
```

If you are having trouble with `pytorch` see [this link](https://pytorch.org) for more instructions. If you have an older version of CUDA installed, then you might find [this link useful](https://pytorch.org/get-started/previous-versions/).
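A quick way to confirm which accelerator the installed `pytorch` can see is sketched below. It is a hypothetical helper, `describe_torch_backends`, not part of phold; it uses only `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, the same two checks the v0.1.3 device selection relies on, and it still returns a report when `torch` is missing.

```python
def describe_torch_backends() -> dict:
    """Report which torch backends are visible (hypothetical helper, not part of phold)."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment
        return {"torch_installed": False, "cuda": False, "mps": False}

    # torch.backends.mps exists in torch >= 1.12; guard for older versions
    mps_backend = getattr(torch.backends, "mps", None)
    mps_ok = bool(mps_backend is not None and mps_backend.is_available())
    return {
        "torch_installed": True,
        "cuda": bool(torch.cuda.is_available()),
        "mps": mps_ok,
    }


if __name__ == "__main__":
    print(describe_torch_backends())
```

Run as a script, it prints the report. On an Apple Silicon Mac set up as above, `mps` should be `True`; with a CUDA build of `pytorch` on an NVIDIA GPU, `cuda` should be.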

Once `phold` is installed, to download and install the database run:
15 changes: 12 additions & 3 deletions docs/install.md
@@ -38,6 +38,17 @@ cd phold
pip install -e .
```

## Mac (M1/M2/M3)

If you have a Mac that runs Apple Silicon (M1/M2/M3), please try:

```bash
mamba create -n pholdENV python==3.11
conda activate pholdENV
mamba install pytorch::pytorch torchvision torchaudio -c pytorch
mamba install -c conda-forge -c bioconda phold
```

## Torch

To utilise `phold` with a GPU, a GPU-compatible version of `pytorch` must be installed.
@@ -46,7 +57,7 @@ If it is not automatically installed via the installation methods above, please

If you have an older version of the CUDA driver installed on your NVIDIA GPU, then you might find [this link useful](https://pytorch.org/get-started/previous-versions/).

Phold has been tested on NVIDIA GPUs (A100, RTX4090) and AMD GPUs (Radeon).
Phold has been tested on NVIDIA (A100, RTX4090), AMD (MI250) and Mac (M1 Pro) GPUs.

Installation on AMD GPUs requires a version of `torch` compatible with rocm e.g.

@@ -92,11 +103,9 @@ conda config --add channels conda-forge

We recommend installing `phold` into a fresh environment. Assuming you installed miniforge and have an NVIDIA GPU, create an environment called `pholdENV` with `phold` installed:


```bash
mamba create -n pholdENV -c conda-forge -c bioconda phold pytorch=*=cuda*
```

If you don't have a GPU:

```bash
2 changes: 1 addition & 1 deletion pyproject.toml
@@ -5,7 +5,7 @@ requires = ["setuptools>=61.0", "wheel>=0.37.1"]
[project]
# https://packaging.python.org/en/latest/specifications/declaring-project-metadata/
name = "phold"
version = "0.1.2" # change VERSION too
version = "0.1.3" # change VERSION too
description = "Phage Annotations using Protein Structures"
readme = "README.md"
requires-python = ">=3.8, <3.12"
2 changes: 1 addition & 1 deletion src/phold/features/create_foldseek_db.py
@@ -210,7 +210,7 @@ def generate_foldseek_db_from_pdbs(

if num_pdbs == 0:
logger.error(
f"No pdbs with matching CDS ids were found at all. Check the {pdb_dir}"
f"No pdbs with matching CDS ids were found at all. Check the {pdb_dir} directory"
)

# generate the db
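The updated message now names the directory to check. The sketch below shows one way such a check can work; the matching rule (a `.pdb` file whose stem equals the CDS id, e.g. `NC_043029_CDS_0001.pdb`) and the helper name `match_pdbs_to_cds` are assumptions for illustration, not phold's actual logic, and it logs with the standard `logging` module rather than the `logger` used in phold.

```python
import logging
from pathlib import Path
from typing import Dict, Iterable

logger = logging.getLogger(__name__)


def match_pdbs_to_cds(cds_ids: Iterable[str], pdb_dir: str) -> Dict[str, Path]:
    """Map each CDS id to a PDB file in pdb_dir.

    Assumption (hypothetical rule): a PDB matches when its file stem equals
    the CDS id. phold's real matching rule may differ.
    """
    # Index every .pdb file in the directory by its stem
    pdb_paths = {p.stem: p for p in Path(pdb_dir).glob("*.pdb")}
    # Keep only CDS ids that have a structure, preserving input order
    matched = {cds_id: pdb_paths[cds_id] for cds_id in cds_ids if cds_id in pdb_paths}

    if len(matched) == 0:
        # Name the directory explicitly so the user knows where to look
        logger.error(
            f"No pdbs with matching CDS ids were found at all. Check the {pdb_dir} directory"
        )
    return matched
```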
28 changes: 16 additions & 12 deletions src/phold/features/predict_3Di.py
@@ -96,20 +96,24 @@ def get_T5_model(

global device

if torch.cuda.is_available():
if cpu is True:
device = torch.device("cpu")
dev_name = "cpu"
else:
device = torch.device("cuda:0")
dev_name = "cuda:0"
else:
if cpu is True:
device = torch.device("cpu")
dev_name = "cpu"
if cpu is not True:
logger.warning("No available GPU was found, but --cpu was not specified")
logger.warning("ProstT5 will be run with CPU only")

else:
# check for NVIDIA/cuda
if torch.cuda.is_available():
device = torch.device("cuda:0")
dev_name = "cuda:0"
# check for apple silicon/metal
elif torch.backends.mps.is_available():
device = torch.device("mps")
dev_name = "mps"
else:
device = torch.device("cpu")
dev_name = "cpu"
if cpu is not True:
logger.warning("No available GPU was found, but --cpu was not specified")
logger.warning("ProstT5 will be run with CPU only")

# logger device only if the function is called
logger.info("Using device: {}".format(dev_name))
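After this change the device is chosen in a fixed priority order: an explicit `--cpu`, then CUDA, then Apple's MPS backend, then a CPU fallback with a warning. The pure-Python sketch below mirrors that order with the two availability checks passed in as booleans (the function `choose_device_name` is hypothetical), so every branch can be exercised without GPU hardware.

```python
from typing import Tuple


def choose_device_name(cpu: bool, cuda_available: bool, mps_available: bool) -> Tuple[str, bool]:
    """Return (dev_name, warn) following the v0.1.3 order: --cpu, CUDA, MPS, CPU.

    cuda_available / mps_available stand in for torch.cuda.is_available() and
    torch.backends.mps.is_available(). `warn` is True when falling back to CPU
    even though --cpu was not requested.
    """
    if cpu:
        # The user explicitly asked for CPU: GPUs are never probed
        return "cpu", False
    if cuda_available:
        # NVIDIA GPU via CUDA
        return "cuda:0", False
    if mps_available:
        # Apple Silicon GPU via Metal Performance Shaders
        return "mps", False
    # No GPU found although --cpu was not given: fall back and warn
    return "cpu", True
```

One consequence of the new structure: the inner `if cpu is not True:` in the final `else` is always true, since that branch is only reached when `--cpu` was not given, so the warning fires whenever no GPU is found.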
3 changes: 3 additions & 0 deletions src/phold/plot/plot.py
@@ -5,6 +5,7 @@
from pycirclize import Circos
from pycirclize.parser import Genbank
from matplotlib.lines import Line2D
import matplotlib.pyplot as plt
from matplotlib.patches import Patch
import numpy as np
from Bio import SeqUtils
@@ -625,3 +626,5 @@ def create_circos_plot(

# Save the image as an SVG
fig.savefig(svg_plot_file, format="svg", dpi=dpi)

plt.close(fig)
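The memory fix for `phold plot` comes from closing each figure after saving it. pyplot keeps a reference to every figure it creates until it is closed, and warns when more than 20 figures (the default `figure.max_open_warning`) are open, so plotting many contigs in one process accumulates figures. The sketch below assumes the Circos figure is registered with pyplot, as the added `plt.close(fig)` implies, and shows the pattern with a hypothetical `save_many_plots` helper that writes SVGs to in-memory buffers.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend, no display needed
import matplotlib.pyplot as plt


def save_many_plots(n_plots: int) -> int:
    """Render n_plots figures to in-memory SVG buffers, closing each after saving.

    Returns how many figures pyplot still tracks afterwards (expected: 0).
    """
    for i in range(n_plots):
        fig, ax = plt.subplots()
        ax.plot([0, 1], [0, i])
        buf = io.BytesIO()
        fig.savefig(buf, format="svg")
        # Without this, pyplot keeps every figure alive, so memory grows
        # with each plot written in the loop.
        plt.close(fig)
    return len(plt.get_fignums())
```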
2 changes: 1 addition & 1 deletion src/phold/utils/VERSION
@@ -1 +1 @@
0.1.2
0.1.3
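The `# change VERSION too` comment means the release number lives in two places, `pyproject.toml` and `src/phold/utils/VERSION`, and both were bumped to 0.1.3 here. A small consistency check, sketched below with hypothetical helper names, can catch a missed bump. It uses a regular expression rather than `tomllib`, because `tomllib` only ships with Python 3.11+ while the project supports `>=3.8`; the regex is deliberately simple and is not a full TOML parser.

```python
import re
from typing import Optional


def pyproject_version(pyproject_text: str) -> Optional[str]:
    """Extract the first line-leading `version = "..."` value from pyproject.toml text."""
    match = re.search(r'^version\s*=\s*"([^"]+)"', pyproject_text, flags=re.MULTILINE)
    return match.group(1) if match else None


def versions_match(pyproject_text: str, version_file_text: str) -> bool:
    """True when pyproject.toml and the VERSION file agree (whitespace ignored)."""
    return pyproject_version(pyproject_text) == version_file_text.strip()
```

In a test, one might pass it the contents of both files read with `Path.read_text()`.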
1 change: 1 addition & 0 deletions tests/conftest.py
@@ -6,4 +6,5 @@
def pytest_addoption(parser):
parser.addoption("--gpu_available", action="store_true")
parser.addoption("--run_remote", action="store_true")
parser.addoption("--threads", action="store", default=1)

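pytest's `parser.addoption` accepts the same keyword arguments as argparse's `add_argument`. With `action="store", default=1` and no `type`, the `threads` value is the integer `1` by default but the string `"8"` when passed as `--threads 8`. The tests only interpolate it into f-strings, so both work, but adding `type=int` would make the type consistent. The sketch below reproduces this with plain argparse; `parse_threads` is a hypothetical helper.

```python
import argparse
from typing import List, Union


def parse_threads(argv: List[str], coerce_int: bool) -> Union[int, str]:
    """Mimic parser.addoption("--threads", action="store", default=1) with argparse."""
    parser = argparse.ArgumentParser()
    option_kwargs = {"action": "store", "default": 1}
    if coerce_int:
        # Convert command-line strings to int; the int default is left as-is
        option_kwargs["type"] = int
    parser.add_argument("--threads", **option_kwargs)
    return parser.parse_args(argv).threads
```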
30 changes: 30 additions & 0 deletions tests/test_data/NC_043029_aa.fasta
@@ -0,0 +1,30 @@
>NC_043029_CDS_0001
MAVDRARFRMAVEGGAGGFSPLSPGEKGQRAAAEIGPGSNTGQKGQQDAIIDYLTIVVPLSALEEVNCKKLDLLLFRIFGFRGEVVAGAIREKNWNFYEQSAVLIDRENEVVGRVGIGGKKSTVCLSLTGMGCKWIRDWARVYKQCSMLDAKITRVDCAHDDYEGERLDVHALREVAAQGGFTEGGCPPRHRFISDEGHNTGCTLYVGGKGHKELCVYEKGKAEGLPSSRWVRAEVRLYGKHMEIPLDVLLNPGAYLRGSYSALQDLIKGVCTRLRTIRKHVEVSAEAMVLWMERQVGPALSVLRGAFGDSWSDFCEARIVRDGHPGRFRGIAKGDALHRFVREELCPSAA*
>NC_043029_CDS_0002
MPICRVKSAAVEERHNSKTNTINRSQTVGLDLGNGFELPFRVGLGSRPPYTPGEYDIDPQSFALSQYGDLVLKRYVDLVPLQAKAAAAPAKP*
>NC_043029_CDS_0003
MAVLIPACREADLDTAAGTCTAVIWIPQPALLPELPIEDAQAIGAKIALLWAVAYVFRLIRKKIEQS*
>NC_043029_CDS_0004
MHKMFNALKGKGAALAAVGTAALASAPAFASGGGGVDVGPVVTSINGALGPVGQIGAAVLLVLVGIKVYKWVRRAM*
>NC_043029_CDS_0005
MGAPRDVTATGGQGRLPPPGLLTPWIGQGAWDGRVDLAVRMARGLRDHLRGLQLMRWVARVFASAFIRRVAVLLVAALVGWCFSGRAHAAACASYTDQCTEGAAKQGALAWGGAQSKCVAVAGPNGRAGGNVSSKKSEGAGRGYFTVKAECLLNGNVVTYVEPAPPAEQGQWFYTQSCDAQPSYTGTGPWGSGGSAKNGSLGCRNGCDGIWQTNADASKTWTPLGNTCPDDEKKTCETYGDGYYWNSLLKVCEPPEGKCQGGGRPNSLGQCAPEPCPEGMAQQADGTCKKKDNECPAGQVRSPDGKCLPGDGQCAKGEVRGQDGTCKKDADNDGNPDPVNEDSFSGGDDCSAPPSCSGSPIMCGQARIQWRIDCNTRKNRNIAGGTCASMPICTGEKCDAMEYSALLMQWRSACALEKMAQGNGNGGGDNGDTKAIRDALTGTGGAVTTAPDRPSSDVWAPRSGTPVKPDTGGYGWGRTCPQPPSFEVFGNVIQINTAPLCNWLILGGYFVMGLAALASLRIIASRDA*
>NC_043029_CDS_0006
MPMLISTLLTALAALFRSKWGPWVAEAMVWLGLSWATNEFLVQPWIDQMEQAMRAGTPGGEFGALVIAYAGLMKFDVACTMIASAVTAKFAVGAAKTFLTKRA*
>NC_043029_CDS_0007
MPIELFTGQPGNGKTALMMERLVAEAKAASRPIFAAGIDGLDPGLATVLDDPRHWNNKDADGNYIVPDGSLIFVDEAWKWFGHLHDATRQQTPRHVLELAEHRHRGLDFVWTTQQPNQLYPFVRGLIGSHAHVVRRFGTKMLDVYRWGELNEEIKSLAKRDMAQRTTRLLPSQVFGQYKSAEVHTIKARIPFKVMLLPVLAIAAIVFAYLAYTSLRPSSFAGGEGKEGTQSASADAAPSPFRPAGAKEDAPRWPTAAAYAKDHLPRISTMPWTAPVFDERQARSDPQLVCMSSLEGLDAQGVRQEASCRCLTEQGTAYELSQPECRTLARNGPVYNPYRERSEERSTQRIEDLERSRPGVATTSAGGVAQHVERSMGTFPESPSYRSDSYMTTAPGPNKL*
>NC_043029_CDS_0008
MTSSARELLKWLAVILMTGDHVAKVIYGGYVPGLSEAGRVAFPLFALVMAYNLAQPGADVGKSVRRLALWGAIAQPVHALAFGYWLPLNILLTFGVCAAAVYAACQRNWIVLAFAAVVLPAFVDYQWAGVAFVLLAWLGFRTGRLLLTLVAFAPLCAFNGNLWALVAIPAALGLSHTAWSVPRGRWTFYGYYVAHLACLGLLAPILRP*
>NC_043029_CDS_0009
MERERPEYLQPIPRRRWEFPWLGMWAVLLLGGAGAGIWLHLKTGDAWNTRFMAAAETSDAAAPIEPSQADTDASRQVMIAEIRARRELAEIAAKRARAGRSDTPAHTDELRCINGIAFRRIPGGWENVPGAPCP*
>NC_043029_CDS_0010
MRDRKLTGPWAGFSFKGGRLVTPEGRELEPQDLAWLSLTAAQAQEWRRMMESSRAIDKPRNPLSFNAASVVNLSDALAQRRKKRSPGAMAGPDAEPPAAVLPVPGPKRRQRV*
>NC_043029_CDS_0011
MRSIDLLLDKAREKCERPSDRALAEKLRVTASAVSKWRKGGVITEMHATALAAIAGLDGEIVVRVMEEQAETPAQRRVWRSVLDRLSAAAAVLMLVVFAAPGAARAKAIDSQGSSGSDQPHSVYYVRIILGWLARLLPLPRHLLWHGA*
>NC_043029_CDS_0012
MIDPFIAFVLLAAIVAVSIGSAKLVSWCLDRRGESARRSAREAAIVAEACAELAATGWTAEDEASFQAIRGQQLVFLKHLQEVRHG*
>NC_043029_CDS_0013
MVKVLLFSAVLFGAVAILKDELYFAVVSALLGLLAYGFQAAEDRSNGR*
>NC_043029_CDS_0014
MAVDQFREFLRDPFVVSVLGGVLLTGLYWSLVLALRAKGAGNGR*
>NC_043029_CDS_0015
MAAECLVITKADWDQLMQLFAGMFLLLAFCAVFSPFDLHSWEYRVRRYLRRRRIARIRESV
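The new test input `NC_043029_aa.fasta` has one `>` header per protein and a single sequence line per record, ending in `*` for the translated stop codon (the final record has none). In practice Biopython's `SeqIO.parse(handle, "fasta")` reads this format; the dependency-free sketch below (a hypothetical `read_fasta`) shows the structure and strips the trailing `*`.

```python
from typing import Dict, List, Optional


def read_fasta(text: str) -> Dict[str, str]:
    """Parse FASTA text into {record_id: sequence}, dropping a trailing '*' stop symbol."""
    records: Dict[str, str] = {}
    current_id: Optional[str] = None
    chunks: List[str] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Flush the previous record before starting a new one
            if current_id is not None:
                records[current_id] = "".join(chunks).rstrip("*")
            # The record id is the first whitespace-delimited token after '>'
            header = line[1:].strip()
            current_id = header.split()[0] if header else ""
            chunks = []
        else:
            # Sequences may span several lines; join them at flush time
            chunks.append(line)
    if current_id is not None:
        records[current_id] = "".join(chunks).rstrip("*")
    return records
```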
41 changes: 28 additions & 13 deletions tests/test_integration.py
@@ -10,6 +10,10 @@
# to run with remote and with gpu
pytest --run_remote --gpu_available .
# to run with 8 threads
pytest --run_remote --gpu_available --threads 8 .
"""

# import
@@ -46,18 +50,18 @@
remote_fasta_dir: Path = f"{output_dir}/combined_truncated_phold_remote_fasta"
proteins_predict_dir: Path = f"{output_dir}/combined_truncated_phold_proteins_predict"
proteins_compare_dir: Path = f"{output_dir}/combined_truncated_phold_proteins_compare"
proteins_compare_pdb_dir: Path = f"{output_dir}/NC_043029_phold_proteins_compare_pdb"
plots_dir: Path = f"{output_dir}/plot_output"


logger.add(lambda _: sys.exit(1), level="ERROR")
threads = 1
# threads = 1


def remove_directory(dir_path):
if os.path.exists(dir_path):
shutil.rmtree(dir_path)


@pytest.fixture(scope="session")
def gpu_available(pytestconfig):
return pytestconfig.getoption("gpu_available")
@@ -66,6 +70,11 @@ def gpu_available(pytestconfig):
def run_remote(pytestconfig):
return pytestconfig.getoption("run_remote")

@pytest.fixture(scope="session")
def threads(pytestconfig):
return pytestconfig.getoption("threads")



def exec_command(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE):
"""executes shell command and returns stdout if completes exit code 0
@@ -90,15 +99,15 @@ def test_install():
exec_command(cmd)


def test_run_genbank(gpu_available):
def test_run_genbank(gpu_available, threads):
"""test phold run with genbank input"""
input_gbk: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.gbk"
cmd = f"phold run -i {input_gbk} -o {run_gbk_dir} -t {threads} -d {database_dir} -f"
if gpu_available is False:
cmd = f"{cmd} --cpu"
exec_command(cmd)
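Each test assembles its `phold` command the same way: the subcommand, `-i`, `-o`, `-t {threads}`, `-d`, any extra options, `-f`, and `--cpu` appended when `--gpu_available` was not given. A hypothetical helper `build_phold_cmd` could centralize that; the sketch below reproduces the strings used in these tests, with `gpu_available=None` for subcommands such as `compare` that never add `--cpu`.

```python
from typing import Optional, Union


def build_phold_cmd(
    subcommand: str,
    input_path: str,
    output_dir: str,
    threads: Union[int, str],
    database_dir: str,
    gpu_available: Optional[bool] = None,
    extra: str = "",
) -> str:
    """Assemble a phold command string the way the integration tests do (hypothetical helper)."""
    cmd = f"phold {subcommand} -i {input_path} -o {output_dir} -t {threads} -d {database_dir}"
    if extra:
        # e.g. "--pdb --pdb_dir <dir>" for the PDB-based compare tests
        cmd = f"{cmd} {extra}"
    # Overwrite any existing output directory, as every test does
    cmd = f"{cmd} -f"
    if gpu_available is False:
        # Only GPU-capable subcommands pass a bool; None leaves the command as-is
        cmd = f"{cmd} --cpu"
    return cmd
```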

def test_run_fasta(gpu_available):
def test_run_fasta(gpu_available, threads):
"""test phold run with fasta input"""
input_fasta: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.fasta"
cmd = f"phold run -i {input_fasta} -o {run_fasta_dir} -t {threads} -d {database_dir} -f"
@@ -107,7 +116,7 @@ def test_run_fasta(gpu_available):
exec_command(cmd)


def test_predict_genbank(gpu_available):
def test_predict_genbank(gpu_available, threads):
"""test phold predict with genbank input"""
input_gbk: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.gbk"
cmd = f"phold predict -i {input_gbk} -o {predict_gbk_dir} -t {threads} -d {database_dir} -f"
@@ -116,21 +125,27 @@ def test_predict_genbank(gpu_available):
exec_command(cmd)


def test_compare_genbank():
def test_compare_genbank(threads):
"""test phold compare with genbank input"""
input_gbk: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.gbk"
cmd = f"phold compare -i {input_gbk} -o {compare_gbk_dir} --predictions_dir {predict_gbk_dir} -t {threads} -d {database_dir} -f"
exec_command(cmd)


def test_compare_pdb():
def test_compare_pdb(threads):
"""test phold compare with pdbs input"""
input_gbk: Path = f"{test_data}/NC_043029.gbk"
cmd = f"phold compare -i {input_gbk} -o {compare_pdb_dir} -t {threads} -d {database_dir} --pdb --pdb_dir {pdb_dir} -f"
exec_command(cmd)

def test_proteins_compare_pdb(threads):
"""test phold proteins-compare with pdbs input"""
input_faa: Path = f"{test_data}/NC_043029_aa.fasta"
cmd = f"phold proteins-compare -i {input_faa} -o {proteins_compare_pdb_dir} -t {threads} -d {database_dir} --pdb --pdb_dir {pdb_dir} -f"
exec_command(cmd)


def test_predict_fasta(gpu_available):
def test_predict_fasta(gpu_available, threads):
"""test phold predict with fasta input"""
input_fasta: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.fasta"
cmd = f"phold predict -i {input_fasta} -o {predict_fasta_dir} -t {threads} -d {database_dir} -f"
@@ -139,14 +154,14 @@ def test_predict_fasta(gpu_available):
exec_command(cmd)


def test_compare_fasta():
def test_compare_fasta(threads):
"""test phold compare with fasta input"""
input_fasta: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.fasta"
cmd = f"phold compare -i {input_fasta} -o {compare_fasta_dir} --predictions_dir {predict_fasta_dir} -t {threads} -d {database_dir} -f"
exec_command(cmd)


def test_proteins_predict(gpu_available):
def test_proteins_predict(gpu_available, threads):
"""test phold proteins-predict"""
input_fasta: Path = f"{test_data}/phanotate.faa"
cmd = f"phold proteins-predict -i {input_fasta} -o {proteins_predict_dir} -t {threads} -d {database_dir} -f"
@@ -155,7 +170,7 @@ def test_proteins_predict(gpu_available):
exec_command(cmd)


def test_proteins_compare():
def test_proteins_compare(threads):
"""test phold proteins-compare"""
input_fasta: Path = f"{test_data}/phanotate.faa"
cmd = f"phold proteins-compare -i {input_fasta} --predictions_dir {proteins_predict_dir} -o {proteins_compare_dir} -t {threads} -d {database_dir} -f"
@@ -170,14 +185,14 @@ def test_plot():



def test_remote_genbank(run_remote):
def test_remote_genbank(run_remote, threads):
"""test phold remote with genbank input"""
input_gbk: Path = f"{test_data}/combined_truncated_acr_defense_vfdb_card.gbk"
if run_remote is True:
cmd = f"phold remote -i {input_gbk} -o {remote_gbk_dir} -t {threads} -d {database_dir} -f"
exec_command(cmd)

def test_remote_fasta(run_remote):
def test_remote_fasta(run_remote, threads):
"""test phold remote with fasta input"""
input_fasta: Path = (
f"{test_data}/combined_truncated_acr_defense_vfdb_card.fasta"
