Merge pull request #89 from phac-nml/mob-3.0.3
Merging branch `mob-3.0.3` to `master` for MOB-Suite v3.0.3 release
kbessonov1984 authored Aug 5, 2021
2 parents 77a42bc + 49153b9 commit 1d735b3
Showing 7 changed files with 76 additions and 22 deletions.
48 changes: 35 additions & 13 deletions README.md
@@ -76,23 +76,45 @@ We recommend installing MOB-Suite via bioconda but you can install it via pip us
% pip3 install mob_suite
```

### Source
For system-wide installation one can follow these commands on Ubuntu distro that includes Python
library dependencies and tools
```bash
apt update && apt install python3-pip #installs gcc compiler for pycurl
apt install libcurl4-openssl-dev libssl-dev #for pycurl
pip3 install Cython
apt install mash ncbi-blast+
python3 setup.py install && mob_init #to install and init databases
```

### Docker image
A docker image is also available at [https://hub.docker.com/r/kbessonov/mob_suite](https://hub.docker.com/r/kbessonov/mob_suite)

```
% docker pull kbessonov/mob_suite:3.0.1
% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.1" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output
% docker pull kbessonov/mob_suite:3.0.3
% docker run --rm -v $(pwd):/mnt/ "kbessonov/mob_suite:3.0.3" mob_recon -i /mnt/assembly.fasta -t -o /mnt/mob_recon_output
```

### Singularity image
A singularity image could be built via singularity recipe donated by Eric Deveaud.
The recipe (`recipe.singularity`) is located in the singularity folder of this repository.
The docker image [README section](https://hub.docker.com/repository/docker/kbessonov/mob_suite) also has instructions on how to create singularity image from a docker image.
A singularity image could be built locally via Singularity recipe donated by Eric Deveaud.
The recipe (`recipe.singularity`) is located in the `singularity` folder of this repository and installs MOB-Suite via `conda`.

```bash
% singularity build mobsuite.simg recipe.singularity
```

In addition, Singularity currently supports docker images and automatically converts them to Singularity images format.
```bash
% singularity pull docker://kbessonov/mob_suite:3.0.3
```

Alternatively, Singularity image can be pulled from [BioContainers repository](https://biocontainers.pro/tools/mob_suite) where `<version>` is
the desired version (e.g. `3.0.3--py_0`)

```bash
% singularity run https://depot.galaxyproject.org/singularity/mob_suite:<version>
```

## Using MOB-typer to perform replicon and relaxase typing of complete plasmids and to predict mobility and replicative plasmid host-range

### Setuptools
@@ -106,7 +128,7 @@ Clone this repository and install via setuptools.

## Using MOB-typer to perform replicon and relaxase typing of complete plasmids and predict mobility

You can perform plasmid typing using a fasta formated file containing a single plasmid represented by one or more contigs or it can treat all of the sequences in the fasta file as independant. The default behaviour is to treat all sequences in a file as from one plasmid, do not include multiple unrelated plasmids in the file without specifying --multi as they will be treated as a single plasmid.
You can perform plasmid typing using a fasta formated file containing a single plasmid represented by one or more contigs or it can treat all of the sequences in the fasta file as independent. The default behaviour is to treat all sequences in a file as from one plasmid, so do not include multiple unrelated plasmids in the file without specifying --multi as they will be treated as a single plasmid.


```
@@ -126,7 +148,7 @@ unicycler is used, then the circularity information can be parsed directly from
% mob_recon --infile assembly.fasta --outdir my_out_dir
```

As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowlede of your organism. Sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.
As of v. 3.0.0, we have added the ability of users to provide their own specific set of sequences to remove from plasmid reconstruction. This should be performed with caution and with the knowledge of your organism. Filtering of sequences which are frequently of plasmid origin but are not in your organism is the primary use case we envision for this feature.

```
### User sequence mask
@@ -135,14 +157,14 @@ As of v. 3.0.0, we have added the ability of users to provide their own specific

As of v. 3.0.0, we have provided the ability to use a collection of closed genomes which will be quickly checked using Mash for genomes which are genetically close and limit blast searches to those chromosomes. This more nuanced and automatic approach is recommended for users where there are sequences which should be filtered in one genomic context but not another. We provide as an optional download as set of closed Enterobacteriacea genomes from NCBI which can be used to provide added accuracy for some organisms such as E. coli and Klebsiella where there are sequences which switch between chromosome and plasmids.
<br><br>
If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will get assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for them but any change will result in a changed in the md5 hash. It is inadvised to use these groups for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representitive of that plasmid.
If reconstructed plasmids exceed the Mash distance for primary cluster assignment, then they will be assigned a name in the format novel_{md5} where the md5 hash is calculated based on all of the sequences belonging to that reconstructed plasmid. This will provide a unique name for the plasmids but any change will result in a corresponding change in the md5 hash. It is therefore not advised to use these assigned names for further analyses. Rather they should be highlighted as cases where targeted long read sequencing is required to obtain a closer database representative of that plasmid.

```
### Autodetected close genome filter
% mob_recon --infile assembly.fasta --outdir my_out_dir -g 2019-11-NCBI-Enterobacteriacea-Chromosomes.fasta
```
## Using MOB-cluster
Use this tool only to update the plasmid databases or build a new one and should only be completed with closed high quality plasmids. If you add in poor quality data it can severely impact MOB-recon. As od v. 3.0.0, MOB-cluster has been re-written to utilize the output from MOB-typer to greatly speed up the process of updating and builing plasmid databases by using pre-computed results. Clusters generated from earlier versions of MOB-suite are not compatibile with the new clusters. We have povided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.
Use this tool only to update the plasmid databases or build a new one, however MOB-cluster should only be run with closed high quality plasmids. If you add in poor quality data it can severely impact MOB-recon. As of v3.0.0, MOB-cluster has been re-written to utilize the output from MOB-typer to greatly speed up the process of updating and building plasmid databases by using pre-computed results. Clusters generated from earlier versions of MOB-suite are not compatible with the new clusters. We have provided a mapping file of previous cluster assignments and their new cluster accessions. Each cluster code is unique and will not be re-used.

```
### Build a new database
@@ -177,7 +199,7 @@ Use this tool only to update the plasmid databases or build a new one and should
# MOB-recon contig report format
| field | Description |
| --------- | --------- |
| sample_id | Sample ID specified by user or deault to filename |
| sample_id | Sample ID specified by user or default to filename |
| molecule_type | Plasmid or Chromosome |
| primary_cluster_id | primary MOB-cluster id of neighbor |
| secondary_cluster_id | secondary MOB-cluster id of neighbor |
@@ -205,12 +227,12 @@ Use this tool only to update the plasmid databases or build a new one and should
# MOB-typer report file format
| field | Description |
| --------- | --------- |
| sample_id | Sample ID specified by user or deault to filename |
| sample_id | Sample ID specified by user or default to filename |
| num_contigs | Number of sequences belonging to plasmid |
| size | Length in base pairs |
| gc | GC % |
| md5 | md5 hash |
| rep_type(s) | Replion type(s) |
| rep_type(s) | Replicon type(s) |
| rep_type_accession(s) | Replicon sequence accession(s) |
| relaxase_type(s) | Relaxase type(s) |
| relaxase_type_accession(s) | Relaxase sequence accession(s) |
@@ -235,7 +257,7 @@ Use this tool only to update the plasmid databases or build a new one and should
# MOB-cluster sequence cluster information file
| field | Description |
| --------- | --------- |
| sample_id | Sample ID specified by user or deault to filename |
| sample_id | Sample ID specified by user or default to filename |
| size | Length in base pairs |
| gc | GC % |
| md5 | md5 hash |
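
The reworded README paragraph explains that reconstructed plasmids exceeding the Mash distance for primary cluster assignment receive a `novel_{md5}` name, and that any change to their sequences changes that name. The Python sketch below illustrates this. It assumes the hash is taken over the plasmid's contig sequences concatenated in the order given. MOB-recon's actual hash input (ordering, case, separators) may differ, and `novel_cluster_name` is a hypothetical helper, not part of the MOB-suite API.

```python
import hashlib


def novel_cluster_name(sequences):
    """Hypothetical helper: derive a novel_{md5} name from a plasmid's contigs.

    Assumption: the contig sequences are hashed as one concatenated string in
    the order given; MOB-recon's real hash input may differ.
    """
    digest = hashlib.md5()
    for seq in sequences:
        # Feeding the pieces one by one is equivalent to hashing their concatenation
        digest.update(seq.encode("ascii"))
    return "novel_{}".format(digest.hexdigest())


original = novel_cluster_name(["ATGCGT", "GGCCAA"])
edited = novel_cluster_name(["ATGCGA", "GGCCAA"])  # one base changed
print(original)
print(edited)
print(original == edited)  # False
```

Because a single base change produces an unrelated digest, such names are only stable for byte-identical reconstructions. This is consistent with the README's advice not to use them for further analyses.
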
2 changes: 1 addition & 1 deletion mob_suite/conda/meta.yaml
@@ -1,4 +1,4 @@
{% set version = "3.0.1" %}
{% set version = "3.0.3" %}

package:
name: mob_suite
11 changes: 11 additions & 0 deletions mob_suite/docker/Dockerfile
@@ -0,0 +1,11 @@
FROM ubuntu:21.04
RUN ln -fs /usr/share/zoneinfo/America/New_York /etc/localtime
RUN apt update && apt install git python3-pip -y
RUN git clone https://github.com/phac-nml/mob-suite.git
RUN cd mob-suite && git checkout mob-3.0.3 && cd ..
RUN apt install libcurl4-openssl-dev libssl-dev -y
RUN pip3 install Cython numpy
RUN apt install mash ncbi-blast+ -y
RUN cd mob-suite && python3 setup.py install && cd .. && rm -rf mob-suite
RUN mob_init
RUN apt clean
6 changes: 4 additions & 2 deletions mob_suite/mob_init.py
@@ -114,7 +114,7 @@ def extract(fname, outdir):
for file_name in src_files:
full_file_name = os.path.join(dir_name, file_name)
if os.path.isfile(full_file_name):
shutil.copy(full_file_name, outdir)
shutil.copyfile(full_file_name, os.path.join(outdir,file_name))
shutil.rmtree(dir_name)
os.remove(fname)

@@ -143,7 +143,7 @@ def main():
except Exception as e:
logger.error("Failed to place a lock file at {}. Database diretory can not be accessed. Wrong path?".format(lockfilepath))
logger.error("{}".format(e))
exit(-1)
pass
else:
while os.path.exists(lockfilepath):
elapsed_time = time.time() - os.path.getmtime(lockfilepath)
@@ -245,6 +245,8 @@ def main():
except:
logger.warning("Lock file is already removed by some other process.")
pass


logger.info("MOB init completed successfully")
return 0
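
In `mob_init.py`, `extract()` now calls `shutil.copyfile(full_file_name, os.path.join(outdir,file_name))` instead of `shutil.copy(full_file_name, outdir)`. `shutil.copy` copies the file data plus its permission bits and accepts a directory as the destination. `shutil.copyfile` copies only the data and requires the full destination file path, which is why `os.path.join` was added. The commit does not state its motivation; one plausible reason is storage where setting permission bits fails. The sketch below mirrors the changed loop. `copy_extracted_files` is a hypothetical name, and the `sorted()` call is only there to give this example a deterministic order.

```python
import os
import shutil
import tempfile


def copy_extracted_files(dir_name, outdir):
    """Copy the regular files in dir_name into outdir (data only, no permission bits).

    Hypothetical helper mirroring the loop changed in mob_init.extract().
    Returns the list of destination paths written.
    """
    copied = []
    # sorted() only makes the order deterministic for this sketch
    for file_name in sorted(os.listdir(dir_name)):
        full_file_name = os.path.join(dir_name, file_name)
        if os.path.isfile(full_file_name):
            # copyfile needs the destination *file* path; shutil.copy would also
            # accept outdir itself and then copy the permission bits as well
            destination = os.path.join(outdir, file_name)
            shutil.copyfile(full_file_name, destination)
            copied.append(destination)
    return copied


# Usage: the regular file is copied, the subdirectory is skipped
src_dir = tempfile.mkdtemp()
out_dir = tempfile.mkdtemp()
with open(os.path.join(src_dir, "example.fasta"), "w") as handle:
    handle.write(">seq1\nACGT\n")
os.mkdir(os.path.join(src_dir, "nested"))
copied = copy_extracted_files(src_dir, out_dir)
print(copied)
```
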

27 changes: 23 additions & 4 deletions mob_suite/utils.py
@@ -411,7 +411,19 @@ def initETE3Database(database_directory, ETE3DBTAXAFILE, logging):
logging.info("ETE3 database init completed successfully.")



def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
"""
Place a lock file while using ETE3 taxonomy database (taxa.sqlite) to prevent accidental concurrent multiprocess update
Parameters:
taxid - the taxonomy id which is 1 by default for database health testing
lockfilepath - path to the database lock file
ETE3DBTAXAFILE - path to ETE3 taxa.sqlite file
logging - logger object for logging messages
Returns:
Bool: True/False value with regards to database usage.
If .lock file is not removed after 10 min, program exits
"""
max_time = 600
elapsed_time = 0

@@ -436,7 +448,13 @@ def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):

else:
logging.info("Creating Lock file {}".format(lockfilepath))
open(file=lockfilepath, mode="w").close()

#some file systems are read-only which will not support lock file writting
try:
open(file=lockfilepath, mode="w").close()
except Exception as e:
logging.info(e)
pass

logging.info("Testing ETE3 taxonomy db {}".format(ETE3DBTAXAFILE))
ncbi = NCBITaxa(dbfile=ETE3DBTAXAFILE)
@@ -446,8 +464,9 @@ def ETE3_db_status_check(taxid, lockfilepath, ETE3DBTAXAFILE, logging):
try:
os.remove(lockfilepath)
logging.info("Lock file removed.")
except:
logging.warning("Lock file is already removed by some other process.")
except Exception as e:
logging.warning("Lock file is already removed by some other process or read-only file system")
logging.warning(e)

if len(lineage) > 0:
return True
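
The `ETE3_db_status_check` changes make the lock file best-effort. Creating it is now wrapped in `try`/`except` because, per the new comment, some file systems are read-only. A failed removal is also logged along with the exception. Below is a simplified sketch of that pattern, not the MOB-suite code. It assumes a fixed polling interval and raises `TimeoutError` after `max_time` seconds, where the docstring says the program exits after 10 min. It also catches `OSError` rather than the broad `Exception` used in the diff. The function names and lock file name are hypothetical.

```python
import logging
import os
import tempfile
import time

logger = logging.getLogger(__name__)


def acquire_optional_lock(lockfilepath, max_time=600, poll_interval=1.0):
    """Hypothetical sketch: wait for another process's lock, then try to create ours.

    Returns True if this call created the lock file, False if the file system
    refused (e.g. read-only). Raises TimeoutError if an existing lock is still
    present after max_time seconds (assumed stand-in for exiting the program).
    """
    waited = 0.0
    while os.path.exists(lockfilepath):
        if waited >= max_time:
            raise TimeoutError("Lock file {} still present after {} s".format(lockfilepath, max_time))
        time.sleep(poll_interval)
        waited += poll_interval
    try:
        # Best-effort: read-only or missing locations reject the lock file
        open(lockfilepath, mode="w").close()
        return True
    except OSError as e:
        logger.info(e)
        return False


def release_optional_lock(lockfilepath):
    """Remove the lock file, tolerating a missing file or a read-only file system."""
    try:
        os.remove(lockfilepath)
    except OSError as e:
        logger.warning("Lock file is already removed by some other process or read-only file system")
        logger.warning(e)


# Usage in a writable temporary directory (lock file name is hypothetical)
lock_dir = tempfile.mkdtemp()
lock_path = os.path.join(lock_dir, "example.lock")
created = acquire_optional_lock(lock_path, max_time=5)
held = os.path.exists(lock_path)
release_optional_lock(lock_path)
```
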
@@ -643,7 +662,7 @@ def verify_init(logger, database_dir):
status_file = os.path.join(database_dir, 'status.txt')
if not os.path.isfile(status_file):
logger.info('MOB-databases need to be initialized, this will take some time')
p = Popen(['python', mob_init_path, '-d', database_dir],
p = Popen([sys.executable, mob_init_path, '-d', database_dir],
stdout=PIPE,
stderr=PIPE,
shell=False)
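
`verify_init` now launches the database initialization script with `sys.executable` instead of the literal `'python'`. `sys.executable` is the path of the interpreter running the current process, so the child runs in the same environment, for example the conda environment MOB-suite was installed into. In contrast, `'python'` is looked up on `PATH`. It may be absent on systems that only provide `python3`, or it may point to a different interpreter. Below is a small sketch of the same `Popen` pattern. `run_with_current_interpreter` is a hypothetical helper.

```python
import sys
from subprocess import PIPE, Popen


def run_with_current_interpreter(args):
    """Hypothetical helper: run a child process with the same interpreter as this one."""
    p = Popen([sys.executable] + list(args), stdout=PIPE, stderr=PIPE, shell=False)
    stdout, stderr = p.communicate()
    return p.returncode, stdout.decode(), stderr.decode()


# The child reports its major.minor version, which matches the parent's
code, out, err = run_with_current_interpreter(
    ["-c", "import sys; print('%d.%d' % sys.version_info[:2])"]
)
print(code, out.strip())
```
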
2 changes: 1 addition & 1 deletion mob_suite/version.py
@@ -1,2 +1,2 @@
__version__ = '3.0.2'
__version__ = '3.0.3'

2 changes: 1 addition & 1 deletion setup.py
@@ -29,7 +29,7 @@ def read(fname):
setup(
name='mob_suite',
include_package_data=True,
version='3.0.1',
version='3.0.3',
python_requires='>=3.7.0,<4',
setup_requires=['pytest-runner'],
tests_require=['pytest'],
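
This release bumps the version string in three files. The diff shows they had drifted apart beforehand: `version.py` read `3.0.2` while `setup.py` and `meta.yaml` still read `3.0.1`. Below is a minimal sketch of a consistency check over the three declarations. It is not part of MOB-suite. The regular expressions assume the exact line formats shown in this diff, and the check works on file text passed in as strings so the example stays self-contained.

```python
import re

# Assumed patterns for the three version declarations touched in this commit
PATTERNS = {
    "mob_suite/conda/meta.yaml": r'\{%\s*set\s+version\s*=\s*"([^"]+)"\s*%\}',
    "mob_suite/version.py": r"__version__\s*=\s*'([^']+)'",
    "setup.py": r"version='([^']+)'",
}


def collect_versions(file_texts):
    """Map each file name to the version found in its text (None if not found)."""
    versions = {}
    for name, pattern in PATTERNS.items():
        match = re.search(pattern, file_texts.get(name, ""))
        versions[name] = match.group(1) if match else None
    return versions


def versions_consistent(file_texts):
    """True only if all three files declare the same, non-missing version."""
    values = list(collect_versions(file_texts).values())
    return None not in values and len(set(values)) == 1


before = {
    "mob_suite/conda/meta.yaml": '{% set version = "3.0.1" %}',
    "mob_suite/version.py": "__version__ = '3.0.2'",
    "setup.py": "    version='3.0.1',",
}
after = {
    "mob_suite/conda/meta.yaml": '{% set version = "3.0.3" %}',
    "mob_suite/version.py": "__version__ = '3.0.3'",
    "setup.py": "    version='3.0.3',",
}
print(versions_consistent(before))  # False
print(versions_consistent(after))   # True
```
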