We have added a couple of new functionalities to MIntO (see Newly supported functionalities below) and made it leaner, meaner, faster and more efficient.
We have also fixed a lot of bugs since v2.2.0. Outputs are still identical to v2.2.0 within 6-decimal precision (except taxonomic profiles, which now include Unknown by default), so these bugs were not producing wrong results. But v2.3.0 will give you a smoother experience.
Making a fresh installation?
Please see instructions in our Wiki page for installation
Upgrading from 2.0.0, 2.1.0 or 2.2.0?
Please run the following from the directory where you cloned MIntO from GitHub.
git pull
git checkout tags/2.3.0
After this, please rerun `dependencies.smk` using the same command you ran previously. This should finish without any complaints. If it does not finish successfully, please submit an issue report on GitHub.
Newly supported functionalities
- Cluster-friendly read-mapping with bwa-mem2. If the nodes in the cluster are defined, MIntO will distribute bwa-index files to their local disks and use the local versions for mapping
  - See instructions in `MIntO/smk/include/bwa_index_wrapper.smk` to see how it works
  - See instructions in `<minto_dir>/site/cluster.py` and define variables as necessary
  - Remember to pass `resources.qsub_args` to batch submission via `Snakemake` arguments. E.g., for slurm, `--default-resources gpu=0 mem=4 "qsub_args=''" --cluster 'sbatch -J {name} --mem={resources.mem}G --gres=gpu:{resources.gpu} -c {threads} {resources.qsub_args}'`
  - Remember to delete the bwa-index files from the nodes when you are done by using `--config CLEAN_BWA_INDEX=True` arguments to `Snakemake`
  - See Snakemake argument FAQ for help with setting up `Snakemake` commandline
- Support for starting a MIntO analysis half-way, provided the input for that step is created properly (see FAQ for instructions)
- Automatic estimation of batch size based on available memory per task (`MAX_RAM_GB_PER_JOB`) for the binning preparation step (mapping each sample against all assemblies)
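The slurm example in the cluster-friendly read-mapping item above has nested quoting that is easy to get wrong when typed by hand. The following is a minimal sketch of assembling those arguments as a list and letting Python handle the shell quoting. The flag values are copied from the example above; the helper name `cluster_args` is hypothetical, and the rest of the Snakemake command line (Snakefile, config file, cores) is deliberately left out.

```python
import shlex

# sbatch template copied from the slurm example above. The braces are left
# as-is because Snakemake, not Python, fills them in per job.
SBATCH_TEMPLATE = (
    "sbatch -J {name} --mem={resources.mem}G "
    "--gres=gpu:{resources.gpu} -c {threads} {resources.qsub_args}"
)


def cluster_args(clean_bwa_index: bool = False) -> list[str]:
    """Return the cluster-related Snakemake arguments as an argv-style list.

    Hypothetical helper for illustration; it is not part of MIntO.
    """
    args = [
        "--default-resources", "gpu=0", "mem=4", "qsub_args=''",
        "--cluster", SBATCH_TEMPLATE,
    ]
    if clean_bwa_index:
        # Asks MIntO to delete the bwa-index copies on the nodes.
        args += ["--config", "CLEAN_BWA_INDEX=True"]
    return args


if __name__ == "__main__":
    # Print a shell-safe version of the arguments, ready to paste after
    # the rest of your own snakemake command.
    print(shlex.join(["snakemake", *cluster_args(clean_bwa_index=True)]))
```

Building the list first means each argument stays intact regardless of the spaces, braces and quotes inside it. Whether `CLEAN_BWA_INDEX=True` is passed in the same run or a separate clean-up run depends on your workflow; see the FAQ mentioned above.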
Software improvements
- Improved efficiency and speed
  - Gene profiling using `bam` and `bed` files is orders of magnitude faster with identical output due to switching `bedtools multicov` --> `samtools bedcov`
  - `coverm` behaves and stays within the threads it is provided, by limiting `fastANI` to use only one thread
  - dbCAN annotation is much faster using `hmmsearch` via `pyhmmer`
  - Distributing bwa-index files to local disks on clusters makes mapping significantly faster
- Smaller disk footprint
  - Not storing BAM files after mapping, thus decreasing the footprint of projects
  - Ignoring unnecessary columns from contig-depth files as input to binning
- Improved code maintainability
  - New config-parsing module makes much cleaner code
Changes
- Automatic estimation of memory requirement for several steps, thus removing several memory-related fields from `yaml` files
- Unclassified taxa from MetaPhlAn and mOTUs are reported at the top of the file as 'Unknown'
- Functional annotation in batches to handle studies with 1000s of MAGs
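To illustrate what annotating in batches means for a study with thousands of MAGs, here is a minimal sketch that splits a list of genome IDs into fixed-size batches. The function name, the batch size of 1000 and the `MAG_0001`-style IDs are all assumptions made for illustration; MIntO's actual batching logic and batch size may differ.

```python
from typing import Iterator, Sequence


def make_batches(items: Sequence[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of `items`, each with at most `batch_size` entries."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


# Hypothetical study with 2500 MAGs, annotated 1000 at a time.
mags = [f"MAG_{i:04d}" for i in range(1, 2501)]
batches = list(make_batches(mags, batch_size=1000))
print([len(b) for b in batches])  # [1000, 1000, 500]
```

Each batch can then be handled as an independent job, which keeps the per-job input size bounded however many MAGs the study contains.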
Bug-fixes
- Gene abundance and expression PCA plots mislabelled the points. If you have used these plots from an earlier version, please recheck them
- Ensuring the `sample_alias` values are unique
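If you want to check your own metadata before running, the uniqueness requirement on `sample_alias` can be sketched as follows. This assumes the aliases are available as a plain sample-to-alias mapping; the function name and the example sample names are hypothetical, and this is not MIntO's internal check.

```python
from collections import Counter


def duplicated_aliases(sample_alias: dict[str, str]) -> list[str]:
    """Return the alias values that are assigned to more than one sample."""
    counts = Counter(sample_alias.values())
    return sorted(alias for alias, n in counts.items() if n > 1)


# Hypothetical mapping in which two samples share the alias 'gut_A'.
aliases = {"S1": "gut_A", "S2": "gut_B", "S3": "gut_A"}
print(duplicated_aliases(aliases))  # ['gut_A']
```

An empty list means every alias is unique.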
Software and database version upgrades
- MetaPhlAn updated to v4.1.1
- mOTUs updated to v3.1.0
What's Changed
- Update gene_annotation.smk by @CJREID in #57
- Handle unassigned/unclassified taxa in metaphlan and motus by @microbiomix in #58
- Bugfix: eggnog db version by @jszarvas in #60
- Gene annotation in batches; Made it easier to skip QC and start MIntO with assembly or mapping to refgenome by @microbiomix in #61
- BUGFIX: Gene/function abundance/expression plots mislabels by @microbiomix in #62
- Fixed issues with 'MERGE_ILLUMINA_SAMPLES' directive; and other minor improvements by @microbiomix in #63
- Updated example yaml files; Improved search for runs per sample. by @microbiomix in #64
- Cluster-friendly implementation of bwa index to automatically distribute files to nodes by @microbiomix in #65
- Moved taxonomical annotation versions into gene_annotation by @jszarvas in #66
- Parallelized the bottleneck step of 'bedtools multicov' by @microbiomix in #67
- Using checkpoints to create assembly batches for depth calculation by @microbiomix in #68
- BWA index files are made as regular shadow rule and rsync'ed to nodes by @microbiomix in #69
- Replaced 'bedtools multicov' with 10X faster 'samtools bedcov' by @microbiomix in #70
- Saving space during binning - by gzipping files and ignoring useless columns by @microbiomix in #71
- Avoiding too many zcats and gzips by @microbiomix in #72
- Removed 'this.path' and 'include' dependencies by @arumugamlab in #73
- Speed-ups, additional input method, change in batches for binning and added raw read QC by @jszarvas in #74
- Cleanup of bwa index mirrors and automatic estimation of vamb memory usage by @microbiomix in #75
- Config parser module and other improvements by @microbiomix in #77
- Minor changes and bugfixes by @jszarvas in #78
New Contributors
- @CJREID made their first contribution in #57
- @arumugamlab made their first contribution in #73
Full Changelog: 2.2.0...2.3.0