MIntO 2.3.0 stable release

@microbiomix released this 19 Dec 15:56
1c3748e

We have added a couple of new functionalities to MIntO (see Newly supported functionalities below) and made it leaner, faster and more efficient.

We have also fixed many bugs since v2.2.0. Outputs are still identical to v2.2.0 within 6-decimal precision (except taxonomic profiles, which now include Unknown by default), so these bugs were not producing wrong results, but v2.3.0 will give you a smoother experience.

Making a fresh installation?

Please see the installation instructions on our Wiki page.

Upgrading from 2.0.0, 2.1.0 or 2.2.0?

Please run the following in the directory where you cloned MIntO from GitHub.

git pull
git checkout tags/2.3.0

After this, please rerun dependencies.smk using the same command you ran previously. This should finish without any complaints. If it does not finish successfully, please submit an issue report on GitHub.

Newly supported functionalities

  • Cluster-friendly read-mapping with bwa-mem2. If the cluster nodes are defined, MIntO will distribute bwa-index files to their local disks and use the local copies for mapping
    • See MIntO/smk/include/bwa_index_wrapper.smk for how it works
    • See instructions in <minto_dir>/site/cluster.py and define variables as necessary
    • Remember to pass resources.qsub_args to batch submission via Snakemake arguments. E.g., for slurm, --default-resources gpu=0 mem=4 "qsub_args=''" --cluster 'sbatch -J {name} --mem={resources.mem}G --gres=gpu:{resources.gpu} -c {threads} {resources.qsub_args}'
    • Remember to delete the bwa-index files from the nodes when you are done, by passing --config CLEAN_BWA_INDEX=True to Snakemake
    • See Snakemake argument FAQ for help with setting up Snakemake commandline
  • Support for starting a MIntO analysis part-way through, provided the input for that step has been created properly (see FAQ for instructions)
  • Automatic estimation of batch size based on available memory per task (MAX_RAM_GB_PER_JOB) for the binning preparation step (mapping each sample against all assemblies)
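
To see how the pieces of the Slurm example above fit together, the sketch below mimics the placeholder substitution with Python's `str.format`. It is an illustration only, not Snakemake's or MIntO's own code. The template string and the defaults (`gpu=0`, `mem=4`, empty `qsub_args`) are taken from the example above; the helper `render_submit_command`, the rule name `map_reads`, the thread count and the `--nodelist=node01` value are made up for the demonstration.

```python
from types import SimpleNamespace

# Cluster template copied from the Slurm example in these notes.
CLUSTER_TEMPLATE = (
    "sbatch -J {name} --mem={resources.mem}G "
    "--gres=gpu:{resources.gpu} -c {threads} {resources.qsub_args}"
)


def render_submit_command(name, threads, **resources):
    """Fill the template for one job (hypothetical helper, for illustration)."""
    # Defaults mirror: --default-resources gpu=0 mem=4 "qsub_args=''"
    merged = {"gpu": 0, "mem": 4, "qsub_args": ""}
    merged.update(resources)
    return CLUSTER_TEMPLATE.format(
        name=name, threads=threads, resources=SimpleNamespace(**merged)
    )


# A job that relies on the defaults only.
default_cmd = render_submit_command("map_reads", threads=8)

# A job that overrides memory and adds extra submission arguments (assumed values).
custom_cmd = render_submit_command(
    "map_reads", threads=8, mem=32, qsub_args="--nodelist=node01"
)

print(default_cmd)
print(custom_cmd)
```

With an empty `qsub_args` the rendered line simply ends in a trailing space. In this sketch, giving `qsub_args` a default is what keeps the `{resources.qsub_args}` placeholder resolvable for jobs that set no extra arguments.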

Software improvements

  • Improved efficiency and speed
    • Gene profiling using bam and bed files is orders of magnitude faster, with identical output, after switching from bedtools multicov to samtools bedcov
    • coverm now stays within the threads it is given, because fastANI is limited to a single thread
    • dbCAN annotation is much faster using hmmsearch via pyhmmer
    • Distributing bwa-index files to local disks on clusters makes mapping significantly faster
  • Smaller disk footprint
    • Not storing BAM files after mapping, thus decreasing the footprint of projects
    • Ignoring unnecessary columns from contig-depth files as input to binning
  • Improved code maintainability
    • New config-parsing module makes the code much cleaner

Changes

  • Automatic estimation of memory requirement for several steps, thus removing several memory-related fields from yaml files
  • Unclassified taxa from MetaPhlAn and mOTUs are reported at the top of the file as 'Unknown'
  • Functional annotation in batches to handle studies with 1000s of MAGs
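
The idea of annotating in batches can be sketched as follows. This is a generic chunking example, not MIntO's implementation: the function `make_batches`, the MAG names and the batch size of 1000 are all assumptions chosen for illustration.

```python
def make_batches(items, batch_size):
    """Split a list into consecutive batches of at most batch_size items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]


# Hypothetical study with 2500 MAGs, annotated 1000 at a time.
mags = [f"MAG_{i:04d}" for i in range(1, 2501)]
batches = make_batches(mags, 1000)

for number, batch in enumerate(batches, start=1):
    print(f"batch {number}: {len(batch)} MAGs ({batch[0]} .. {batch[-1]})")
```

Each batch can then be handled as an independent job, so no single annotation step has to hold the whole study at once.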

Bug-fixes

  • Gene abundance and expression PCA plots labelled the points incorrectly. Please re-check any such plots you generated with earlier versions
  • Ensuring the sample_alias values are unique
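
If you want to check your own sample_alias values before a run, a minimal sketch of such a uniqueness check is shown below. It is not the check MIntO performs internally; the function `find_duplicate_aliases` and the example aliases are hypothetical.

```python
from collections import Counter


def find_duplicate_aliases(aliases):
    """Return the sorted list of alias values that occur more than once."""
    counts = Counter(aliases)
    return sorted(alias for alias, n in counts.items() if n > 1)


# Hypothetical aliases; "S1" was assigned to two samples by mistake.
sample_aliases = ["S1", "S2", "S3", "S1"]
duplicates = find_duplicate_aliases(sample_aliases)

if duplicates:
    print("Duplicate sample_alias values:", ", ".join(duplicates))
else:
    print("All sample_alias values are unique")
```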

Software and database version upgrades

  • MetaPhlAn updated to v4.1.1
  • mOTUs updated to v3.1.0

What's Changed

  • Update gene_annotation.smk by @CJREID in #57
  • Handle unassigned/unclassified taxa in metaphlan and motus by @microbiomix in #58
  • Bugfix: eggnog db version by @jszarvas in #60
  • Gene annotation in batches; Made it easier to skip QC and start MIntO with assembly or mapping to refgenome by @microbiomix in #61
  • BUGFIX: Gene/function abundance/expression plots mislabels by @microbiomix in #62
  • Fixed issues with 'MERGE_ILLUMINA_SAMPLES' directive; and other minor improvements by @microbiomix in #63
  • Updated example yaml files; Improved search for runs per sample. by @microbiomix in #64
  • Cluster-friendly implementation of bwa index to automatically distribute files to nodes by @microbiomix in #65
  • Moved taxonomical annotation versions into gene_annotation by @jszarvas in #66
  • Parallelized the bottleneck step of 'bedtools multicov' by @microbiomix in #67
  • Using checkpoints to create assembly batches for depth calculation by @microbiomix in #68
  • BWA index files are made as regular shadow rule and rsync'ed to nodes by @microbiomix in #69
  • Replaced 'bedtools multicov' with 10X faster 'samtools bedcov' by @microbiomix in #70
  • Saving space during binning - by gzipping files and ignoring useless columns by @microbiomix in #71
  • Avoiding too many zcats and gzips by @microbiomix in #72
  • Removed 'this.path' and 'include' dependencies by @arumugamlab in #73
  • Speed-ups, additional input method, change in batches for binning and added raw read QC by @jszarvas in #74
  • Cleanup of bwa index mirrors and automatic estimation of vamb memory usage by @microbiomix in #75
  • Config parser module and other improvements by @microbiomix in #77
  • Minor changes and bugfixes by @jszarvas in #78

Full Changelog: 2.2.0...2.3.0