Releases: B-UMMI/INNUca

INNUca v4.2.2 - SPAdes v3.14.0

19 Feb 10:08
ee1f1dc
  • Change SPAdes version
    • Add SPAdes v3.14.0 and remove v3.10.1
    • Incorporate SPAdes --isolate option for estimated coverage >= 100x
  • Change MLST QA/QC
    • Samples from a species with a known MLST scheme, but for which no scheme could be found, now raise a warning instead of failing
  • Add more statistics
    • Save total number of reads and bp sequenced
  • Change Docker image
    • Change base image to perl:5.30-slim-stretch. This allows using the most recent Perl version while keeping an older Linux distribution for compatibility with old kernels.
    • Install procps to provide the free command for checking memory usage
    • Do a JDK headless installation
    • Add any2fasta (mlst dependency)
  • Minor changes
    • Add Docker image statistics to README
  • Minor fixes
    • Correct mlst installation
    • Check if the mlst novel alleles file exists before cleaning it
    • Catch subprocess error when program to run is not installed
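
The coverage-dependent use of the SPAdes --isolate option noted above can be sketched as follows. This is only an illustration, not INNUca's actual code: the function names and the way the command is assembled are hypothetical, and only the 100x threshold and the --isolate flag come from the release notes.

```python
def spades_mode_flags(estimated_coverage, isolate_min_coverage=100):
    """Return extra SPAdes flags for a sample (hypothetical helper).

    --isolate is added only when the estimated coverage is >= 100x,
    as stated in the release notes.
    """
    if estimated_coverage >= isolate_min_coverage:
        return ["--isolate"]
    return []


def build_spades_command(reads_1, reads_2, out_dir, estimated_coverage):
    """Assemble a paired-end SPAdes command line as a list of arguments."""
    command = ["spades.py", "-1", reads_1, "-2", reads_2, "-o", out_dir]
    return command + spades_mode_flags(estimated_coverage)
```

For example, `build_spades_command("r1.fq.gz", "r2.fq.gz", "out", 120)` ends with `--isolate`, while the same call with a coverage of 60 does not.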

v4.2.1

27 Nov 12:33
cc2b6f5

Change MLST QA/QC:

  • Samples with a known MLST scheme, but for which no scheme could be found, now raise a warning instead of failing

Saves the total number of reads and bp sequenced

Update mlst version

  • Change the base image of INNUca's Docker image to perl:5.30 (a Debian 9 image) for compatibility with both Perl and old kernel versions
  • Install and update mlst dependencies
  • Install Java JDK headless but provide extra font loading capabilities (mainly for FastQC, but also Trimmomatic and Pilon)

Minor changes:

  • Check if the mlst novel alleles file exists before trying to clean it
  • Catch the subprocess error raised when the program to run is not installed
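
As a rough illustration of the last fix, a missing executable surfaces in Python as a FileNotFoundError raised by the subprocess module, which can be caught and reported instead of crashing. The run_command helper below is hypothetical and is not taken from INNUca's code.

```python
import subprocess


def run_command(command):
    """Run a command (list of arguments) and return (ran_ok, stdout, stderr).

    Hypothetical helper: if the program is not installed, report it as a
    failed run instead of letting the exception propagate.
    """
    try:
        proc = subprocess.run(command, capture_output=True, text=True)
    except FileNotFoundError:
        return False, "", "program not found: {}".format(command[0])
    return proc.returncode == 0, proc.stdout, proc.stderr
```

A caller can then treat a missing dependency like any other failed step and add the message to the sample's report.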

Kraken

25 Sep 12:51
2ef1ddf
Pre-release

Add Kraken analysis module
Add Kraken to INNUca Docker image
Add more options to control pipeline flow and QA/QC assessment
Add option to keep bam files
Only use Nextera_XT_INNUca adapters file
Extend --help information
Convert utils.py to python3
Update README
Remove temporary "reads" folder when using --fastq option
Minor fixes

v3.2

08 Mar 11:06
57c2afc

Add Nextera adapters
Ignore samples with more than 2 fastq files
Standardize the warnings report
Make trueCoverage executable
Add trueCoverage reference sequences for other species
Fix minor bugs
Add warnings to Trimmomatic when no reads survived the cleaning step
PASS, FAIL, WARNING rules (FAIL overrides WARNING):

  • FAIL
    • Low estimated coverage calculated in EstimatedCoverage module (number of sequenced nucleotides / expected genome size) (default 15x). STOPs sample running.
    • Lower sample coverage, higher number of absent genes or higher number of genes with multiple alleles than specified in TrueCoverage module config file. STOPs sample running.
    • Fail FastQC “Per base sequence quality”, “Overrepresented sequences”, “Per sequence GC content” or “Sequence length distribution”. Do not pass FastQC “Per base N content” or “Adapter Content”. STOPs sample running if sample FastQC fails after Trimmomatic reads cleaning.
    • AssemblyMapping module does not run successfully.
    • Assembly coverage (calculated in AssemblyMapping module) of filtered contigs does not reach the minimum required (30x).
    • MLST scheme found does not match with provided species (mlst module) (with the exception of Yersinia genus, which might raise a warning).
  • WARNING
    • Fail FastQC “Per base sequence content”. Do not pass FastQC “Per base sequence quality” or “Overrepresented sequences”.
    • Zero read pairs survive Trimmomatic cleaning.
    • Higher number of contigs than allowed or odd number of assembled nucleotides.
    • Less than 95% of the reads mapped back to the assembly (in AssemblyMapping module).
    • mlst module did not run.
    • Found MLST scheme for a species with unknown scheme. In case of Yersinia genus, only raises a warning if the specific scheme found does not match with the scheme for provided species name (but matches the genus).
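
The "FAIL overrides WARNING" rule above can be sketched as a small reduction over per-module results. The function and the dictionary layout are hypothetical, chosen only to illustrate the precedence, and do not reflect how INNUca stores its results.

```python
def overall_status(module_results):
    """Combine per-module statuses into one sample status.

    module_results maps a module name to "PASS", "WARNING" or "FAIL"
    (hypothetical layout). Any FAIL makes the sample FAIL; otherwise any
    WARNING makes it WARNING; otherwise it is PASS.
    """
    statuses = set(module_results.values())
    if "FAIL" in statuses:
        return "FAIL"
    if "WARNING" in statuses:
        return "WARNING"
    return "PASS"
```

For instance, a sample with a FastQC warning and an AssemblyMapping failure is reported as FAIL, while a sample whose only issue is that the mlst module did not run is reported as WARNING.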

v3.1.1

18 Sep 12:54

Correct the calculation of maximum number of contigs
Change assembly filtering step: try filtering contigs based on SPAdes K-mer coverage, contigs length and GC content, and if it fails filter only based on contigs length and GC content (to try to rescue the sample from failing)
Change PASS, FAIL, WARNING rules (FAIL overrides WARNING):

  • FAIL
    • Low estimated coverage calculated in EstimatedCoverage module (number of sequenced nucleotides / expected genome size) (default 15x). STOPs sample running.
    • Lower sample coverage, higher number of absent genes or higher number of genes with multiple alleles than specified in TrueCoverage module config file. STOPs sample running.
    • Fail FastQC “Per base sequence quality”, “Overrepresented sequences”, “Per sequence GC content” or “Sequence length distribution”. Do not pass FastQC “Per base N content” or “Adapter Content”. STOPs sample running if sample FastQC fails after Trimmomatic reads cleaning.
    • Zero read pairs survive Trimmomatic cleaning.
    • AssemblyMapping module does not run successfully.
    • Assembly coverage (calculated in AssemblyMapping module) of filtered contigs does not reach the minimum required (30x).
    • MLST scheme found does not match with provided species (mlst module) (with the exception of Yersinia genus, which might raise a warning).
  • WARNING
    • Fail FastQC “Per base sequence content”. Do not pass FastQC “Per base sequence quality” or “Overrepresented sequences”.
    • Higher number of contigs than allowed or odd number of assembled nucleotides.
    • Less than 95% of the reads mapped back to the assembly (in AssemblyMapping module).
    • mlst module did not run.
    • Found MLST scheme for a species with unknown scheme. In case of Yersinia genus, only raises a warning if the specific scheme found does not match with the scheme for provided species name (but matches the genus).
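
The estimated coverage used in the first FAIL rule above is the number of sequenced nucleotides divided by the expected genome size. A minimal sketch follows, assuming the genome size is given in Mb and that a sample exactly at the 15x default passes; both are assumptions, and the function names are hypothetical.

```python
def estimated_coverage(total_bp_sequenced, expected_genome_size_mb):
    """Estimated coverage = sequenced nucleotides / expected genome size.

    The genome size is assumed to be expressed in Mb (an assumption made
    for this sketch).
    """
    return total_bp_sequenced / (expected_genome_size_mb * 1_000_000)


def passes_estimated_coverage(total_bp_sequenced, expected_genome_size_mb,
                              min_coverage=15):
    """Return True when the estimated coverage reaches the minimum (default 15x)."""
    coverage = estimated_coverage(total_bp_sequenced, expected_genome_size_mb)
    return coverage >= min_coverage
```

With 30,000,000 sequenced bp and an expected genome size of 2 Mb, the estimate is 15x and the sample passes this check; with 29,000,000 bp it is 14.5x and the sample would fail.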

Make the SPAdes QC assessment depend only on SPAdes run information (no longer relying on AssemblyMapping).
Remove Trimmomatic and Pear information from QC assessment.
Update E. coli maximum number of multiple alleles in TrueCoverage module.
Include ReMatCh (https://github.com/B-UMMI/ReMatCh) as dependency for TrueCoverage module running.
Make INNUca compatible with new MLST version.
Add option --fastQCproceed to force INNUca to continue even if a sample fails FastQC.
Add option --maxNumberContigs to set the maximum number of contigs per 1.5 Mb of expected genome size (useful for species that intrinsically produce a more fragmented genome assembly).
Change --spadesUse_3_9 to --spadesVersion to specify a SPAdes version (default: 3.11.0).
Add option --noLog to tell INNUca to not create a log file (useful in Slurm environment since stdout and stderr are usually saved in a file).
Add option --noGitInfo to tell INNUca to not retrieve GitHub repository information (useful when running INNUca in parallel independent jobs for many samples since it might save some time).
Change final general report (PASS, FAIL, WARNING samples).
Write failing and warning reports for each sample.
Change dependencies checking.
Fix minor errors.
Update README.
Add a Dockerfile with the recipe for creating the Docker image.
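
Because --maxNumberContigs is expressed per 1.5 Mb of expected genome size, the effective limit scales with the genome. The sketch below shows that scaling; the function name, the example value of 100 contigs and the rounding are assumptions made for illustration, not INNUca's implementation.

```python
def max_contigs_allowed(expected_genome_size_mb, max_contigs_per_1_5_mb):
    """Scale a per-1.5-Mb contig limit to the expected genome size.

    Rounding to the nearest integer is an assumption of this sketch.
    """
    return round(max_contigs_per_1_5_mb * expected_genome_size_mb / 1.5)
```

Under these assumptions, a limit of 100 contigs per 1.5 Mb allows 200 contigs for a 3 Mb genome.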

v3.0

11 Sep 14:05

v2.6

21 Jun 13:59
Use different MLST species scheme map versions
Write the number of FAIL samples to stderr

v2.5

05 Jun 16:21

v2.4

29 May 15:16
Correct the FastQC return value when it fails

v1.9

16 Nov 15:16
Correct saving of the value returned by spades.qc_assembly