
2.10.0

d-cameron released this on 14 Oct 06:51

This version includes VIRUSBreakend: Viral Integration Recognition Using Single Breakends

VIRUSBreakend is a high-speed viral integration detection tool designed to be incorporated into whole genome sequencing pipelines at minimal additional cost.

This version includes gridsstools: an optimised C implementation of the performance-critical steps used in VIRUSBreakend. A precompiled binary is included in the release package. If the precompiled binary does not run on your system, you can build it from the source code in src/main/c/gridsstools.

This version includes official support for targeted GRIDSS calling. Run gridss_extract_overlapping_fragments.sh with a BED or VCF file to restrict GRIDSS calling to reads and read pairs that have an alignment overlapping the regions of interest.
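Targeted calling operates on whole fragments rather than on individual alignments: if any alignment of a read or read pair touches a region of interest, every record for that fragment is kept, including mates and supplementary alignments elsewhere in the genome. The following Python sketch illustrates that two-pass idea on simplified, hypothetical `Alignment` tuples (0-based, half-open BED-style coordinates) instead of a real BAM. The names `Alignment`, `overlaps`, and `extract_overlapping_fragments` are illustrative only and are not the implementation used by gridss_extract_overlapping_fragments.sh.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Alignment(NamedTuple):
    """Hypothetical simplified alignment record (0-based, half-open)."""
    read_name: str
    chrom: str
    start: int
    end: int


Region = Tuple[str, int, int]  # BED-style (chrom, start, end), half-open


def overlaps(aln: Alignment, region: Region) -> bool:
    chrom, start, end = region
    return aln.chrom == chrom and aln.start < end and start < aln.end


def extract_overlapping_fragments(alignments: Iterable[Alignment],
                                  regions: List[Region]) -> List[Alignment]:
    records = list(alignments)
    # Pass 1: names of fragments with at least one alignment in a region.
    names: Set[str] = {
        a.read_name for a in records if any(overlaps(a, r) for r in regions)
    }
    # Pass 2: keep every alignment of those fragments, including mates and
    # supplementary alignments that lie outside the regions of interest.
    return [a for a in records if a.read_name in names]


alns = [
    Alignment("r1", "chr1", 100, 250),
    Alignment("r1", "chr5", 9000, 9150),  # mate or supplementary elsewhere
    Alignment("r2", "chr2", 500, 650),
]
kept = extract_overlapping_fragments(alns, [("chr1", 200, 300)])
print([(a.read_name, a.chrom) for a in kept])
```

The second pass is what allows GRIDSS to see the full evidence for a breakpoint whose partner side lies outside the targeted region.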

The following tools and entry points have been added in this version:

  • virusbreakend.sh: driver script for VIRUSBreakend
  • virusbreakend-build.sh: script for downloading and building VIRUSBreakend database
  • gridss_extract_overlapping_fragments.sh: subsets a BAM based on regions of interest defined in a BED or VCF file
    • Use this to extract the reads of interest and their metrics, then run GRIDSS on the extracted BAM.
  • gridss_annotate_vcf_repeatmasker.sh: annotates single breakends and breakpoint inserted sequences with RepeatMasker annotations. Requires RepeatMasker to be installed.
  • gridss_annotate_vcf_kraken2.sh: annotates single breakends and breakpoint inserted sequences with Kraken2 taxonomic identifiers. Requires kraken2 to be installed.
  • gridsstools unmappedSequencesToFastq: Exports unmapped sequences to fastq. This tool is soft clip- and split read-aware.
  • gridsstools extractFragmentsToFastq: Extracts reads/read pairs from a list of read names to paired fastq files
  • gridsstools extractFragmentsToBam: Subsets a BAM based on a list of read names
    • This tool will be deprecated once samtools view supports this capability. See samtools/samtools#1324 for progress.
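For context on what "soft clip-aware" export involves, the Python sketch below pulls the unaligned bases out of a single SAM-style record: the whole sequence for an unmapped read, otherwise any leading or trailing soft clip (CIGAR `S`) of at least a minimum length, then writes FASTQ. It assumes SAM field semantics (FLAG bits 0x4, 0x100, 0x800; hard clips absent from SEQ) and a non-`*` quality string. The function names, the `/start_clip` and `/end_clip` read-name suffixes, and the `min_length` default are hypothetical. It does not implement the split read awareness of gridsstools unmappedSequencesToFastq, which must also account for bases explained by supplementary alignments.

```python
import re
from typing import Iterable, Iterator, Tuple

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800

FastqRecord = Tuple[str, str, str]  # (name, sequence, qualities)


def unaligned_sequences(name: str, flag: int, cigar: str, seq: str, qual: str,
                        min_length: int = 20) -> Iterator[FastqRecord]:
    """Yield the unaligned portions of one SAM-style record."""
    # Secondary/supplementary records repeat bases from the primary record.
    if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
        return
    if flag & FLAG_UNMAPPED or cigar == "*":
        if len(seq) >= min_length:
            yield name, seq, qual
        return
    # Hard-clipped bases are not stored in SEQ, so drop H operations.
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar) if op != "H"]
    if ops and ops[0][1] == "S" and ops[0][0] >= min_length:
        n = ops[0][0]
        yield f"{name}/start_clip", seq[:n], qual[:n]
    if len(ops) > 1 and ops[-1][1] == "S" and ops[-1][0] >= min_length:
        n = ops[-1][0]
        yield f"{name}/end_clip", seq[-n:], qual[-n:]


def to_fastq(records: Iterable[FastqRecord]) -> str:
    return "".join(f"@{n}\n{s}\n+\n{q}\n" for n, s, q in records)


seq = "ACGTACGTAC" + "T" * 30
qual = "I" * 40
clips = list(unaligned_sequences("read1", 0, "10S30M", seq, qual, min_length=5))
print(to_fastq(clips), end="")
```

Sequences are emitted in the orientation stored in the record; a fuller tool would also decide how to orient reverse-strand clips.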

The following entry points have been added to the GRIDSS jar:

  • gridss.InsertedSequencesToFasta: exports single breakend and breakpoint inserted sequences to fasta
  • gridss.ExtractFragmentsToFastq
  • gridss.UnmappedSequencesToFastq
  • gridss.repeatmasker.AnnotateVariantsRepeatMasker
  • gridss.kraken.AnnotateVariantsKraken
  • gridss.kraken.ExtractBestSequencesBasedOnReport
  • gridss.kraken.SubsetToTaxonomy
  • gridss.VirusBreakendFilter
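Restricting Kraken2 results to a taxonomic subtree, such as all viruses, depends on expanding a taxid into all of its descendants using the parent links in the NCBI taxonomy (nodes.dmp). The Python sketch below shows that expansion and applies it to Kraken2 per-read output. It assumes standard tab-separated Kraken2 output without `--use-names` (classified flag, read ID, taxid, length, LCA mapping) and uses toy taxids. The function names are hypothetical, and the sketch does not reproduce the interface or exact behaviour of gridss.kraken.SubsetToTaxonomy.

```python
from typing import Dict, Iterable, List, Set


def descendants(parent: Dict[int, int], root: int) -> Set[int]:
    """All taxids in the subtree rooted at `root`, including `root`."""
    children: Dict[int, List[int]] = {}
    for child, par in parent.items():
        if child != par:  # the NCBI root (taxid 1) is its own parent
            children.setdefault(par, []).append(child)
    result: Set[int] = set()
    stack = [root]
    while stack:
        taxid = stack.pop()
        if taxid not in result:
            result.add(taxid)
            stack.extend(children.get(taxid, []))
    return result


def subset_kraken_output(lines: Iterable[str], keep: Set[int]) -> List[str]:
    """Read IDs of classified reads whose assigned taxid is in `keep`."""
    read_ids = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if fields[0] == "C" and int(fields[2]) in keep:
            read_ids.append(fields[1])
    return read_ids


# Toy taxonomy: 1 is the root; 200 and 300 sit under 100; 400 does not.
parent = {1: 1, 100: 1, 200: 100, 300: 100, 400: 1}
subtree = descendants(parent, 100)
kraken_lines = [
    "C\tr1\t200\t150\t200:150",
    "C\tr2\t400\t150\t400:150",
    "U\tr3\t0\t150\t0:150",
]
print(sorted(subtree), subset_kraken_output(kraken_lines, subtree))
```

An explicit stack is used instead of recursion because real taxonomy trees are deep enough to make recursive traversal fragile in Python.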

This release also includes the following:

  • Added scripts used to generate all figures in the GRIDSS2 preprint
  • #349 Fixed an edge case that caused poor assembly performance
  • #372 Default IO thread pool size now matches specified thread count
  • #372 Changed default memory usage to 30g, since DNAnexus azure:mem2_ssd1 is the only instance type that has problems with this setting
  • #376 gridss_somatic_filter.R: added --configdir so path to gridss_config.R can be specified.
  • #380 #393 gridss.sh: removed --repeatmaskerbed and replaced with gridss_annotate_vcf_repeatmasker.sh utility
  • #385 Don't write the Q2 tag when using an external aligner
  • #386 Fixed assembly telemetry crash
  • #389 Passing the reference genome to metrics calculations
  • #390 Filtering any linkages to variants that have been hard-filtered
  • #392 Recognising .tbi, .csi and .crai as index files when moving files around
  • #396 Catching OOM errors and terminating immediately to prevent hangs
  • Passing WORKER_THREADS through to ComputeSamTags
  • Precomputed metrics are used if available
  • Removed the gridss.[Indexed]ExtractFullReads entry points, since they do not correctly handle read pairs with supplementary alignments.
    • Replaced by gridss_extract_overlapping_fragments.sh
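Fix #392 concerns keeping index files alongside the files they index when files are moved. A minimal Python sketch of that bookkeeping follows. It assumes both the `x.bam.bai` and `x.bai` naming conventions and that `dest` is a full file path rather than a directory. The suffix tuple and function names are hypothetical, and GRIDSS's own handling may differ.

```python
import os
import shutil
import tempfile
from typing import List, Tuple

# Hypothetical list of index suffixes to carry along with a moved file.
INDEX_SUFFIXES = (".bai", ".tbi", ".csi", ".crai")


def index_renames(src: str, dest: str) -> List[Tuple[str, str]]:
    """(old, new) paths for every existing index of `src`."""
    src_base = os.path.splitext(src)[0]
    dest_base = os.path.splitext(dest)[0]
    pairs: List[Tuple[str, str]] = []
    for suffix in INDEX_SUFFIXES:
        # Check both conventions: x.bam.bai and x.bai.
        for old, new in ((src + suffix, dest + suffix),
                         (src_base + suffix, dest_base + suffix)):
            if os.path.exists(old) and (old, new) not in pairs:
                pairs.append((old, new))
    return pairs


def move_with_indexes(src: str, dest: str) -> None:
    """Move `src` to the file path `dest`, renaming its indexes to match."""
    pairs = index_renames(src, dest)  # resolve before `src` disappears
    shutil.move(src, dest)
    for old, new in pairs:
        shutil.move(old, new)


with tempfile.TemporaryDirectory() as tmp:
    for name in ("x.bam", "x.bam.bai", "x.vcf.gz", "x.vcf.gz.tbi"):
        open(os.path.join(tmp, name), "w").close()
    move_with_indexes(os.path.join(tmp, "x.bam"), os.path.join(tmp, "y.bam"))
    move_with_indexes(os.path.join(tmp, "x.vcf.gz"), os.path.join(tmp, "y.vcf.gz"))
    moved = sorted(os.listdir(tmp))

print(moved)
```

Without the index step, a moved BAM or tabix-indexed VCF would be left without its index, or paired with a stale one.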