2.10.0
This version includes VIRUSBreakend: Viral Integration Recognition Using Single Breakends
VIRUSBreakend is a high-speed viral integration detection tool designed to be incorporated into whole genome sequencing pipelines with minimal additional cost.
This version includes gridsstools: an optimised C implementation of the performance-critical steps used in VIRUSBreakend. A precompiled binary is included in the release package. If the precompiled binary does not run on your system, source code for building is available in src/main/c/gridsstools.
This version includes official support for performing targeted GRIDSS calling. Use gridss_extract_overlapping_fragments.sh on a BED or VCF file to make GRIDSS calls based on reads/read pairs with an alignment overlapping the region of interest.
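The fragment-level selection described above can be illustrated with a minimal Python sketch. This is not the script's actual implementation: the `Alignment` record, the function names, and the in-memory two-pass approach are all hypothetical, chosen only to show why a mate or supplementary alignment outside the target region is still kept when another alignment of the same fragment overlaps it.

```python
from collections import namedtuple

# Hypothetical minimal alignment record (0-based, half-open coordinates).
Alignment = namedtuple("Alignment", "read_name chrom start end")


def parse_bed(lines):
    """Parse BED text lines into (chrom, start, end) tuples."""
    regions = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        regions.append((chrom, int(start), int(end)))
    return regions


def overlaps(aln, region):
    """True if the alignment shares at least one base with the region."""
    chrom, start, end = region
    return aln.chrom == chrom and aln.start < end and start < aln.end


def select_fragments(alignments, regions):
    """Return every alignment of each fragment that touches a region."""
    # Pass 1: names of fragments with at least one overlapping alignment.
    names = {a.read_name for a in alignments
             if any(overlaps(a, r) for r in regions)}
    # Pass 2: keep all alignments of those fragments, including mates and
    # supplementary alignments that fall outside the regions.
    return [a for a in alignments if a.read_name in names]


regions = parse_bed(["chr1\t100\t200"])
alignments = [
    Alignment("frag1", "chr1", 150, 250),    # overlaps the target
    Alignment("frag1", "chr5", 9000, 9100),  # mate elsewhere, still kept
    Alignment("frag2", "chr1", 200, 300),    # starts at the region end
    Alignment("frag3", "chr2", 150, 250),    # different chromosome
]
selected = select_fragments(alignments, regions)
```

Keeping whole fragments rather than individual overlapping records matters for structural variant calling, where the evidence often lies in the alignment that falls outside the region of interest.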
The following tools and entry points have been added in this version:
- virusbreakend.sh: driver script for VIRUSBreakend
- virusbreakend-build.sh: script for downloading and building VIRUSBreakend database
- gridss_extract_overlapping_fragments.sh: subsets a BAM based on regions of interest defined in a BED or VCF file
- Use this to extract reads of interest and metrics, then run GRIDSS on the extracted BAM.
- gridss_annotate_vcf_repeatmasker.sh: annotates single breakends and breakpoint inserted sequences with RepeatMasker annotations. Requires RepeatMasker to be installed.
- gridss_annotate_vcf_kraken2.sh: annotates single breakends and breakpoint inserted sequences with Kraken2 taxonomic identifiers. Requires kraken2 to be installed.
- gridsstools unmappedSequencesToFastq: Exports unmapped sequences to fastq. This tool is soft clip and split read-aware.
- gridsstools extractFragmentsToFastq: Extracts reads/read pairs from a list of read names to paired fastq files
- gridsstools extractFragmentsToBam: Subsets a BAM based on a list of read names
- This tool will be deprecated when samtools view has this capability. See samtools/samtools#1324 for progress.
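As an illustration of what "soft clip aware" could mean for unmappedSequencesToFastq, the Python sketch below pulls the soft-clipped bases out of an aligned read using its CIGAR string and formats them as FASTQ records. It rests on an assumption not confirmed by these notes: that soft-clipped bases of mapped reads are exported alongside fully unmapped reads. The function names and the `min_length` threshold are hypothetical, and the split read handling is not covered.

```python
import re

# CIGAR operations as defined by the SAM format.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def soft_clipped_sequences(seq, cigar):
    """Return the (leading, trailing) soft-clipped bases of an aligned read."""
    ops = [(int(n), op) for n, op in CIGAR_OP.findall(cigar)]
    # Hard-clipped bases (H) are absent from SEQ, so ignore them when
    # looking at the outermost operations.
    ops = [(n, op) for n, op in ops if op != "H"]
    leading = ops[0][0] if ops and ops[0][1] == "S" else 0
    trailing = ops[-1][0] if len(ops) > 1 and ops[-1][1] == "S" else 0
    return seq[:leading], seq[len(seq) - trailing:]


def soft_clips_to_fastq(name, seq, qual, cigar, min_length=1):
    """Format soft clips of at least min_length bases as FASTQ records."""
    lead, trail = soft_clipped_sequences(seq, cigar)
    records = []
    if len(lead) >= max(min_length, 1):
        records.append(f"@{name}_start\n{lead}\n+\n{qual[:len(lead)]}\n")
    if len(trail) >= max(min_length, 1):
        records.append(f"@{name}_end\n{trail}\n+\n{qual[len(qual) - len(trail):]}\n")
    return records


records = soft_clips_to_fastq("read1", "ACGTACGTAC", "IIIIIHHHHH", "3S5M2S")
```

The hypothetical `min_length` parameter stands in for whatever filtering a real tool would apply so that very short clips, which cannot be classified reliably, are skipped.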
The following entry points have been added to the GRIDSS jar:
- gridss.InsertedSequencesToFasta: exports single breakend and breakpoint inserted sequences to fasta
- gridss.ExtractFragmentsToFastq
- gridss.UnmappedSequencesToFastq
- gridss.repeatmasker.AnnotateVariantsRepeatMasker
- gridss.kraken.AnnotateVariantsKraken
- gridss.kraken.ExtractBestSequencesBasedOnReport
- gridss.kraken.SubsetToTaxonomy
- gridss.VirusBreakendFilter
This release also includes the following:
- Added scripts used to generate all figures in the GRIDSS2 preprint
- #349 Fixed poor assembly performance edge case
- #372 Default IO thread pool size now matches specified thread count
- #372 Changed default memory usage to 30g, since the only instance type that won't accommodate it is DNAnexus azure:mem2_ssd1
- #376 gridss_somatic_filter.R: added --configdir so path to gridss_config.R can be specified.
- #380 #393 gridss.sh: removed --repeatmaskerbed and replaced with gridss_annotate_vcf_repeatmasker.sh utility
- #385 don't write Q2 tag when using external aligner
- #386 Fixed assembly telemetry crash
- #389 Passing reference genome to metrics calculations
- #390 filtering any linkages to variants that have been hard filtered
- #392 recognising .tbi .csi .crai as index files when moving files around
- #396 catching OOM and immediately terminating to prevent hangs
- Passing through WORKER_THREADS to ComputeSamTags
- Precomputed are used if available
- Removed gridss.[Indexed]ExtractFullReads: removing entry points since they don't handle RP with supplementary alignments correctly.
- Replaced by gridss_extract_overlapping_fragments.sh