Releases: PapenfussLab/gridss
2.10.0
This version includes VIRUSBreakend: Viral Integration Recognition Using Single Breakends
VIRUSBreakend is a high-speed viral integration detection tool designed to be incorporated into whole genome sequencing pipelines with minimal additional cost.
This version includes gridsstools: an optimised C implementation of the performance-critical steps used in VIRUSBreakend. A precompiled binary is included in the release package. If the precompiled binary does not run on your system, source code for building is available in src/main/c/gridsstools.
This version includes official support for performing targeted GRIDSS calling. Use gridss_extract_overlapping_fragments.sh on a BED or VCF file to make GRIDSS calls based on reads/read pairs with an alignment overlapping the region of interest.
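The fragment-level selection described above can be illustrated with a minimal sketch. It is not the script's actual implementation: the `Alignment` record, the function names, and the half-open coordinate convention are all assumptions made for illustration. The point it shows is that a fragment is kept whole once any one of its alignments overlaps a region of interest.

```python
from collections import namedtuple

# Hypothetical, simplified record: one row per alignment of a read.
Alignment = namedtuple("Alignment", "read_name chrom start end")


def overlaps(aln, region):
    """True if the alignment overlaps a (chrom, start, end) half-open region."""
    chrom, start, end = region
    return aln.chrom == chrom and aln.start < end and start < aln.end


def extract_overlapping_fragments(alignments, regions):
    """Return every alignment of any fragment with at least one overlapping alignment."""
    # Pass 1: collect the names of fragments touching a region of interest.
    names = {a.read_name for a in alignments
             if any(overlaps(a, r) for r in regions)}
    # Pass 2: keep all alignments sharing those names, wherever they map.
    return [a for a in alignments if a.read_name in names]


alignments = [
    Alignment("frag1", "chr1", 100, 250),    # overlaps the target region
    Alignment("frag1", "chr5", 9000, 9150),  # mate maps elsewhere but is kept
    Alignment("frag2", "chr2", 500, 650),    # no overlap, so it is dropped
]
regions = [("chr1", 200, 300)]
selected = extract_overlapping_fragments(alignments, regions)
```

Keeping the mate on chr5 is what lets a downstream caller still see the read pair evidence for a breakpoint whose other side lies outside the targeted region.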
The following tools and entry points have been added in this version:
- virusbreakend.sh: driver script for VIRUSBreakend
- virusbreakend-build.sh: script for downloading and building VIRUSBreakend database
- gridss_extract_overlapping_fragments.sh: subsets a BAM based on regions of interest defined in a BED or VCF file
- Use this to extract reads of interest and metrics then run GRIDSS on the extracted bam.
- gridss_annotate_vcf_repeatmasker.sh: annotates single breakends and breakpoint inserted sequences with RepeatMasker annotations. Requires RepeatMasker to be installed.
- gridss_annotate_vcf_kraken2.sh: annotates single breakends and breakpoint inserted sequences with Kraken2 taxonomic identifiers. Requires kraken2 to be installed.
- gridsstools unmappedSequencesToFastq: Exports unmapped sequences to fastq. This tool is soft clip and split read-aware.
- gridsstools extractFragmentsToFastq: Extracts reads/read pairs from a list of read names to paired fastq files
- gridsstools extractFragmentsToBam: Subsets a BAM based on a list of read names
- This tool will be deprecated when samtools view has this capability. See samtools/samtools#1324 for progress.
The following entry points have been added to the GRIDSS jar:
- gridss.InsertedSequencesToFasta: exports single breakend and breakpoint inserted sequences to fasta
- gridss.ExtractFragmentsToFastq
- gridss.UnmappedSequencesToFastq
- gridss.repeatmasker.AnnotateVariantsRepeatMasker
- gridss.kraken.AnnotateVariantsKraken
- gridss.kraken.ExtractBestSequencesBasedOnReport
- gridss.kraken.SubsetToTaxonomy
- gridss.VirusBreakendFilter
This release also includes the following:
- Added scripts used to generate all figures in the GRIDSS2 preprint
- #349 Fixed poor assembly performance edge case
- #372 Default IO thread pool size now matches specified thread count
- #372 Changed default memory usage to 30g, since DNA Nexus azure:mem2_ssd1 is the only instance type that won't accommodate it
- #376 gridss_somatic_filter.R: added --configdir so path to gridss_config.R can be specified.
- #380 #393 gridss.sh: removed --repeatmaskerbed and replaced with gridss_annotate_vcf_repeatmasker.sh utility
- #385 don't write Q2 tag when using external aligner
- #386 Fixed assembly telemetry crash
- #389 Passing reference genome to metrics calculations
- #390 filtering any linkages to variants that have been hard filtered
- #392 recognising .tbi .csi .crai as index files when moving files around
- #396 catching OOM and immediately terminating to prevent hangs
- Passing through WORKER_THREADS to ComputeSamTags
- Precomputed are used if available
- Removed gridss.[Indexed]ExtractFullReads: these entry points have been removed since they don't handle RP with supplementary alignments correctly.
- Replaced by gridss_extract_overlapping_fragments.sh
2.9.4
- #368 reduced GKL loading failure to warning message
- #348 Fixed NPE in GeneratePonBedpe
- #356 realignment records no longer being unclipped twice
- #349 Fixed performance issue where an empty blacklist was not cached
- #349 Fixed poor assembly performance edge case
- #366 Updated dependencies to latest versions
- CRAM input files should now be fully supported
- Removed SoftClipsToSplitReads.REALIGN_ANCHORING_BASES parameter
- No longer used since this approach had more edge cases than realigning the entire assembly contig
- Cleaning up namedsorted bam
- Extra unnecessary bam file no longer left in .gridss.working directory
- Extended minimum realignment length to 20bp
- improves libbwa stability
- Added AnnotateInsertedSequence.MIN_SEQUENCE_LENGTH
- improves libbwa stability
- Fixed potential intermediate file corruption if the gridss.SoftClipToSplitReads process was killed during the preprocessing step
- Upgraded defensive GC log message to INFO
- Added single breakend assembly support bias filter
- Not reporting variants entirely contained in assembly anchor
- Fixed "Record should have been dropped" in SoftClipToSplitReads
- Repository now includes all R scripts used to generate the GRIDSS2 paper
2.9.3
- #348 Fixed NPE in GeneratePonBedpe
- Cleaning up named sorted temporary bam file when no longer required
- Added ASSEMBLY_BIAS single breakend assembly support bias filter
- This is a more generalised version of the ASSEMBLY_ONLY/NO_ASSEMBLY filters
- Added NO_SR and NO_RP filters to reduce single breakend FDR
- Fixed "Record should have been dropped" in SoftClipToSplitReads when using external alignment
- Only writing a single realignment record for anchoring bases
- Fixes edge case where unphased variants are sometimes phased cis
- Removed SoftClipsToSplitReads.REALIGN_ANCHORING_BASES parameter
- This split breakend/anchoring sequence alignment approach ends up worse than realigning the entire read. If the initial assembly was over-aligned, it will remain so. Worse, it results in a soft clip in the anchoring bases, and thus inserted sequence which should be aligned to the other side.
- This is a reversion to pre-2.9.0 GRIDSS behaviour
- Reduced lock contention when performing multi-threaded BAM reading
- Not attempting realignment for sequences shorter than 20bp
- Fixes issues with in-process bwa instability when aligning very short sequences
- Added AnnotateInsertedSequence.MIN_SEQUENCE_LENGTH parameter with default of 20
- SoftClipsToSplitReads.MIN_CLIP_LENGTH now defaults to 20
- Added ability to dump the sequences sent for in-process realignment to a fastq file
2.9.2
- #333 Fixed tumourordinal crash in gridss_somatic_filter.R
- Reduced RepeatMaskerBEDFeature memory usage
- Fixes Out of Memory exception in gridss.AnnotateInsertedSequence when a RepeatMasker BED file is specified
- AnnotateInsertedSequence defaulting to in-process alignment
- External process streaming aligner output buffer size is now bounded
- Fixes Out of Memory exception in gridss.AnnotateInsertedSequence
- #344 Fixed IHOMLEN bug where -ve breakends had revcomp insert sequences when comparing
- Fixes inconsistent IHOMLEN when inserted sequence is present
- #343 Fixed race condition in SinglePassSamProgram
- #342 fixed crash when ref genome masking for assembly debug export
- Reduced logging level of "found path with no support" assembly message
- #340 Added packaging script to automate github release file set
- Added version sanity check on Dockerfile
2.9.1
- Reimplemented gridss_annotate_insertions_repeatmasker.R into gridss.InsertedSequenceAnnotator
- Added --repeatmaskerbed command line option to gridss.sh to do RepeatMasker annotation of inserted sequences
- gridss_annotate_insertions_repeatmasker.R is no longer included in GRIDSS releases
- Fixed potential memory leak when using in-process bwa alignment
- Improved performance of steps using in-process bwa alignment
- Improved performance of variant calling steps
- Limited some spammy log messages
- Improved assembly stability
- Fixes issues some users have encountered when processing hg38 alt contigs when using bwa-aligned input files
2.9.0 pre-release
This release includes significant changes to how GRIDSS performs preprocessing and alignment. GRIDSS now uses in-process bwa alignment instead of requiring a command-line bwa. This requires an additional .img file which is generated from the bwa index. A new setupreference step has been added to the GRIDSS driver script so all files related to the reference genome can be generated as a once-off operation.
- Added setupreference step to GRIDSS driver script
- One-off initialisation and files written to the reference genome directory are now explicitly a separate step
- Added BWA JNI interface
- External alignment no longer required
- Added PreprocessForBreakendAssembly command line program
- Combines ComputeSamTags and SoftClipsToSplitReads in a single pass over each input.sv.bam file.
- Approximately 50% speedup in preprocessing time due to better parallelisation
- Added SoftClipsToSplitReads REALIGN_UNANCHORED_BASES option
- Using REALIGN_UNANCHORED_BASES instead of REALIGN_ENTIRE_READ for assembly realignment
- Fixes an issue with GRIDSS2 having slightly lower sensitivity than GRIDSS1 for deletions in which the ref has a tandem duplication (e.g. SINE-SINE becomes SINE)
- Fixed bug causing the nominal position of the two sides of a breakpoint with homology to not match for both BND records
- Better error message if aligner process is killed
- Added min/max/mean mapq INFO fields
- #319 Writing out all reproduction data for all assembly errors to prevent early abort truncating the file write
- #329 Fixed crash in gridss_annotate_insertions_repeatmasker.R when processing chromosomes containing ":" (HLA types)
- #312 now supporting arbitrary split read alignment overlaps (fixes java.lang.OutOfMemoryError error)
- Standardised error codes to match sysexits.h
- More meaningful exit codes from gridss.sh
- #334 cleaned up driver script logging
- Full log file now includes all log messages
- #323 added --nojni command line option to disable native acceleration
- Updated htsjdk/picard versions
- Fixed error where split read records to be dropped were not actually dropped.
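For scripts that wrap the GRIDSS driver, the sysexits.h convention mentioned above can be decoded with a small lookup table. The numeric values below are the standard BSD sysexits.h definitions; which of them gridss.sh actually emits for a given failure is not stated in these notes, and the helper name is hypothetical.

```python
# Exit codes as defined in BSD sysexits.h.
SYSEXITS = {
    0: "EX_OK",
    64: "EX_USAGE",        # command line usage error
    65: "EX_DATAERR",      # data format error
    66: "EX_NOINPUT",      # cannot open input
    67: "EX_NOUSER",       # addressee unknown
    68: "EX_NOHOST",       # host name unknown
    69: "EX_UNAVAILABLE",  # service unavailable
    70: "EX_SOFTWARE",     # internal software error
    71: "EX_OSERR",        # system error
    72: "EX_OSFILE",       # critical OS file missing
    73: "EX_CANTCREAT",    # cannot create output file
    74: "EX_IOERR",        # input/output error
    75: "EX_TEMPFAIL",     # temporary failure
    76: "EX_PROTOCOL",     # remote error in protocol
    77: "EX_NOPERM",       # permission denied
    78: "EX_CONFIG",       # configuration error
}


def describe_exit_code(code):
    """Map a process exit code to its sysexits.h name, if it has one."""
    return SYSEXITS.get(code, f"non-sysexits code {code}")
```

A wrapper could pass the `returncode` of a `subprocess.run` call to `describe_exit_code` to log a readable reason alongside the raw number.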
2.8.3
2.8.2
Fixed assembly errors and inconsistencies in evidence handling.
- #311 Fixed ComputeSamTags split read processing error when the split reads overlap.
- #307 fixed --useproperpair parameter
- #298 Fixed issue with RP reads not always being jointly tracked during assembly
- #278 supplementary alignments no longer provide read pair evidence
- #278 SAMRecord dovetail filter moved to soft clip evidence to prevent orphaning of split read evidence
- Not counting RP anchor KmerEvidence interval as it's encoded in the non-anchoring KmerEvidence
- Improved assembly logging
- Improved debugging and error reporting
- Added --keepTempFiles debugging option
- Added --sanityCheck parameter for identifying inconsistent evidence
2.8.1
- Better assembly handling of libraries with fragment size shorter than read length
- #286 Explicitly logging assembly error stack trace to assist in debugging potential assembly errors.
- #306 replacing :| in BEALN reference names with _ to prevent downstream parsing errors.
- #301 Using QualityScoreDistribution instead of CollectAlignmentSummaryMetrics as the placeholder picard metric when gathering assembly metrics
- Driver script improvements
- #310 fixed driver script crash on single-ended sequencing data
- Improved error message when an input argument is missing.
- Fixed error on multisample VCF with --plotdir specified
- #307 Added --useproperpair and --concordantreadpairdistribution driver script command line arguments
- Driver script no longer defaults to extracting RP based on SAM flag
- Fixes issue in which either too many or too few RP were extracted
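The reason for the #306 change above can be seen in a short sketch. It assumes BEALN entries are laid out as `chr:start|strand|cigar|mapq`; that layout, the function names, and the example contig are assumptions for illustration rather than GRIDSS code. A contig name containing `:` or `|` would collide with the field delimiters, so both characters are replaced with `_`.

```python
import re


def sanitise_contig_name(name):
    """Replace ':' and '|' so the name cannot be mistaken for a field delimiter."""
    return re.sub(r"[:|]", "_", name)


def format_bealn(contig, pos, strand, cigar, mapq):
    # Assumed layout: chr:start|strand|cigar|mapq
    return f"{sanitise_contig_name(contig)}:{pos}|{strand}|{cigar}|{mapq}"


def parse_bealn(entry):
    """Naive downstream parser; it would fail on an unsanitised HLA contig name."""
    location, strand, cigar, mapq = entry.split("|")
    contig, pos = location.split(":")
    return contig, int(pos), strand, cigar, int(mapq)


entry = format_bealn("HLA-A*01:01:01:01", 1234, "+", "50M", 60)
parsed = parse_bealn(entry)
```

Without the substitution, `location.split(":")` would yield five pieces for this contig and the unpacking in `parse_bealn` would raise a `ValueError`.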
2.8.0
- Reverted to MATEID instead of PARID for the VCF breakpoint record pairing. MATEID is the correct field to use according to the VCF specifications.
- Added FIX_SA and FIX_MISSING_HARD_CLIP to gridss.ComputeSamTags
- FIX_SA: rewrites split read SA tags
- corrects GATK indel realignment SA tag data inconsistency
- FIX_MISSING_HARD_CLIP: infers missing hard clipping if split read records have different read lengths
- corrects for GATK indel realignment stripping hard clipping when realigning
- GRIDSS log files should no longer be full of "SA tag of read ********** refers to missing alignments" warning messages!
- There should be significantly fewer data inconsistencies when running on GATK indel realigned bams.
- #291 Updated libraries to htsjdk 2.21.1 and picard 2.21.8
- Improved CRAM support
- #278 the nominal breakpoint position at both breakend records is guaranteed to be the same
- #293 gridss.GeneratePonBedpe now defaults to treating the first sample as the normal
- #283 Validating steps command line argument. Fixed bug with "all"/"call" step parsing
- gridss_somatic_filter.R now writes VCF header for all filters
- #295 Added error message if using a very old samtools version
- #296 gridss_annotate_insertions_repeatmasker.R now explicitly sets repeatmasker column types
- Fixes crash reading a repeatmasker .fa.out file when using integer chromosome numbers without a chr prefix.
- #292 gridss.SoftClipToSplitReads now soft clips alignments that align over the start or end of a chromosome
- Fixes occasional crash during assembly realignment with older bwa versions
- #287 assembly contig per-base support treats RP anchoring with no valid kmers as if the anchoring read was ignored
- Fixes crash when one of two reads in a read pair is shorter than 25bp
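Downstream tools that consume the 2.8.0 output pair breakend records through the MATEID INFO field noted above. The following is a minimal sketch of that pairing on plain VCF text lines. The record IDs and coordinates are made up, the function names are hypothetical, and it assumes each MATEID holds a single mate ID; a real pipeline would normally use a VCF parsing library.

```python
def parse_info(info):
    """Parse a VCF INFO column into a dict (flag-only keys map to True)."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value if value else True
    return fields


def pair_breakends(vcf_lines):
    """Return (id, mate_id) tuples, reporting each breakpoint once."""
    mates = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        cols = line.rstrip("\n").split("\t")
        info = parse_info(cols[7])
        if "MATEID" in info:
            mates[cols[2]] = info["MATEID"]
    pairs = []
    for rec_id, mate_id in mates.items():
        # Keep only reciprocal links, and emit each pair a single time.
        if mates.get(mate_id) == rec_id and rec_id < mate_id:
            pairs.append((rec_id, mate_id))
    return pairs


# Made-up example records: one breakpoint (two breakends) and one single breakend.
records = [
    "chr1\t100\tbnd_a\tA\tA[chr2:200[\t50\tPASS\tSVTYPE=BND;MATEID=bnd_b",
    "chr2\t200\tbnd_b\tC\t]chr1:100]C\t50\tPASS\tSVTYPE=BND;MATEID=bnd_a",
    "chr3\t300\tbnd_c\tG\tG.\t20\tPASS\tSVTYPE=BND",
]
pairs = pair_breakends(records)
```

The single breakend carries no MATEID and is therefore left unpaired, which matches how such records have no partner record in the VCF.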