VcfFilters
The raw output of Pindel requires further filtering to remove sequencing and mapping artifacts.
The following describes the filtering methods included in this distribution. It should not be considered exhaustive or perfect.
Pindel can process several types of sequencing data, each with a different purpose, so the filters applied depend on the protocol employed. For example, in a WXS experiment you are unlikely to be interested in non-exonic variants.
The categories are listed here with an abbreviation which will be used to annotate the descriptions below:
- Whole exome seq (WXS) - duplicates marked/removed (aka pulldown)
- Whole genome seq (WGS) - duplicates marked/removed
- Targeted pulldown (TG) - duplicates marked/removed
- Followup (FU) - High depth PCR based, duplicates NOT marked/removed
There are 2 types of rules applied during the 'flagging' step:
- Hard rules
  - Applied as standard VCF filters and appear as FILTER entries in the header.
- Soft rules
  - Indicated below as 'SOFT'.
  - These use the same format as hard rules but are incorporated into the INFO field so they can be excluded by preference. This provides some flexibility for work that may be more experimental.
The failure of some filters indicates a high likelihood that a variant is actually germline. Filters of this type are annotated with 'GERM'.
The filters have a numeric suffix. In some cases values are skipped because the filter has been retired. Some filter rules also exist in 2 forms, one with an 'F' prefix and one with an 'FF' prefix. The 'FF' versions of the rules are applied when filtering using fragment based counting.
Terminology:
- Wildtype = Reads/depth from 'Normal' sample.
- Mutant = Reads/depth from 'Tumour' sample.
- Reads = Number of reads exhibiting the called event.
- Depth = Number of reads ref or otherwise across this region.
Pass when more reads are found by pindel in the Mutant sample than the Wildtype.
Failure indicates likely germline call.
WXS,TG,GERM
Pass when event is <= 4bp or >4bp and no calls in wildtype.
Failure indicates likely germline call.
WXS,TG,GERM
Pass when >= 3 Mutant reads in either direction or >=2 Mutant reads in both directions.
F003-F005 are actually a single filter broken down into easier to manage chunks
WXS,TG
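The strand rule above ("Pass when >= 3 Mutant reads in either direction or >=2 Mutant reads in both directions") can be sketched as a small predicate. This is only an illustration, assuming "direction" means read strand; the function and parameter names are hypothetical and are not taken from cgpPindel.

```python
def passes_strand_read_rule(pos_reads: int, neg_reads: int) -> bool:
    """Sketch of the rule: >= 3 Mutant reads on either strand,
    or >= 2 Mutant reads on both strands.

    pos_reads / neg_reads are hypothetical names for the Mutant read
    counts on the forward and reverse strands.
    """
    either_strand = pos_reads >= 3 or neg_reads >= 3
    both_strands = pos_reads >= 2 and neg_reads >= 2
    return either_strand or both_strands
```

For example, `passes_strand_read_rule(3, 0)` and `passes_strand_read_rule(2, 2)` both pass, while `passes_strand_read_rule(2, 1)` fails.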
Pass when < 10 tumour depth reads.
Pass when >=10 and < 200 tumour depth reads and variant (on both strands >= 0.05 total depth) or (single strand evidence >=0.08 same strand depth).
WXS,WGS,TG
F003-F005 are actually a single filter broken down into easier to manage chunks
Pass when < 200 tumour depth reads.
Pass when >= 200 tumour depth reads and variant (on both strands >= 0.04 total depth) or (single strand evidence >=0.04 same strand depth).
WXS,WGS,TG
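The two depth-dependent checks above (10 to 199 tumour depth reads, and 200 or more) can be read as one piecewise rule. The sketch below is one possible reading, not the cgpPindel implementation: it assumes that a variant seen on both strands is compared against total depth, and a variant seen on one strand only is compared against the depth of that strand. All names are hypothetical.

```python
def _fraction_ok(pos_reads, neg_reads, pos_depth, neg_depth,
                 both_min, single_min):
    """Assumed reading: both-strand evidence uses total depth,
    single-strand evidence uses the depth of that strand only."""
    total_depth = pos_depth + neg_depth
    if pos_reads > 0 and neg_reads > 0:
        return (pos_reads + neg_reads) / total_depth >= both_min
    if pos_reads > 0:
        return pos_depth > 0 and pos_reads / pos_depth >= single_min
    if neg_reads > 0:
        return neg_depth > 0 and neg_reads / neg_depth >= single_min
    return False  # no supporting reads on either strand


def passes_depth_fraction_rules(pos_reads, neg_reads, pos_depth, neg_depth):
    """Combine the '< 200' and '>= 200' tumour depth checks."""
    depth = pos_depth + neg_depth
    if depth < 10:
        return True  # both checks pass outright at very low depth
    if depth < 200:
        return _fraction_ok(pos_reads, neg_reads, pos_depth, neg_depth,
                            both_min=0.05, single_min=0.08)
    return _fraction_ok(pos_reads, neg_reads, pos_depth, neg_depth,
                        both_min=0.04, single_min=0.04)
```

Keeping the thresholds as arguments to a shared helper makes it visible that the two checks differ only in their cut-offs.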
F003-F005 are actually a single filter broken down into easier to manage chunks
- Pass when event length is > 4 bp
- Pass when event <= 4 bp and INFO item 'REP' <= 9.
Here is a brief description of how the cgpPindel repeats metric works:
catcatcatcatcatcatCATCAT
cat cat cat cat cat cat (CAT) CAT
The deletion CATCAT can be broken down to a single repeating tuple CAT. Directly preceding the deletion is a 'cat' repeat which itself has 6 copies of the repeated tuple CAT in the deletion. So in this case the repeat count would be 6.
For this flag, the check is performed on any variant shorter than 5 bases. So:
cattccattccattccattccattccattccattccattccattccattccattcCATTC
with a repeat count of 11 would not be filtered, because the 5-base variant is too long for the check to apply.
WXS,WGS,TG
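The repeat metric described above can be sketched in a few lines. This is a minimal illustration of the description, not cgpPindel's own code: it reduces the event to its smallest repeating tuple, then counts how many copies of that tuple sit directly before the event. The function names are hypothetical, and the sequences are built from the tuples named in the text.

```python
def smallest_repeat_unit(seq: str) -> str:
    """Return the shortest tuple that, repeated, rebuilds seq exactly."""
    n = len(seq)
    for size in range(1, n + 1):
        if n % size == 0 and seq[:size] * (n // size) == seq:
            return seq[:size]
    return seq  # only reached for an empty string


def count_preceding_repeats(preceding: str, event: str) -> int:
    """Count copies of the event's repeat tuple directly before the event."""
    unit = smallest_repeat_unit(event.lower())
    if not unit:
        return 0
    flank = preceding.lower()
    count = 0
    end = len(flank)
    # Walk backwards from the event, one tuple at a time.
    while end >= len(unit) and flank[end - len(unit):end] == unit:
        count += 1
        end -= len(unit)
    return count


def passes_repeat_rule(event_length: int, rep: int) -> bool:
    """Pass when the event is > 4 bp, or when it is <= 4 bp and REP <= 9."""
    return event_length > 4 or rep <= 9


# Six 'cat' tuples before the deletion CATCAT, as in the first example.
print(count_preceding_repeats("cat" * 6, "CATCAT"))  # 6
```

A 5-base event preceded by eleven copies of its tuple still passes, since `passes_repeat_rule(5, 11)` only applies the REP threshold to events of 4 bp or fewer.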
Pass when Mutant reads > 5 and Wildtype depth >= 0.08 * Mutant reads.
WXS,TG
Specifically for high depth sequencing types, checks for sufficient normal coverage.
Pass when Wildtype reads <= 0.05 * Tumour reads.
WXS,TG
Pass when in gene footprint (tabix bed input).
WXS
Pass when no overlapping records in Unmatched normal panel (tabix gff3 input). Also see FF021.
Only the start position of an event is used for matching; the event is discarded if it overlaps any record.
WXS,WGS
N/A - legacy
Pass when event >= 11 bp.
Pass when < 11 bp and <= 9 depth on both strands.
Pass when < 11 bp and > 9 depth on both strands and event is not seen in 0.2 of wildtype or mutant reads.
WGS,GERM
N/A - legacy
Pass when no reads from combined unique BWA+Pindel count of event from Wildtype.
WGS
Pass when Mutant has > 4 reads from pindel and > 0 reads from BWA mapping.
Pass when Mutant has > 4 reads from pindel, REP=0 and pindel evidence on both strands.
WGS
Pass when variant doesn't overlap with a simple repeat.
SOFT (applied to all)
Pass when depth in original BWA mapping is >= 10 in both Mutant and Wildtype.
WGS
Pass when Tumour supporting fragments >= 3 or tumour fraction of supporting fragments >= 0.05.
WXS
With WildType FD < 200, pass when one of the following is true:
- WildType FC <= 1 and WildType FD >= 10 and WildType FC <= 0.1 * Tumour FC
- WildType FC = 1 or 2 and WildType FC/FD <= 0.05 and Tumour FC/FD >= 0.2
With WildType FD >= 200, pass when WildType FC/FD <= 0.02 and Tumour FC/FD >= 0.20.
WXS
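The WildType FD/FC conditions above branch on depth, so a small function may make the logic easier to follow. This is a sketch only. It treats FC and FD as a per-sample fragment call count and fragment depth, which is an assumption because this section does not expand the abbreviations. It also treats a ratio with zero depth as 0, which is likewise an assumption. All names are hypothetical.

```python
def passes_wildtype_fragment_rule(wt_fc: int, wt_fd: int,
                                  tum_fc: int, tum_fd: int) -> bool:
    """Sketch of the WildType FD/FC rule as written in the text."""
    # Assumption: a ratio with zero depth is treated as 0.
    wt_frac = wt_fc / wt_fd if wt_fd > 0 else 0.0
    tum_frac = tum_fc / tum_fd if tum_fd > 0 else 0.0

    if wt_fd < 200:
        # WildType FC <= 1, FD >= 10, and FC <= 0.1 * Tumour FC
        low_support = wt_fc <= 1 and wt_fd >= 10 and wt_fc <= 0.1 * tum_fc
        # WildType FC = 1 or 2, FC/FD <= 0.05, and Tumour FC/FD >= 0.2
        few_calls = wt_fc in (1, 2) and wt_frac <= 0.05 and tum_frac >= 0.2
        return low_support or few_calls

    # WildType FD >= 200
    return wt_frac <= 0.02 and tum_frac >= 0.20
```

For instance, `passes_wildtype_fragment_rule(2, 100, 30, 100)` passes through the second condition, while three WildType calls at the same depth fail both.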
Pass when no matching event in Unmatched normal panel (tabix bed input). Also see F010/FF010.
Uses the exact event range (RS/RE) and the event type (PC).
WXS,WGS
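The two unmatched normal panel rules differ in how an event is matched: one uses only the start position and discards on any overlap, the other requires the exact range (RS/RE) and event type (PC). The sketch below contrasts the two using hypothetical in-memory panels in place of the tabix-indexed inputs. The tuple layouts, the inclusive coordinates, and the 'D'/'I' type labels in the usage lines are assumptions for illustration only.

```python
from typing import Iterable, Set, Tuple

Interval = Tuple[str, int, int]          # (chrom, start, end), inclusive
PanelEvent = Tuple[str, int, int, str]   # (chrom, RS, RE, PC)


def passes_start_overlap_rule(chrom: str, start: int,
                              panel: Iterable[Interval]) -> bool:
    """Fail when the event start falls inside any panel record."""
    return not any(c == chrom and s <= start <= e for c, s, e in panel)


def passes_exact_match_rule(chrom: str, rs: int, re_: int, pc: str,
                            panel: Set[PanelEvent]) -> bool:
    """Fail only when range and event type both match a panel record."""
    return (chrom, rs, re_, pc) not in panel


overlap_panel = [("1", 100, 110)]
exact_panel = {("1", 100, 110, "D")}

print(passes_start_overlap_rule("1", 105, overlap_panel))        # False
print(passes_exact_match_rule("1", 105, 110, "D", exact_panel))  # True
```

The same event fails the overlap-based check but passes the exact-match check, which shows why the second form is the stricter definition of "seen in the panel".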