Commit

Final polish
Gammerdinger authored Jun 4, 2024
1 parent 6cdcbf9 commit 380f79b
Showing 1 changed file with 24 additions and 24 deletions.
48 changes: 24 additions & 24 deletions lessons/10_variant_prioritization.md
@@ -65,7 +65,7 @@ Let's break down the syntax a bit:


> #### Revisiting "( FILTER = 'PASS' )"
> Recall that we had used `SnpSift`, in the [filtering command that we used earlier](08_variant_filtering.md#filter-using-snpsift) in the workshop. In that lesson we used `SnpSift`to look at the `FILTER` field and reatin varaints with the value of `PASS`.
> Recall that we used `SnpSift` in the [filtering command that we used earlier](08_variant_filtering.md#filter-using-snpsift) in the workshop. In that lesson we used `SnpSift` to look at the `FILTER` field and retain variants with the value of `PASS`.
>
> ```
> # YOU DO NOT NEED TO RUN THIS
@@ -114,7 +114,7 @@ java -jar $SNPEFF/SnpSift.jar filter \
To filter by the gene name you will need `"( ANN[*].GENE = 'INSERT_GENE_NAME' )"`.
> NOTE: When handling multiple valued fields (i.e. fields with commas), `SnpSift` uses a 0-based index to describing those elements. In the example below, we can see that the first four `ANN` fields are for the gene `CPTP` before we get to the annotations for `CPSF3L`. So instead if you have used `"( ANN[0].GENE = 'CPSF3L' )"` instead of `"( ANN[*].GENE = 'CPSF3L' )"`, then it will only return the entries which have `"CPSF3L"` in the first `GENE` annotation field and exclude the example below. The `*` tell `SnpSift` to extract the record if "any" annotations corresponds to `CPSF3L`. For most cases, you will want to use the `*`, but you should understand why it is there.
> NOTE: When handling multi-valued fields (i.e. fields with commas), `SnpSift` uses a 0-based index to describe those elements. In the example below, we can see that the first four `ANN` fields are for the gene `CPTP` before we get to the annotations for `CPSF3L`. So if you used `"( ANN[0].GENE = 'CPSF3L' )"` instead of `"( ANN[*].GENE = 'CPSF3L' )"`, then it would only return the entries which have `"CPSF3L"` in the first `GENE` annotation field and would exclude records like the one below, where `CPSF3L` appears only in later annotations. The `*` tells `SnpSift` to extract the record if "any" annotation corresponds to `CPSF3L`. For most cases, you will want to use the `*`, but you should understand why it is there.
> ```
> 1 1324300 . G A . PASS AS_FilterStatus=SITE;AS_SB_TABLE=47,6|11,0;ClippingRankSum=0.39;DP=68;ECNT=1;FS=2.373;GERMQ=93;MBQ=27,27;MFRL=337,338;MMQ=60,60;MPOS=27;MQ=60;MQ0=0;MQRankSum=0;NALOD=1.54;NLOD=9.88;POPAF=6;ReadPosRankSum=-0.125;TLOD=24.65;ANN=A|upstream_gene_variant|MODIFIER|CPTP|CPTP|transcript|NM_001029885.1|protein_coding||c.-2611G>A|||||463|,A|upstream_gene_variant|MODIFIER|CPTP|CPTP|transcript|XM_005244802.1|protein_coding||c.-3008G>A|||||456|,A|upstream_gene_variant|MODIFIER|CPTP|CPTP|transcript|XM_005244801.3|protein_coding||c.-2611G>A|||||843|,A|upstream_gene_variant|MODIFIER|CPTP|CPTP|transcript|XM_011542200.2|protein_coding||c.-2611G>A|||||1315|,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|XM_011541647.1|protein_coding|1/18|c.28+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|NM_001256456.1|protein_coding|1/18|c.-428+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|NM_001256460.1|protein_coding|1/17|c.-167+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|NM_001256462.1|protein_coding|1/14|c.28+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|NM_001256463.1|protein_coding|1/14|c.28+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|NM_017871.5|protein_coding|1/16|c.28+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|XM_017001558.1|protein_coding|1/18|c.-438+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|XM_017001557.1|protein_coding|1/17|c.-361+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|XM_011541648.1|protein_coding|1/18|c.-91+281C>T||||||,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript|XM_011541650.1|protein_coding|1/16|c.-254+281C>T|||||| GT:AD:AF:DP:F1R2:F2R1:SB 0/0:33,0:0.028:33:12,0:20,0:32,1,0,0 0/1:20,11:0.367:31:6,6:12,5:15,5,11,0
> ```
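As a rough illustration of the difference between `ANN[0]` and `ANN[*]`, the sketch below mimics the two lookups with standard shell tools rather than `SnpSift` itself. The `ann` string is a hypothetical, shortened version of the record above (one `CPTP` annotation followed by one `CPSF3L` annotation), and the variable names are made up for this example. It relies on the gene name being the fourth `|`-delimited subfield of each annotation.

```shell
# Hypothetical, shortened ANN value: two annotations separated by a comma.
ann='A|upstream_gene_variant|MODIFIER|CPTP|CPTP|transcript,A|intron_variant|MODIFIER|CPSF3L|CPSF3L|transcript'

# Put each annotation on its own line and keep the gene name (4th subfield).
genes=$(printf '%s\n' "$ann" | tr ',' '\n' | cut -d '|' -f 4)

# Rough analogue of ANN[0].GENE: only the first annotation is inspected.
first_gene=$(printf '%s\n' "$genes" | head -n 1)

# Rough analogue of ANN[*].GENE = 'CPSF3L': any annotation may match.
if printf '%s\n' "$genes" | grep -qx 'CPSF3L'; then
    any_match=yes
else
    any_match=no
fi

echo "first annotation gene: $first_gene"
echo "any annotation is CPSF3L: $any_match"
```

Here the first annotation belongs to `CPTP`, so an index-0 comparison against `CPSF3L` would miss the record, while the "any annotation" check finds it.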
@@ -129,6 +129,19 @@ java -jar $SNPEFF/SnpSift.jar filter \
### INFO Field: Effects
If you want to filter your output by the effects the variants have on the annotated gene models, the syntax for this is quite similar to the example for genes:
```
java -jar $SNPEFF/SnpSift.jar filter \
-noLog \
"( ANN[*].EFFECT has 'missense_variant' )" \
mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf | less
```
To filter by a variant effect, the filter syntax is `"( ANN[*].EFFECT has 'VARIANT_EFFECT' )"`
> **NOTE**: Importantly, notice the use of `has` instead of `=` here. Sometimes the effects field will contain multiple effects, such as `missense_variant&splice_donor_variant`. Using `ANN[*].EFFECT = missense_variant` here ***WILL NOT*** return this line, because the field is not equal to `missense_variant`; however, `ANN[*].EFFECT has missense_variant` ***WILL*** return this line. For effects, you will usually want the `has` query rather than the `=` one.
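To make the `has` versus `=` distinction concrete, here is a minimal sketch in plain shell. It does not call `SnpSift`; it only imitates the behaviour described in the note, under the assumption that `has` treats an `&`-separated value as a list of effects. The `effect` value and the variable names are illustrative.

```shell
# Example EFFECT value that combines two effects with '&'.
effect='missense_variant&splice_donor_variant'

# Analogue of '=': the whole field must equal the query string.
if [ "$effect" = 'missense_variant' ]; then
    equals_match=yes
else
    equals_match=no
fi

# Analogue of 'has': split on '&' and look for an exact element match.
if printf '%s\n' "$effect" | tr '&' '\n' | grep -qx 'missense_variant'; then
    has_match=yes
else
    has_match=no
fi

echo "= matches:   $equals_match"
echo "has matches: $has_match"
```

The whole-field comparison fails because of the extra `&splice_donor_variant`, while the element-wise check succeeds.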
There are many different variant effects and some of the more common ones are listed below:
| SnpEff Annotation | Type of variant |
@@ -147,36 +160,23 @@ There are many different variant effects and some of the more common ones are li
Many more effects can be found [here](https://pcingola.github.io/SnpEff/se_inputoutput/#effect-prediction-details).
If you want to filter your output by the effects the variants have on the annotated gene models, the syntax for this is quite similar to the example for genes:
```
java -jar $SNPEFF/SnpSift.jar filter \
-noLog \
"( ANN[*].EFFECT has 'missense_variant' )" \
mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf | less
```
To filter by a variant effect, the filter syntax is `"( ANN[*].EFFECT has 'VARIANT_EFFECT' )"`
> **NOTE**: Importantly, notice the use of `has` instead of `=` here. Sometimes effects field will contain mutliple effects such as `missense_variant&splice_donor_variant`. Using `ANN[*].EFFECT = missense_variant` here ***WILL NOT*** return this line, because the line is not equal to `missense_variant`, however `ANN[*].EFFECT has missense_variant` ***WILL*** return this line. Oftentimes for effects, one would be interested in the `has` query as opposed to the `=` one.
### INFO Field: Impacts
`SnpEff` also predicts the deleterious nature of a variant by binning it into one of several categories:
- `HIGH` These are variants that will **almost certainly have a deleterious impact on the transcript**. Examples of this would be the loss or gain of a stop codon or a frameshift mutation.
- `MODERATE` These are variants that may have a deleterious impact on the transcript. Examples of this would be missense/non-synonymous variants and in-frame deletions/insertions.
- `LOW` These are variants that are **unlikely to have a deleterious impact on the transcript**. Examples of this would be silent/synonymous variants and alterations between different stop codons.
- `MODIFER` These variants are typically in non-coding regions and their impacts are difficult to assertain.
- `MODIFIER` These variants are typically in non-coding regions and their impacts are difficult to ascertain.
More information on these categories can be found [here](https://pcingola.github.io/SnpEff/se_inputoutput/#impact-prediction) and a complete listing of the categories for each effect can be found [here](https://pcingola.github.io/SnpEff/se_inputoutput/#effect-prediction-details).
More information on these categories can be found [here](https://pcingola.github.io/SnpEff/snpeff/inputoutput/#impact-prediction) and a complete listing of the categories for each effect can be found [here](https://pcingola.github.io/SnpEff/snpeff/inputoutput/#effect-prediction-details).
Let's go ahead and select out all of our `HIGH` impact mutations:
```
java -jar $SNPEFF/SnpSift.jar filter \
-noLog "( ANN[*].IMPACT has 'HIGH' )" \
-noLog \
"( ANN[*].IMPACT has 'HIGH' )" \
mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf | less
```
@@ -186,19 +186,19 @@ Let's go ahead and redirect the output of these "high-impact" mutations to a new
```
java -jar $SNPEFF/SnpSift.jar filter \
-noLog "( ANN[*].IMPACT has 'HIGH' )" \
-noLog \
"( ANN[*].IMPACT has 'HIGH' )" \
mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf > mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.high_impact.vcf
```
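After redirecting to a new file, one simple sanity check is to count the non-header lines, since every VCF header line starts with `#`. The sketch below uses a tiny made-up VCF written to a temporary file so that it runs on its own; in practice you would point `grep` at the `high_impact.vcf` file created above.

```shell
# Build a tiny, hypothetical VCF with two header lines and two records.
vcf=$(mktemp)
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\n1\t100\n1\t200\n' > "$vcf"

# Count lines that do not start with '#', i.e. the variant records.
n_variants=$(grep -vc '^#' "$vcf")
echo "variant records: $n_variants"

# Clean up the temporary file.
rm -f "$vcf"
```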
### INFO: Other ANN fields
In addition to `GENE`, `EFFECT` and `IMPACT`, there are a whole host of other `ANN` fields. Some of the other `ANN` fields that we will come across later are:
In addition to `GENE`, `EFFECT`, `IMPACT` and `TRID`, there are a whole host of other `ANN` fields. Some of the other `ANN` fields that we will come across later are:
- `TRID` - Transcript ID or NCBI accession number
- `HGVS_P` - The alteration in protein notation
- `HGVS_C` - The alteration in DNA notation
A full list of `ANN` fields can be found [here](http://pcingola.github.io/SnpEff/ss_filter/#snpeff-ann-fields).
A full list of `ANN` fields can be found [here](https://pcingola.github.io/SnpEff/snpsift/filter/#snpeff-ann-fields).
## SnpSift: `vcfEffOnePerLine`
@@ -256,7 +256,7 @@ Let's breakdown this command:
- `-noLog` This does not report command usage to `SnpEff`'s server
- `"( ANN[*].EFFECT has 'missense_variant' )"` Keep only the lines where `missense_variant` is an effect in ***ANY*** annotation.
- `mutect2_syn3_normal_syn3_tumor_GRCh38.p7-pass-filt-LCR.pedigree_header.snpeff.dbSNP.vcf |` This is our input file; the filtered output is then piped to the next command.
- `$SNPEFF/scripts/vcfEffOnePerLine.pl` Place each effect on it's own line and pipe this output.
- `$SNPEFF/scripts/vcfEffOnePerLine.pl |` Place each effect on its own line and pipe this output.
- `java -jar $SNPEFF/SnpSift.jar extractFields` This calls the `extractFields` package within `SnpSift`
- `-` The `-` is commonly used to indicate that the input comes from standard input, or in other words, that the input is being piped into the command.
- `"CHROM" "POS" "ANN[*].GENE" "ANN[*].TRID" "EFF[*].HGVS_P" "ANN[*].HGVS_C" "ANN[*].EFFECT"` This is defining the fields that we would like to extract.
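The overall idea of this pipeline (one annotation per line, then a table of selected fields) can be sketched with `awk` on a toy record. This is only an approximation of what `vcfEffOnePerLine.pl` and `extractFields` produce, not their actual implementation. The record is hypothetical and reduced to three tab-separated columns (`CHROM`, `POS` and an `INFO` column holding only `ANN`), with each annotation truncated to its first four subfields.

```shell
# Hypothetical two-annotation record with CHROM, POS and an ANN-only INFO column.
record=$(printf '1\t1324300\tANN=A|upstream_gene_variant|MODIFIER|CPTP,A|intron_variant|MODIFIER|CPSF3L')

# Emit one row per annotation: CHROM, POS, gene name, effect.
table=$(printf '%s\n' "$record" | awk -F '\t' '{
    sub(/^ANN=/, "", $3)              # drop the "ANN=" key
    n = split($3, anns, ",")          # one array element per annotation
    for (i = 1; i <= n; i++) {
        split(anns[i], f, "|")        # subfields: allele, effect, impact, gene
        print $1 "\t" $2 "\t" f[4] "\t" f[2]
    }
}')

printf '%s\n' "$table"
```

Each annotation becomes its own row that repeats the variant's `CHROM` and `POS`, which is the shape that makes per-transcript fields easy to read or load into a spreadsheet.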