Transcript Assembly Visualization
View the merged GTF file from the 'de_novo' mode. Remember this merged GTF file combines both UHR and HBR (GTFs for each individually were also produced earlier).
cd $RNA_HOME/expression/stringtie/de_novo/
head stringtie_merged.gtf
For details on the format of these files, refer to the following links:
- https://ccb.jhu.edu/software/stringtie/gff.shtml#gffcompare
- http://cole-trapnell-lab.github.io/cufflinks/cuffmerge/index.html
- http://cole-trapnell-lab.github.io/cufflinks/cuffcompare/index.html#transfrag-class-codes
How many genes have at least one transcript assembled by StringTie in the 'de_novo' results?
cd $RNA_HOME/expression/stringtie/de_novo/
cat stringtie_merged.gtf | perl -ne 'if ($_ =~ /gene_id\s+\"(\S+)\"\;/){print "$1\n"}' | sort | uniq | wc -l
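The same count can be sketched with a small awk-based helper that matches attribute keys exactly. This is an illustrative alternative, not part of the original tutorial: the function name `count_gtf_attr` is hypothetical, and it assumes a standard 9-column, tab-separated GTF whose attributes are separated by semicolons. Matching the key exactly avoids counting an attribute such as `ref_gene_id` as if it were `gene_id`.

```shell
# Hypothetical helper: count distinct values of one GTF attribute.
# Usage: count_gtf_attr FILE ATTRIBUTE   (e.g. gene_id or transcript_id)
count_gtf_attr() {
  awk -F'\t' -v attr="$2" '
    $0 !~ /^#/ {
      # Column 9 holds the attributes, e.g.: gene_id "X"; transcript_id "Y";
      n = split($9, parts, ";")
      for (i = 1; i <= n; i++) {
        p = parts[i]
        sub(/^ +/, "", p)                  # drop leading spaces
        # Require the key at the start, so ref_gene_id is not read as gene_id.
        if (index(p, attr " \"") == 1) {
          gsub(/"/, "", p)                 # strip the quotes around the value
          print substr(p, length(attr) + 2)
        }
      }
    }' "$1" | sort -u | wc -l | tr -d ' '
}
```

For example, `count_gtf_attr stringtie_merged.gtf gene_id` reports the number of distinct genes, and `count_gtf_attr stringtie_merged.gtf transcript_id` the number of distinct transcripts.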
How many genes have at least one novel transcript assembled?
head merged.stringtie_merged.gtf.tmap
grep "j" merged.stringtie_merged.gtf.tmap
grep "j" merged.stringtie_merged.gtf.tmap | cut -f 1 | sort | uniq | wc -l
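Note that a plain `grep "j"` matches the letter anywhere on the line, including gene or transcript names. A stricter variant can be sketched as follows, assuming the `.tmap` file is tab-separated with a header line, the reference gene ID in column 1, and the class code in column 3 (the layout shown by the `head` command above). The function names are hypothetical.

```shell
# Hypothetical helper: count distinct reference genes (column 1) that have
# at least one transcript with the given class code in column 3.
# Usage: count_tmap_genes FILE [CLASS_CODE]   (class code defaults to j)
count_tmap_genes() {
  awk -F'\t' -v code="${2:-j}" 'NR > 1 && $3 == code { print $1 }' "$1" \
    | sort -u | wc -l | tr -d ' '
}

# Hypothetical helper: tally how many transcripts fall into each class code.
tmap_class_counts() {
  awk -F'\t' 'NR > 1 { n[$3]++ } END { for (c in n) print c "\t" n[c] }' "$1" | sort
}
```

Running `count_tmap_genes merged.stringtie_merged.gtf.tmap j` gives the gene count, while `tmap_class_counts merged.stringtie_merged.gtf.tmap` shows how the transcripts are distributed across all class codes.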
RegTools is a utility we created to help characterize individual exon splicing events and help to identify novel splice events and variants that have a direct influence on gene expression or splicing patterns. Refer to the regtools manual for more details.
We will use basic functionality of regtools to extract a junction.bed file for each of our BAMs that summarizes all distinct exon-exon splicing events represented in the RNA-seq data. We will also use regtools to annotate these junctions relative to our reference transcriptome GTF file:
cd $RNA_HOME/alignments/hisat2
regtools junctions extract HBR.bam > HBR.junctions.bed
head HBR.junctions.bed
regtools junctions annotate HBR.junctions.bed $RNA_REF_FASTA $RNA_REF_GTF > HBR.junctions.anno.bed
head HBR.junctions.anno.bed
regtools junctions extract UHR.bam > UHR.junctions.bed
head UHR.junctions.bed
regtools junctions annotate UHR.junctions.bed $RNA_REF_FASTA $RNA_REF_GTF > UHR.junctions.anno.bed
head UHR.junctions.anno.bed
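Because the same two regtools steps are repeated for each sample, they could be generated in a loop. The sketch below only prints the commands shown above for each sample name, so they can be inspected before anything is run. The function name `junction_commands` is hypothetical, and it assumes `$RNA_REF_FASTA` and `$RNA_REF_GTF` are exported in the shell that eventually runs the output.

```shell
# Hypothetical helper: print the extract and annotate commands for each sample.
# Usage: junction_commands HBR UHR
junction_commands() {
  for sample in "$@"; do
    # Step 1: summarize exon-exon junctions observed in the BAM.
    printf 'regtools junctions extract %s.bam > %s.junctions.bed\n' \
      "$sample" "$sample"
    # Step 2: annotate those junctions against the reference transcriptome.
    # The variable names are printed literally and expanded when the line runs.
    printf 'regtools junctions annotate %s.junctions.bed "$RNA_REF_FASTA" "$RNA_REF_GTF" > %s.junctions.anno.bed\n' \
      "$sample" "$sample"
  done
}
```

To execute the printed commands from the alignment directory, pipe them to a shell: `junction_commands HBR UHR | bash`.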
Now pull out any junctions from either sample that appear to involve novel exon skipping, acceptor site usage, or donor site usage (relative to the reference transcriptome GTF). Require at least three reads of support for each potentially novel junction.
grep -P -w "NDA|A|D" HBR.junctions.anno.bed | perl -ne 'chomp; @l=split("\t",$_); if ($l[4] >= 3){print "$_\n"}'
grep -P -w "NDA|A|D" UHR.junctions.anno.bed | perl -ne 'chomp; @l=split("\t",$_); if ($l[4] >= 3){print "$_\n"}'
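A word-based `grep` can also match `A` or `D` when it appears as a standalone token in another column. A stricter filter can be sketched by testing only the anchor column. This sketch assumes the annotated file begins with a header line containing a column named `anchor` and that the read count is the BED score in column 5; check both with `head` before relying on it. The function name `novel_junctions` is hypothetical.

```shell
# Hypothetical helper: keep junctions whose anchor is NDA, A, or D and whose
# read support (column 5) is at least MIN_READS.
# Usage: novel_junctions FILE [MIN_READS]   (MIN_READS defaults to 3)
novel_junctions() {
  awk -F'\t' -v min="${2:-3}" '
    NR == 1 {
      # Find the anchor column by name rather than by a fixed position.
      for (i = 1; i <= NF; i++) if ($i == "anchor") col = i
      if (!col) { print "no anchor column found" > "/dev/stderr"; exit 1 }
      print          # keep the header in the output
      next
    }
    ($col == "NDA" || $col == "A" || $col == "D") && $5 + 0 >= min
  ' "$1"
}
```

For example, `novel_junctions HBR.junctions.anno.bed 3` prints the header followed by the candidate novel junctions for HBR; passing a larger second argument applies a more conservative read threshold.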
- Before loading your BAM files, make sure to turn on the 'Show junction track' option (View -> Preferences -> Alignments).
- View the merged GTF files that were generated by each of the StringTie modes: 'ref_guided' and 'de_novo'.
- Note: For the 'ref_only' mode, only the supplied transcripts were considered. Therefore the GTF file from any individual (unmerged) StringTie run contains the same transcripts and can serve as the comparison.
- The following can be loaded directly in IGV by URL:
  - http://YOUR_IP_ADDRESS/rnaseq/expression/stringtie/ref_only/HBR_Rep1/transcripts.gtf
  - http://YOUR_IP_ADDRESS/rnaseq/expression/stringtie/ref_guided/stringtie_merged.gtf
  - http://YOUR_IP_ADDRESS/rnaseq/expression/stringtie/de_novo/stringtie_merged.gtf
Load the BAM files at the same time as the junctions.bed and merged.gtf files:
- The following can be loaded directly in IGV by URL:
- http://YOUR_IP_ADDRESS/rnaseq/alignments/hisat2/UHR.bam
- http://YOUR_IP_ADDRESS/rnaseq/alignments/hisat2/HBR.bam
Go to the following regions:
- chr22:44,292,789-44,341,778 (novel 5' exon)
- chr22:41,679,566-41,689,409 (alternative isoforms; create a Sashimi plot of this region)
- chr22:50,083,265-50,086,732 (alternative isoforms; create a Sashimi plot of this region)
- chr22:50,466,553-50,467,472 (novel cassette exon; create a Sashimi plot of this region)
- chr22:39,313,011-39,314,398 (skipping of a known exon; create a Sashimi plot of this region)
- chr22:46,362,928-46,364,315 (alternative acceptor sites; create a Sashimi plot of this region)
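To step through these regions without typing each one, an IGV batch script can be generated and then run from IGV (Tools -> Run Batch Script). The sketch below is a hypothetical helper, not part of the original tutorial: it reads regions from standard input and emits the IGV batch commands `snapshotDirectory`, `goto`, and `snapshot`. It assumes the tracks are already loaded in IGV. It saves ordinary snapshots only; Sashimi plots still need to be created interactively.

```shell
# Hypothetical helper: write an IGV batch script that visits each region
# and saves a snapshot. Regions are read from stdin, one per line.
# Usage: igv_batch_for_regions SNAPSHOT_DIR < regions.txt > regions.bat
igv_batch_for_regions() {
  printf 'snapshotDirectory %s\n' "$1"
  while IFS= read -r region; do
    [ -z "$region" ] && continue
    # Remove thousands separators: chr22:44,292,789-... -> chr22:44292789-...
    locus=$(printf '%s' "$region" | tr -d ',')
    # Build a file-name-safe label from the locus.
    name=$(printf '%s' "$locus" | sed 's/[:-]/_/g')
    printf 'goto %s\nsnapshot %s.png\n' "$locus" "$name"
  done
}
```

For example, `printf '%s\n' 'chr22:44,292,789-44,341,778' 'chr22:41,679,566-41,689,409' | igv_batch_for_regions /tmp/igv_snapshots > regions.bat` writes a script covering the first two regions; the snapshot directory shown here is only a placeholder.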
Do you see evidence for any novel exons/transcripts that are found in 'de_novo' or 'ref_guided' modes but NOT found in 'ref_only' mode? Explore in IGV for other examples of novel or different transcript predictions from the different StringTie modes. Pay attention to how the predicted transcripts line up with known transcripts. Try loading the Ensembl transcripts track (File -> Load from Server).
NOTE: We have obviously just scratched the surface exploring these output files.
NOTICE: This resource has been moved to rnabio.org. The version here will be maintained for legacy use only. All future development and maintenance will occur only at rnabio.org. Please proceed to rnabio.org for the current version of this course.