Transcripts support different numbers of reads #66

Open
liuxiaoning-wq opened this issue Jul 20, 2024 · 14 comments

Comments

@liuxiaoning-wq

Hi there. Do I need full-length transcripts to use ESPRESSO? For example, if the reads are not full-length, do they need to be filtered out?
Why do different positions of the same transcript have different numbers of supporting reads? For example, one of my images shows 20 at the beginning and 178 at the end, and another shows 87.93 at the beginning and 100 at the end.
[Three attached images; the last is named visualization_sirv]

@EricKutschera
Contributor

ESPRESSO expects that some reads will cover all the splice junctions of the transcript they come from and that other reads will cover only some of those junctions. If a read's chain of splice junctions could have come from multiple different full-length transcripts, ESPRESSO can assign a partial count for that read to each matching transcript.

Those different numbers are likely from different transcripts, not different positions of the same transcript. If you zoom in more you may see more details about the transcripts.
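To make the partial-count idea concrete, here is a toy Python sketch. It is not ESPRESSO's algorithm: the even split among compatible transcripts, the function names, and the example coordinates are all assumptions made for illustration. It only shows how reads that cover some of the junctions can lead to fractional per-transcript counts.

```python
from collections import defaultdict


def is_contiguous_subchain(read_junctions, transcript_junctions):
    """Return True if the read's junction chain appears, in order and without
    gaps, inside the transcript's junction chain."""
    n, m = len(read_junctions), len(transcript_junctions)
    if n == 0 or n > m:
        return False
    read_chain = tuple(read_junctions)
    return any(
        tuple(transcript_junctions[i:i + n]) == read_chain
        for i in range(m - n + 1)
    )


def assign_counts(reads, transcripts):
    """Toy assignment: each read contributes a total of 1.0, split evenly
    among the transcripts it is compatible with (an assumption made here,
    not ESPRESSO's actual weighting)."""
    counts = defaultdict(float)
    for read in reads:
        compatible = [
            name for name, chain in transcripts.items()
            if is_contiguous_subchain(read, chain)
        ]
        for name in compatible:
            counts[name] += 1.0 / len(compatible)
    return dict(counts)


# Made-up junctions as (intron_start, intron_end) pairs.
example_transcripts = {
    "A": [(100, 200), (300, 400), (500, 600)],
    "B": [(100, 200), (300, 400), (500, 700)],
}
example_reads = [
    [(100, 200), (300, 400), (500, 600)],  # covers every junction of A
    [(100, 200), (300, 400)],              # compatible with both A and B
    [(500, 700)],                          # compatible with B only
]

print(assign_counts(example_reads, example_transcripts))
```

In this toy input the second read cannot be told apart between A and B, so each transcript receives half of it.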

@liuxiaoning-wq
Author

Thanks for your reply. Is there a correspondence between the N1 and N2 values for a gene_ID in the esp file and the values shown in IGV? If so, why are the values different? For example, for ENSG00000124713.6, N1 is 1175.3 in the esp file, but N1 in IGV is 1461. Also, there are five transcripts in the esp file, but only two are shown in IGV.
[Two attached images]

@liuxiaoning-wq
Author

Also, can the transcript ID be displayed in the IGV visualization?

@EricKutschera
Contributor

It looks like N1 and N2 are your sample names and you loaded the N1.bed and N2.bed files output by visualize.py. The image shown in the README doesn't load those sample-level bed files. Instead it uses the transcript-level bed files output under target_genes/: https://github.com/Xinglab/espresso/tree/v1.4.0?tab=readme-ov-file#igv

@liuxiaoning-wq
Author

Thank you very much. In this case, target_genes in the visualization results contains only four bed files, one for each sample of a single ENST transcript. However, the esp file lists five transcripts, four of which are novel ESPRESSO transcripts. These four novel ESPRESSO transcripts are not shown in the target_genes output. Why is that?
[Two attached images: 1, 2]

@EricKutschera
Contributor

What was the command you ran? From https://github.com/Xinglab/espresso/tree/v1.4.0?tab=readme-ov-file#visualization-arguments

--target-gene TARGET_GENE
the name of the gene to visualize. transcripts with
name like {target-gene}-{number} or gene_id like
{target-gene}.* will have output generated. Use the
gene_id to match novel isoforms output by ESPRESSO

Based on that description it seems like --target-gene GNMT would only create the bed files for ENST00000372808.4 since it has transcript name GNMT-201. If you run with --target-gene ENSG00000124713 then I think it should create output for the novel transcripts
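One way to read the --target-gene description quoted above is as two pattern checks, sketched below in Python. This is only an illustration of that reading, not the code in visualize.py: the function name `matches_target_gene` is hypothetical, and treating `{target-gene}.*` as "the target gene, a literal dot, then anything" is an assumption.

```python
import re


def matches_target_gene(target_gene, transcript_name, gene_id):
    """Hypothetical helper mirroring the help text: match a transcript name
    like {target-gene}-{number} or a gene_id like {target-gene}.*"""
    name_pattern = re.compile(re.escape(target_gene) + r"-\d+")
    # Assumption: the dot is literal, e.g. a version suffix such as ".6".
    gene_id_pattern = re.compile(re.escape(target_gene) + r"\..*")
    name_ok = (
        transcript_name is not None
        and name_pattern.fullmatch(transcript_name) is not None
    )
    gene_id_ok = (
        gene_id is not None
        and gene_id_pattern.fullmatch(gene_id) is not None
    )
    return name_ok or gene_id_ok


# Annotated transcript: matched through its name GNMT-201.
print(matches_target_gene("GNMT", "GNMT-201", "ENSG00000124713.6"))
# Novel transcript with no GNMT-style name: not matched by the gene name.
print(matches_target_gene("GNMT", None, "ENSG00000124713.6"))
# The same novel transcript is matched when the gene_id is used instead.
print(matches_target_gene("ENSG00000124713", None, "ENSG00000124713.6"))
```

Under this reading, a novel isoform is only picked up when the target is given as the gene_id, which is consistent with the suggestion to run with --target-gene ENSG00000124713.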

@liuxiaoning-wq
Author

Thanks again.
1. Can we just look at the numbers to determine whether there is a new transcript in this sample? If the number is zero, it means that the transcript does not exist, right?
2. Can these numbers represent the expression levels of these transcripts? Can we use these numbers to do differential analysis?
[Attached image]

@EricKutschera
Contributor

Those numbers are from the abundance.esp file and they show the number of reads from that sample which ESPRESSO counted toward each isoform. If it's zero then ESPRESSO did not detect that transcript in that sample. Yes, you can use them for differential analysis (rMATS-long uses ESPRESSO output for differential analysis: https://github.com/Xinglab/rMATS-long)
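As a rough sketch of checking this per sample, the Python below lists which transcripts of a gene have a nonzero count in each sample column. It assumes the abundance file is tab-separated with the columns transcript_ID, transcript_name, and gene_ID followed by one column per sample; the function name and the example rows (IDs and counts) are made up for illustration.

```python
import csv
import io


def detected_transcripts(esp_text, gene_id):
    """Return {sample: [(transcript_ID, count), ...]} for transcripts of
    gene_id whose count in that sample is greater than zero."""
    reader = csv.DictReader(io.StringIO(esp_text), delimiter="\t")
    # Assumed layout: three annotation columns, then one column per sample.
    annotation_columns = {"transcript_ID", "transcript_name", "gene_ID"}
    samples = [c for c in reader.fieldnames if c not in annotation_columns]
    detected = {sample: [] for sample in samples}
    for row in reader:
        if row["gene_ID"] != gene_id:
            continue
        for sample in samples:
            count = float(row[sample])
            if count > 0:
                detected[sample].append((row["transcript_ID"], count))
    return detected


# Made-up rows in the assumed layout; not real output.
example_rows = [
    ["transcript_ID", "transcript_name", "gene_ID", "N1", "N2"],
    ["ENST00000372808.4", "GNMT-201", "ENSG00000124713.6", "1000.5", "0"],
    ["ESPRESSO:chr6:1:0", "NA", "ENSG00000124713.6", "0", "12.25"],
    ["ENST00000000001.1", "OTHER-201", "ENSG00000000001.1", "3", "4"],
]
example_esp = "\n".join("\t".join(row) for row in example_rows)

print(detected_transcripts(example_esp, "ENSG00000124713.6"))
```

To use it on a real file, the text could be read with `open(path).read()` and passed in the same way.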

@liuxiaoning-wq
Author

Thank you for your reply.

@liuxiaoning-wq
Author

Hello, these are three new transcripts of this gene. How can I obtain the sequences of these three new transcripts?
[Attached image]

@EricKutschera
Contributor

The coordinates for those transcripts should be in the updated.gtf file. See this post for a way to get the sequence from the gtf and fasta: #48
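For a sense of what going from GTF coordinates to a sequence involves, here is a minimal Python sketch. It is not necessarily the approach described in #48, and the function names and toy data are made up: it collects the exon records for each transcript_id, joins the exon sequences in genomic order, and reverse-complements transcripts on the minus strand. It assumes standard GTF columns (1-based inclusive coordinates) and a genome already loaded into a dict of chromosome name to sequence.

```python
import re
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def transcript_sequences(gtf_lines, genome):
    """Return {transcript_id: spliced sequence} built from GTF exon records."""
    exons = defaultdict(list)
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        match = re.search(r'transcript_id "([^"]+)"', fields[8])
        if match is None:
            continue
        chrom, strand = fields[0], fields[6]
        start, end = int(fields[3]), int(fields[4])
        exons[match.group(1)].append((chrom, start, end, strand))

    sequences = {}
    for transcript_id, parts in exons.items():
        parts.sort(key=lambda exon: exon[1])  # genomic order by start
        # GTF is 1-based inclusive, Python slices are 0-based half-open.
        seq = "".join(genome[chrom][start - 1:end] for chrom, start, end, _ in parts)
        if parts[0][3] == "-":
            seq = reverse_complement(seq)
        sequences[transcript_id] = seq
    return sequences


# Toy genome and GTF lines, made up for illustration.
example_genome = {"chr1": "AAACCCGGGTTT"}
example_gtf = [
    'chr1\ttoy\texon\t1\t3\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t7\t9\t.\t+\t.\tgene_id "G1"; transcript_id "T1";',
    'chr1\ttoy\texon\t1\t3\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
    'chr1\ttoy\texon\t4\t6\t.\t-\t.\tgene_id "G2"; transcript_id "T2";',
]

print(transcript_sequences(example_gtf, example_genome))
```

For whole genomes, a dedicated tool or an indexed FASTA reader would be a better fit than holding every chromosome in a dict.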

@liuxiaoning-wq
Author

Okay, thank you.

@liuxiaoning-wq
Author

Hello, can I use ESPRESSO to analyze fusion genes? If so, how do I do it and where can I see the results?

@EricKutschera
Contributor

ESPRESSO doesn't specifically look for fusion genes, and it might filter out alignments for fusion genes. There is a filter for alignments with large insertions (defaults to 20bp): https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_S.pl#L924
Also, ESPRESSO will only use one alignment per read, even if there are supplementary alignments.
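To get a feel for which alignments those two points could affect, a small Python sketch like the following could be used to inspect SAM records. It is not ESPRESSO's filter: the function names are hypothetical, the comparison (an insertion longer than the threshold fails) is an assumption, and treating supplementary alignments as the ones left unused is also an assumption. The supplementary bit 0x800 and the CIGAR operation letters are from the SAM format.

```python
import re

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
SUPPLEMENTARY_FLAG = 0x800  # SAM FLAG bit for a supplementary alignment


def max_insertion_length(cigar):
    """Length of the longest insertion (I) operation in a CIGAR string."""
    return max(
        (int(length) for length, op in CIGAR_OP.findall(cigar) if op == "I"),
        default=0,
    )


def is_supplementary(flag):
    return bool(flag & SUPPLEMENTARY_FLAG)


def passes_checks(cigar, flag, max_insertion=20):
    """Assumed checks: not supplementary, and no insertion longer than
    max_insertion bases. The exact rule in ESPRESSO may differ."""
    return (
        not is_supplementary(flag)
        and max_insertion_length(cigar) <= max_insertion
    )


print(passes_checks("100M5I50M", 0))
print(passes_checks("100M25I50M", 0))
print(passes_checks("100M", 2048))
```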
