Unclear output from ESPRESSO_Q summary.txt #78

Open
pclavell opened this issue Oct 22, 2024 · 3 comments

Comments

@pclavell

Hello,
I would like clarification on exactly what each row of the ESPRESSO_Q summary.txt shows.
For example, in this output:

number of FSM splice junction chains: 25922
number of ISM splice junction chains: 49296
number of NIC splice junction chains: 14954
number of validated NIC chains: 3894
number of NNC splice junction chains: 13508
number of validated NNC chains: 5656
number of splice junction chains with a failed junction: 37451
total FSM abundance: 7204779.9
total novel ISM abundance: 22660.36
total NIC abundance: 65848.54
total NNC abundance: 110658.77
total single exon abundance: 145342.99
number of detected FSM isoforms: 26322
number of detected novel ISM isoforms: 1417
number of detected NIC isoforms: 3728
number of detected NNC isoforms: 5413
number of detected single exon isoforms: 2271
number of internal exon boundary check failures: 85497
number of terminal exon boundary check failures: 2208080

  1. What is the difference between SJ chains and validated SJ chains?
  2. What is the difference between a validated chain and a detected isoform? Is it only about the TSS and TTS for the same splice junction?
  3. How are the structural categories (FSM, ISM, NIC, NNC) obtained? Does ESPRESSO run SQANTI in the background and then display only these 4 categories?

Thanks a lot

@EricKutschera
Contributor

Here's where the summary file is written: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_Q.pl#L985

The 'NIC splice junction chains' line is from 'num_nic_chains' which is set here: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_Q_Thread.pm#L826
For each read, the sequence of splice junctions defines the SJ chain. The distinct chains are counted for each category (FSM, ISM, ...). Here's where the categories are determined for each read: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_Q_Thread.pm#L526
That code checks all the junctions in each read and compares them to the annotation.
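
To make the comparison against the annotation concrete, here is a minimal Python sketch of how an SJ chain could be assigned to one of the four categories. It is not the ESPRESSO code (which is Perl); it assumes the usual SQANTI-style definitions (FSM: same chain as an annotated transcript; ISM: contiguous part of an annotated chain; NIC: only known splice sites in a new combination; NNC: at least one novel splice site). The names `classify_chain` and `is_contiguous_subchain`, and the `(donor, acceptor)` tuple representation, are hypothetical, and chromosome and strand are ignored for brevity.

```python
from typing import List, Sequence, Tuple

# Hypothetical representation: a junction is (donor, acceptor) on one chromosome/strand.
Junction = Tuple[int, int]


def is_contiguous_subchain(sub: Sequence[Junction], full: Sequence[Junction]) -> bool:
    """True if `sub` appears as a consecutive run of junctions inside `full`."""
    sub, full = list(sub), list(full)
    n = len(sub)
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))


def classify_chain(chain: Sequence[Junction],
                   annotated_chains: List[Sequence[Junction]]) -> str:
    """Assign a read's SJ chain to FSM / ISM / NIC / NNC (assumed definitions)."""
    chain = list(chain)
    if not chain:
        return "single exon"  # no junctions, so no chain to classify
    if any(chain == list(a) for a in annotated_chains):
        return "FSM"  # identical to an annotated chain
    if any(is_contiguous_subchain(chain, a) for a in annotated_chains):
        return "ISM"  # contiguous part of an annotated chain
    known_donors = {d for a in annotated_chains for d, _ in a}
    known_acceptors = {acc for a in annotated_chains for _, acc in a}
    if all(d in known_donors and acc in known_acceptors for d, acc in chain):
        return "NIC"  # only known splice sites, but a new combination
    return "NNC"  # at least one splice site not in the annotation


annotation = [
    [(100, 200), (300, 400), (500, 600)],
    [(100, 250), (300, 400)],
]
print(classify_chain([(300, 400), (500, 600)], annotation))
```

Counting the distinct chains that fall into each category would then give numbers analogous to the 'number of ... splice junction chains' rows.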

A novel chain is validated if each of its junctions meets the --read_num_cutoff and --read_ratio_cutoff thresholds: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_Q.pl#L119
'validated NIC chains' is from 'num_validated_nic_chains' which is set here: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_Q_Thread.pm#L3165
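
As a rough illustration of the per-junction check, the Python sketch below validates a chain only when every junction passes both cutoffs. The exact quantities ESPRESSO compares are in the linked code; here it is an assumption that each junction has a count of perfectly matching reads and a total read count, and that the ratio is the first divided by the second. `junction_passes`, `chain_is_validated`, and the `support` dictionary are hypothetical names, not part of ESPRESSO.

```python
from typing import Dict, Sequence, Tuple

Junction = Tuple[int, int]  # hypothetical (donor, acceptor) representation


def junction_passes(perfect_reads: int, total_reads: int,
                    read_num_cutoff: int, read_ratio_cutoff: float) -> bool:
    """Assumed rule: enough perfect reads, and a high enough perfect/total ratio."""
    if total_reads == 0:
        return False
    ratio = perfect_reads / total_reads
    return perfect_reads >= read_num_cutoff and ratio >= read_ratio_cutoff


def chain_is_validated(chain: Sequence[Junction],
                       support: Dict[Junction, Tuple[int, int]],
                       read_num_cutoff: int,
                       read_ratio_cutoff: float) -> bool:
    """A chain is validated only if every one of its junctions passes."""
    for junction in chain:
        # Junctions without any recorded support count as (0, 0) and fail.
        perfect, total = support.get(junction, (0, 0))
        if not junction_passes(perfect, total, read_num_cutoff, read_ratio_cutoff):
            return False
    return True


# Made-up support values: junction -> (perfect reads, total reads)
support = {(100, 200): (5, 10), (300, 400): (1, 10)}
print(chain_is_validated([(100, 200), (300, 400)], support, 2, 0.2))
```

Under these assumptions a single weakly supported junction is enough to keep the whole chain out of the 'validated' counts.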

'number of detected NIC isoforms' is from 'num_nic_isoforms_detected' which is set here: https://github.com/Xinglab/espresso/blob/v1.5.0/src/ESPRESSO_Q_Thread.pm#L2574
Some of the validated chains are sub-chains of a longer validated chain. The detected isoforms are the validated chains that are not sub-chains of another validated chain.
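
The sub-chain filter can be sketched in a few lines of Python. This is only an illustration of the idea, not the ESPRESSO implementation: it assumes "sub-chain" means a consecutive run of junctions inside a longer chain, and `detected_isoforms` and `is_contiguous_subchain` are hypothetical names.

```python
from typing import Iterable, List, Tuple

Junction = Tuple[int, int]       # hypothetical (donor, acceptor) representation
Chain = Tuple[Junction, ...]


def is_contiguous_subchain(sub: Chain, full: Chain) -> bool:
    """True if `sub` appears as a consecutive run of junctions inside `full`."""
    n = len(sub)
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))


def detected_isoforms(validated_chains: Iterable[Iterable[Junction]]) -> List[Chain]:
    """Keep the validated chains that are not a sub-chain of a longer validated chain."""
    # De-duplicate while preserving the input order.
    unique: List[Chain] = list(dict.fromkeys(tuple(c) for c in validated_chains))
    return [
        c for c in unique
        if not any(len(c) < len(other) and is_contiguous_subchain(c, other)
                   for other in unique)
    ]


validated = [
    [(1, 2), (3, 4), (5, 6)],
    [(3, 4), (5, 6)],          # sub-chain of the first chain, so it is dropped
    [(7, 8), (9, 10)],
]
print(detected_isoforms(validated))
```

This is why the 'detected isoforms' count for a category can be lower than its 'validated chains' count.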

@pclavell
Author

Thanks for the quick reply:

  1. So if I ran SQANTI, would I get exactly the same results, at least for transcripts belonging to already annotated genes?
  2. Given this difference between validated and detected, how are ISMs handled? If the same logic applies to ISMs, only the longest containing ISM would be kept, right?

@EricKutschera
Contributor

I don't know the details of how SQANTI works. You may get different results.

For the ISM SJ chains, if there is a read for an FSM chain that the ISM chain is part of, then the ISM chain won't result in a novel detected isoform. If no FSM chain containing that ISM chain has a supporting read, then the ISM chain could be one of the 'detected novel ISM isoforms'. The novel ISM chains, like the NIC and NNC chains, are only 'detected' if they are the longest chain (not a sub-chain).
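
The two ISM rules above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the ESPRESSO logic: `novel_ism_isoforms` and `fsm_read_counts` are hypothetical names, validation cutoffs are left out, and the "longest chain" check is only applied among the remaining ISM chains.

```python
from typing import Dict, Iterable, List, Tuple

Junction = Tuple[int, int]       # hypothetical (donor, acceptor) representation
Chain = Tuple[Junction, ...]


def is_contiguous_subchain(sub: Chain, full: Chain) -> bool:
    """True if `sub` appears as a consecutive run of junctions inside `full`."""
    n = len(sub)
    return any(full[i:i + n] == sub for i in range(len(full) - n + 1))


def novel_ism_isoforms(ism_chains: Iterable[Iterable[Junction]],
                       fsm_read_counts: Dict[Chain, int]) -> List[Chain]:
    """Return the ISM chains that could be reported as novel ISM isoforms."""
    supported_fsm = [chain for chain, reads in fsm_read_counts.items() if reads > 0]
    unique = list(dict.fromkeys(tuple(c) for c in ism_chains))
    # Rule 1: an ISM chain contained in a read-supported FSM chain is not novel.
    remaining = [c for c in unique
                 if not any(is_contiguous_subchain(c, f) for f in supported_fsm)]
    # Rule 2: keep only chains that are not a sub-chain of a longer remaining chain.
    return [c for c in remaining
            if not any(len(c) < len(o) and is_contiguous_subchain(c, o)
                       for o in remaining)]


fsm_read_counts = {
    ((1, 2), (3, 4), (5, 6)): 3,              # FSM chain with supporting reads
    ((10, 20), (30, 40), (50, 60)): 0,        # annotated chain, no FSM reads
}
ism_chains = [
    [(3, 4), (5, 6)],        # part of a supported FSM chain, so not novel
    [(10, 20), (30, 40)],    # no supported FSM chain contains it, so it is kept
    [(10, 20)],              # sub-chain of the kept ISM chain, so it is dropped
]
print(novel_ism_isoforms(ism_chains, fsm_read_counts))
```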
