Missing my favorite organism/source [please add as a comment to this issue] #13

mikelove · 2018-11-01T17:27:18Z

Please add any organism or source that we are missing that you'd like to be covered by tximeta, and we will consider the best way to fold it in. We want to cover as many use cases as possible, and support and encourage linkedTxome for remaining cases.


matthewdavidsmith · 2019-07-11T21:15:42Z

Wanted to try tximeta out. With Salmon 0.14.1, I prepared a Salmon index from the GENCODE v29 transcriptome (with decoys) currently available on the main Salmon site, using the --gencode flag, and quantified in mapping mode. When I tried to create a SummarizedExperiment, however, tximeta was unable to recognize the transcriptome. Is something wrong, or is GENCODE v29 not implemented?

Thanks!

> se <- tximeta(samples_tximeta)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10 

tximeta needs a BiocFileCache directory to access and save TxDb objects.
Do you wish to use the default directory: 'C:\Users\msmit248\AppData\Local\BiocFileCache\BiocFileCache\Cache'?
If not, a temporary directory that is specific to this R session will be used.

You can always change this directory later by running: setTximetaBFC()
Or enter [0] to exit and set this directory manually now. 

1: Yes (use default)
2: No (use temp)

Selection: 2
couldn't find matching transcriptome, returning un-ranged SummarizedExperiment
> se
class: SummarizedExperiment 
dim: 205870 10 
metadata(3): tximetaInfo quantInfo countsFromAbundance
assays(3): counts abundance length
rownames(205870): ENST00000456328.2 ENST00000450305.2 ... ENST00000387460.2 ENST00000387461.2
rowData names(0):
colnames(10): kw01_ifng kw01_veh ... p154_ifng p154_veh
colData names(6): run person ... batch names
> dim(se)
[1] 205870     10

mikelove · 2019-07-11T21:24:33Z

I think we’ll need to get Gencode + decoy hash values from @rob-p. Correct Rob? We’ll work out a pipeline.

mikelove · 2019-07-12T13:37:10Z

Some more information: a short-term fix would be to use a linkedTxome to connect the index to the source yourself.

But we really want these indices to automatically connect to the reference.

@rob-p do you think we should pass the hash of the -t transcripts alone to the JSON files, as a separate hash in addition to the transcripts plus the decoys? I'm not sure how the hashing is currently performed. Both hash values may be useful. Going forward, to connect to the GA4GH API we will need the hash value of the -t transcripts alone.
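To make the two-hash idea concrete, here is a minimal Python sketch. It assumes a simplified scheme in which the digest is SHA-256 over the sequences in index order. Salmon's actual hashing procedure may differ, and the function name `seq_digest` and the toy sequences are hypothetical. The sketch shows why recording the -t hash separately helps: adding decoys changes the combined digest but leaves the transcripts-only digest untouched, so that digest can still be matched against a reference registry such as the one behind the GA4GH API.

```python
import hashlib

def seq_digest(seqs):
    """Hypothetical digest: SHA-256 over sequences in index order.
    A newline separator keeps ["AC", "GT"] distinct from ["ACGT"]."""
    h = hashlib.sha256()
    for s in seqs:
        h.update(s.encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()

# Toy stand-ins for the -t transcripts and the decoy (genome) sequences.
transcripts = ["ACGTACGTTTGA", "GGCATTACA"]
decoys = ["NNNNACGTACGTAAAAAAAAAA"]

txome_hash = seq_digest(transcripts)              # -t transcripts alone
combined_hash = seq_digest(transcripts + decoys)  # transcripts plus decoys

print(txome_hash == seq_digest(transcripts))  # True: unaffected by decoys
print(txome_hash == combined_hash)            # False: decoys change the total
```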

mikelove · 2019-07-17T12:41:12Z

The Gencode + decoy hash would have broken our plans to integrate with GA4GH to support all txomes (the hash value on the server side wouldn't include the decoy sequences), so the next version of Salmon will report the -t hash and the decoy hash separately, and tximeta will then work out of the box. In the meantime, you can explicitly link the txome to the GTF using makeLinkedTxome, as shown in the vignette.

mikelove · 2019-07-20T12:12:48Z

This thread made me realize, the above workaround would be a useful technique to preserve the reference hash value when users want to add non-reference transcripts. For example, sometimes users will add ERCC spike-ins, viral sequences, or fusion genes. It may be useful to have a reference hash as well as a hash of non-reference sequences, and a total hash...

jtheorell · 2019-09-30T15:30:58Z

ERCC would be great!

mikelove · 2019-09-30T15:36:41Z

Thanks for feedback @jtheorell

So we don't have this working yet, but my thought was that we could have Salmon distinguish between the "primary" reference sequences of interest (e.g. transcripts) and other, perhaps "technical", sequences such as spike-in or decoy sequences. Salmon will quantify against all of these sequences, but for the purpose of txome identification, we'd like to know the hash of the primary sequences as well as the hash of the primary plus the technical sequences. This way we will at least be able to identify the provenance of the primary sequences. Given that the technical sequences may be very idiosyncratic, it's unlikely we could identify primary + technical without the user creating a linkedTxome.
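The primary/technical split described above can be sketched in Python as follows. Everything here is hypothetical: the same simplified SHA-256 digest as a stand-in for Salmon's real hashing, the assumption that technical sequences can be recognized by name (here, an "ERCC-" prefix for spike-ins), and a plain dictionary standing in for a registry of known reference txomes. The point is only the logic: the primary hash can be matched to a known source, while the total hash, which includes idiosyncratic technical sequences, generally cannot, so it falls back to a linkedTxome.

```python
import hashlib

def seq_digest(seqs):
    """Hypothetical digest: SHA-256 over sequences in the given order."""
    h = hashlib.sha256()
    for s in seqs:
        h.update(s.encode("ascii"))
        h.update(b"\n")
    return h.hexdigest()

def identify(records, registry, is_technical):
    """records: (name, sequence) pairs in index order.
    registry: hypothetical map from primary hash to a source label."""
    primary = [seq for name, seq in records if not is_technical(name)]
    primary_hash = seq_digest(primary)
    total_hash = seq_digest([seq for _, seq in records])
    return {
        "primary_hash": primary_hash,
        "total_hash": total_hash,
        # None means the primary set is unknown; a linkedTxome is needed.
        "source": registry.get(primary_hash),
    }

# Hypothetical reference txome plus one spike-in with a toy sequence.
reference = [("tx1", "ACGTACGTTTGA"), ("tx2", "GGCATTACA")]
registry = {seq_digest([s for _, s in reference]): "hypothetical reference txome"}
sample = reference + [("ERCC-00002", "TTTTGGGGCCCC")]

result = identify(sample, registry, lambda name: name.startswith("ERCC-"))
print(result["source"])  # hypothetical reference txome
```

Because the primary hash ignores the spike-in, the sample still resolves to the registered reference even though its total hash matches nothing.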

We don't have a formalized mechanism for this now, but it's a sketch of a solution. The current solution would be linkedTxome + Zenodo deposit for FASTA and GTF.

jtheorell · 2019-09-30T15:39:46Z

OK! I'll do my best to get it working for now, then. Thanks for your super rapid response!

mikelove changed the title ~~You're missing my favorite organism / source [Please add as a comment to this issue]~~ Missing my favorite organism/source [Pls add as a comment to this issue] Sep 30, 2019

mikelove added the enhancement label Feb 19, 2020

mikelove changed the title ~~Missing my favorite organism/source [Pls add as a comment to this issue]~~ Missing my favorite organism/source [please add as a comment to this issue] Apr 13, 2021
