Missing my favorite organism/source [please add as a comment to this issue] #13
Comments
I wanted to try tximeta out. With Salmon 0.14.1, I built a Salmon index from the GENCODE v29 transcriptome (with decoys) currently up on the main Salmon site, using the --gencode flag, and quantified in mapping mode. When I tried to create a SummarizedExperiment, though, tximeta was unable to recognize the transcriptome. Is something wrong, or is GENCODE v29 not implemented? Thanks!
> se <- tximeta(samples_tximeta)
importing quantifications
reading in files with read_tsv
1 2 3 4 5 6 7 8 9 10
tximeta needs a BiocFileCache directory to access and save TxDb objects.
Do you wish to use the default directory: 'C:\Users\msmit248\AppData\Local\BiocFileCache\BiocFileCache\Cache'?
If not, a temporary directory that is specific to this R session will be used.
You can always change this directory later by running: setTximetaBFC()
Or enter [0] to exit and set this directory manually now.
1: Yes (use default)
2: No (use temp)
Selection: 2
couldn't find matching transcriptome, returning un-ranged SummarizedExperiment
> se
class: SummarizedExperiment
dim: 205870 10
metadata(3): tximetaInfo quantInfo countsFromAbundance
assays(3): counts abundance length
rownames(205870): ENST00000456328.2 ENST00000450305.2 ... ENST00000387460.2 ENST00000387461.2
rowData names(0):
colnames(10): kw01_ifng kw01_veh ... p154_ifng p154_veh
colData names(6): run person ... batch names
> dim(se)
[1] 205870 10
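For anyone debugging the same message: the match is made on a hash of the index sequences, so a quick first check is to look at the hash Salmon recorded for the run. A minimal sketch, assuming the quantification directory contains `aux_info/meta_info.json` with an `index_seq_hash` field and that the jsonlite package is installed. The helper `read_index_hash()`, the demo directory, and the hash value are all made up for illustration:

```r
library(jsonlite)

# Hypothetical helper: return the index sequence hash recorded for one Salmon run.
read_index_hash <- function(quant_dir) {
  meta_path <- file.path(quant_dir, "aux_info", "meta_info.json")
  stopifnot(file.exists(meta_path))
  meta <- fromJSON(meta_path)
  meta$index_seq_hash
}

# Demo on a throwaway directory holding a fake meta_info.json (placeholder hash).
demo_dir <- file.path(tempdir(), "demo_quant")
dir.create(file.path(demo_dir, "aux_info"), recursive = TRUE, showWarnings = FALSE)
writeLines('{"index_seq_hash": "not-a-real-hash"}',
           file.path(demo_dir, "aux_info", "meta_info.json"))

demo_hash <- read_index_hash(demo_dir)
print(demo_hash)
```

Pointing `read_index_hash()` at a real quant directory gives the value to compare against the hash of the reference you expected to match.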
I think we’ll need to get Gencode + decoy hash values from @rob-p. Correct Rob? We’ll work out a pipeline.
Just more information: a short term fix would be to use But we really want these indices to automatically connect to the reference. @rob-p do you think we should pass the hash of the
The Gencode + decoy hash was going to break plans on integrating with GA4GH to support all txomes (as the hash value on the server side wouldn't include the decoy sequence), and so the next version of Salmon will break out the
This thread made me realize that the above workaround would be a useful technique for preserving the reference hash value when users want to add non-reference transcripts. For example, users sometimes add ERCC spike-ins, viral sequences, or fusion genes. It may be useful to have a reference hash as well as a hash of the non-reference sequences, and a total hash...
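To make the three-hash idea concrete, here is a toy sketch. It is not how Salmon computes its hashes; the concatenate-then-SHA-256 scheme, the helper `hash_seqs()`, and the sequences are assumptions chosen only to show the property of interest, and it assumes the digest package is installed:

```r
library(digest)

# Hash a set of sequences by concatenating them in order (illustrative scheme only).
hash_seqs <- function(seqs) {
  digest(paste0(seqs, collapse = ""), algo = "sha256", serialize = FALSE)
}

# Toy sequences, not real transcripts.
primary   <- c(tx1 = "ACGTACGT", tx2 = "GGCCTTAA")
technical <- c(spike1 = "TTTTAAAA")

hashes <- list(
  primary   = hash_seqs(primary),
  technical = hash_seqs(technical),
  total     = hash_seqs(c(primary, technical))
)

# Swapping the technical sequences changes the total hash but not the primary one.
other_technical <- c(decoy1 = "CCCCGGGG")
hashes_alt <- list(
  primary = hash_seqs(primary),
  total   = hash_seqs(c(primary, other_technical))
)
```

The point is that the primary hash stays stable no matter which spike-ins or decoys are appended, so the provenance of the reference transcripts could still be looked up.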
ERCC would be great!
Thanks for the feedback, @jtheorell. We don't have this working yet, but my thought was that we could have Salmon distinguish between the "primary" reference sequences of interest (e.g. transcripts) and other, perhaps "technical", sequences such as spike-in or decoy sequences. Salmon will quantify against all these sequences, but for the purpose of txome identification, we'd like to know the hash of the primary seqs as well as the hash of the primary plus the technical seqs. This way we will at least be able to identify the provenance of the primary. Given that the technical seqs may be very idiosyncratic, it's not likely possible to identify primary + technical without the user creating a We don't have a formalized mechanism for this now, but it's a sketch of a solution. The current solution would be
OK! I'll try as best I can to get it to work for now, then. Thanks for your super rapid response!
Please add any organism or source that we are missing that you'd like to be covered by tximeta, and we will consider the best way to fold it in. We want to cover as many use cases as possible, and support and encourage linkedTxome for remaining cases.
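For a case like the GENCODE v29 + decoys index discussed in the comments, a linkedTxome could be set up roughly as follows. This is a sketch, not a tested recipe: the wrapper `link_gencode_v29()` is hypothetical, the paths in the usage note are placeholders, and the metadata values (source, release, genome) are assumptions you should adjust to your own index:

```r
# Hypothetical wrapper around tximeta::makeLinkedTxome(); nothing runs until it is called.
link_gencode_v29 <- function(index_dir, fasta_path, gtf_path) {
  tximeta::makeLinkedTxome(
    indexDir = index_dir,      # directory of the Salmon index
    source   = "GENCODE",
    organism = "Homo sapiens",
    release  = "29",
    genome   = "GRCh38",
    fasta    = fasta_path,     # transcript FASTA used to build the index
    gtf      = gtf_path,       # matching annotation GTF
    write    = FALSE           # do not write a JSON file describing the link
  )
}

# Usage with placeholder paths (replace with real locations):
# link_gencode_v29("path/to/salmon_index",
#                  "path/to/transcripts.fa.gz",
#                  "path/to/annotation.gtf.gz")
# se <- tximeta(samples_tximeta)
```

Once the link is registered, rerunning `tximeta()` on the same samples should be able to attach the ranges from the GTF rather than returning an un-ranged SummarizedExperiment.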