diff --git a/DESCRIPTION b/DESCRIPTION
index c17314d..1b8344d 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,5 +1,5 @@
 Package: tximeta
-Version: 1.21.3
+Version: 1.21.4
 Title: Transcript Quantification Import with Automatic Metadata
 Description: Transcript quantification import from Salmon and
   other quantifiers with automatic attachment of transcript ranges
diff --git a/NEWS.md b/NEWS.md
index 704fed3..acad930 100644
--- a/NEWS.md
+++ b/NEWS.md
@@ -1,3 +1,7 @@
+# tximeta 1.21.4
+
+* Changing language in docs to "digest" instead of "checksum".
+
 # tximeta 1.21.3
 
 * GENCODE 44 (H.s.), M34 (M.m), and Ensembl 111
diff --git a/R/linkedTxome.R b/R/linkedTxome.R
index d8031a9..036b031 100644
--- a/R/linkedTxome.R
+++ b/R/linkedTxome.R
@@ -1,6 +1,6 @@
 #' Make and load linked transcriptomes ("linkedTxome")
 #'
-#' \code{makeLinkedTxome} reads the checksum associated with a Salmon
+#' \code{makeLinkedTxome} reads the digest associated with a Salmon
 #' index at \code{indexDir}, and links it to key information
 #' about the transcriptome, including the \code{source}, \code{organism},
 #' \code{release}, and \code{genome} (these are custom character strings),
@@ -58,7 +58,7 @@
 #' on Zenodo. This enables consistent annotation and downstream annotation
 #' tasks, such as by \code{summarizeToGene}.
 #' @param write logical, should a JSON file be written out
-#' which documents the transcriptome checksum and metadata? (default is TRUE)
+#' which documents the transcriptome digest and metadata? (default is TRUE)
 #' @param jsonFile the path to the json file for the linkedTxome
 #'
 #' @return nothing, the function is run for its side effects
diff --git a/R/tximeta.R b/R/tximeta.R
index af4bc85..c280ea3 100644
--- a/R/tximeta.R
+++ b/R/tximeta.R
@@ -55,7 +55,7 @@ NULL
 
 #' Import transcript quantification with metadata
 #'
-#' \code{tximeta} leverages the hashed checksum of the Salmon or piscem index,
+#' \code{tximeta} leverages the hashed digest of the Salmon or piscem index,
 #' in addition to a number of core Bioconductor packages (GenomicFeatures,
 #' ensembldb, AnnotationHub, GenomeInfoDb, BiocFileCache) to automatically
 #' populate metadata for the user, without additional effort from the user.
@@ -66,11 +66,11 @@ NULL
 #' \code{tximeta} can be used with any quantification \code{type} that is supported
 #' by \code{\link{tximport}}, where it will return an non-ranged SummarizedExperiment.
 #'
-#' \code{tximeta} performs a lookup of the hashed checksum of the index
+#' \code{tximeta} performs a lookup of the hashed digest of the index
 #' (stored in an auxilary information directory of the Salmon output)
 #' against a database of known transcriptomes, which lives within the tximeta
 #' package and is continually updated on Bioconductor's release schedule.
-#' In addition, \code{tximeta} performs a lookup of the checksum against a
+#' In addition, \code{tximeta} performs a lookup of the digest against a
 #' locally stored table of \code{linkedTxome}'s (see \code{link{makeLinkedTxome}}).
 #' If \code{tximeta} detects a match, it will automatically populate,
 #' e.g. the transcript locations, the transcriptome release,
@@ -142,7 +142,7 @@ NULL
 #' @param ... arguments passed to \code{tximport}
 #'
 #' @return a SummarizedExperiment with metadata on the \code{rowRanges}.
-#' (if the hashed checksum in the Salmon or Sailfish index does not match
+#' (if the hashed digest in the Salmon or Sailfish index does not match
 #' any known transcriptomes, or any locally saved \code{linkedTxome},
 #' \code{tximeta} will just return a non-ranged SummarizedExperiment)
 #'
@@ -486,7 +486,7 @@ may lead to errors in object construction, unless 'dropInfReps=TRUE'")
 missingMetadata <- function(se, summarize=FALSE) {
   msg <- "use of this function requires transcriptome metadata which is missing.
   either: (1) the object was not produced by tximeta, or
-  (2) tximeta could not recognize the checksum of the transcriptome.
+  (2) tximeta could not recognize the digest of the transcriptome.
   If (2), use a linkedTxome to provide the missing metadata and rerun tximeta"
   if (summarize) {
     msg <- paste0(msg, "
diff --git a/README.md b/README.md
index 6999bdb..9a3d82c 100644
--- a/README.md
+++ b/README.md
@@ -26,8 +26,9 @@ This metadata is attached to the *SummarizedExperiment* in the `metadata()`
 and `rowRanges()` slots.
 
 For a list of the reference transcriptomes supported by `tximeta()`,
-see the "Pre-computed checksums" section of the vignette in the
-`Get started` tab.
+see the "Pre-computed digests" section of the vignette in the
+`Get started` tab. We call the computed identifier for the reference
+transcriptome a "digest" or sometimes a "checksum".
 
 Further steps are also facilitated, e.g. `summarizeToGene()`,
 `addIds()`, or even `retrieveCDNA()` (the transcripts used for quantification) or
diff --git a/man/figures/diagram.png b/man/figures/diagram.png
index 632c96f..3024c6c 100644
Binary files a/man/figures/diagram.png and b/man/figures/diagram.png differ
diff --git a/man/linkedTxome.Rd b/man/linkedTxome.Rd
index 5cbabe4..b95a9e8 100644
--- a/man/linkedTxome.Rd
+++ b/man/linkedTxome.Rd
@@ -60,7 +60,7 @@ on Zenodo. This enables consistent annotation and downstream annotation
 tasks, such as by \code{summarizeToGene}.}
 
 \item{write}{logical, should a JSON file be written out
-which documents the transcriptome checksum and metadata? (default is TRUE)}
+which documents the transcriptome digest and metadata? (default is TRUE)}
 
 \item{jsonFile}{the path to the json file for the linkedTxome}
 }
@@ -68,7 +68,7 @@ which documents the transcriptome checksum and metadata? (default is TRUE)}
 nothing, the function is run for its side effects
 }
 \description{
-\code{makeLinkedTxome} reads the checksum associated with a Salmon
+\code{makeLinkedTxome} reads the digest associated with a Salmon
 index at \code{indexDir}, and links it to key information
 about the transcriptome, including the \code{source}, \code{organism},
 \code{release}, and \code{genome} (these are custom character strings),
diff --git a/man/tximeta.Rd b/man/tximeta.Rd
index ee25aef..d05227a 100644
--- a/man/tximeta.Rd
+++ b/man/tximeta.Rd
@@ -75,12 +75,12 @@ reference transcripts with the \code{index_seq_hash} tag
 }
 \value{
 a SummarizedExperiment with metadata on the \code{rowRanges}.
-(if the hashed checksum in the Salmon or Sailfish index does not match
+(if the hashed digest in the Salmon or Sailfish index does not match
 any known transcriptomes, or any locally saved \code{linkedTxome},
 \code{tximeta} will just return a non-ranged SummarizedExperiment)
 }
 \description{
-\code{tximeta} leverages the hashed checksum of the Salmon or piscem index,
+\code{tximeta} leverages the hashed digest of the Salmon or piscem index,
 in addition to a number of core Bioconductor packages (GenomicFeatures,
 ensembldb, AnnotationHub, GenomeInfoDb, BiocFileCache) to automatically
 populate metadata for the user, without additional effort from the user.
@@ -92,11 +92,11 @@ when the quantification was performed with Salmon. However,
 \code{tximeta} can be used with any quantification \code{type} that is supported
 by \code{\link{tximport}}, where it will return an non-ranged SummarizedExperiment.
 
-\code{tximeta} performs a lookup of the hashed checksum of the index
+\code{tximeta} performs a lookup of the hashed digest of the index
 (stored in an auxilary information directory of the Salmon output)
 against a database of known transcriptomes, which lives within the tximeta
 package and is continually updated on Bioconductor's release schedule.
-In addition, \code{tximeta} performs a lookup of the checksum against a
+In addition, \code{tximeta} performs a lookup of the digest against a
 locally stored table of \code{linkedTxome}'s (see \code{link{makeLinkedTxome}}).
 If \code{tximeta} detects a match, it will automatically populate,
 e.g. the transcript locations, the transcriptome release,
diff --git a/vignettes/images/diagram.png b/vignettes/images/diagram.png
index 632c96f..3024c6c 100644
Binary files a/vignettes/images/diagram.png and b/vignettes/images/diagram.png differ
diff --git a/vignettes/tximeta.Rmd b/vignettes/tximeta.Rmd
index d979055..28f10d3 100644
--- a/vignettes/tximeta.Rmd
+++ b/vignettes/tximeta.Rmd
@@ -125,10 +125,16 @@ se <- tximeta(coldata)
 
 # What happened?
 
-`tximeta` recognized the hashed checksum of the transcriptome that the files
-were quantified against, it accessed the GTF file of the transcriptome
-source, found and attached the transcript ranges, and added the
-appropriate transcriptome and genome metadata. A remote GTF is only
+`tximeta` recognized the computed *digest* of the transcriptome that
+the files were quantified against, it accessed the GTF file of the
+transcriptome source, found and attached the transcript ranges, and
+added the appropriate transcriptome and genome metadata.
+A *digest* is a small string of alphanumeric characters that uniquely
+identifies the collection of sequences that were used for
+quantification (it is the application of a hash function). We
+sometimes also call this value a "checksum" (in the tximeta paper).
+
+A remote GTF is only
 downloaded once, and a local or remote GTF is only parsed to build a
 *TxDb* or *EnsDb* once: if `tximeta` recognizes that it has seen this
 *Salmon* index before, it will use a cached version of the metadata and
@@ -158,13 +164,13 @@ downloading the GTF file. Again, the download/construction of a
 transcript database occurs only once, and upon subsequent usage of
 *tximeta* functions, the cached version will be used.
 
-# Pre-computed checksums
+# Pre-computed digests
 
 We plan to support a wide variety of sources and organisms for
-transcriptomes with pre-computed checksums, though for now the
+transcriptomes with pre-computed digests, though for now the
 software focuses on predominantly human and mouse transcriptomes
 
-The following checksums are supported in this version of `tximeta`:
+The following digests are supported in this version of `tximeta`:
 
 ```{r echo=FALSE}
 dir2 <- system.file("extdata", package="tximeta")
@@ -492,11 +498,11 @@ e.g. if R gave an error when trying to connect to the TxDb associated
 with GENCODE v99 human transcripts, you should look for the `rid` of
 the entry associated with the human v99 GTF from GENCODE.
 
-# What if checksum isn't known?
+# What if digest isn't known?
 
 `tximeta` automatically imports relevant metadata when the
 transcriptome matches a known source -- *known* in the sense that it
-is in the set of pre-computed hashed checksums in `tximeta` (GENCODE,
+is in the set of pre-computed hashed digests in `tximeta` (GENCODE,
 Ensembl, and RefSeq for human and mouse). `tximeta` also facilitates the
 linking of transcriptomes used in building the *Salmon* index with
 relevant public sources, in the case that these are not part of this
@@ -511,7 +517,7 @@ automatically recognized by `tximeta` and does not require making a
 out support for all common transcriptomes, from all sources.
 
 **Note:** if you are using Salmon in alignment mode, then there is no
-Salmon index, and without the Salmon index, there is no checksum. We
+Salmon index, and without the Salmon index, there is no digest. We
 don't have a perfect solution for this yet, but you can still
 summarize transcript counts to gene with a `tx2gene` table that you
 construct manually (see `tximport` vignette for example code).
@@ -562,7 +568,7 @@ of these cases.
 To make this quantification reproducible, we make a `linkedTxome`
 which records key information about the sources of the transcript
 FASTA files, and the location of the relevant GTF file. It also
-records the checksum of the transcriptome that was computed by
+records the digest of the transcriptome that was computed by
 *Salmon* during the `index` step.
 
 **Source:** when creating the `linkedTxome` one must specify the
@@ -595,7 +601,7 @@ the gene identifier to an underscore. See
 
 By default, `linkedTxome` will write out a JSON file which can be
-shared with others, linking the checksum of the index with the other
+shared with others, linking the digest of the index with the other
 metadata, including FASTA and GTF sources. By default, it will write out
 to a file with the same name as the `indexDir`, but with a `.json`
 extension added. This can be prevented with `write=FALSE`, and the
@@ -669,7 +675,7 @@ makeLinkedTxome(indexDir=indexDir,
 ```
 
 After running `makeLinkedTxome`, the connection between this *Salmon*
-index (and its checksum) with the sources is saved for persistent
+index (and its digest) with the sources is saved for persistent
 usage. Note that because we added a single transcript of 960bp to the
 FASTA file used for quantification, `tximeta` could tell that this was
 not quantified against release 98 of the Ensembl transcripts for