Commit

changing docs to say digest
mikelove committed Feb 21, 2024
1 parent 7e5f57e commit b73d868
Showing 10 changed files with 40 additions and 29 deletions.
2 changes: 1 addition & 1 deletion DESCRIPTION
@@ -1,5 +1,5 @@
Package: tximeta
Version: 1.21.3
Version: 1.21.4
Title: Transcript Quantification Import with Automatic Metadata
Description: Transcript quantification import from Salmon and
other quantifiers with automatic attachment of transcript ranges
4 changes: 4 additions & 0 deletions NEWS.md
@@ -1,3 +1,7 @@
# tximeta 1.21.4

* Changing language in docs to "digest" instead of "checksum".

# tximeta 1.21.3

* GENCODE 44 (H.s.), M34 (M.m), and Ensembl 111
4 changes: 2 additions & 2 deletions R/linkedTxome.R
@@ -1,6 +1,6 @@
#' Make and load linked transcriptomes ("linkedTxome")
#'
#' \code{makeLinkedTxome} reads the checksum associated with a Salmon
#' \code{makeLinkedTxome} reads the digest associated with a Salmon
#' index at \code{indexDir}, and links it to key information
#' about the transcriptome, including the \code{source}, \code{organism},
#' \code{release}, and \code{genome} (these are custom character strings),
@@ -58,7 +58,7 @@
#' on Zenodo. This enables consistent annotation and downstream annotation
#' tasks, such as by \code{summarizeToGene}.
#' @param write logical, should a JSON file be written out
#' which documents the transcriptome checksum and metadata? (default is TRUE)
#' which documents the transcriptome digest and metadata? (default is TRUE)
#' @param jsonFile the path to the json file for the linkedTxome
#'
#' @return nothing, the function is run for its side effects
10 changes: 5 additions & 5 deletions R/tximeta.R
@@ -55,7 +55,7 @@ NULL

#' Import transcript quantification with metadata
#'
#' \code{tximeta} leverages the hashed checksum of the Salmon or piscem index,
#' \code{tximeta} leverages the hashed digest of the Salmon or piscem index,
#' in addition to a number of core Bioconductor packages (GenomicFeatures,
#' ensembldb, AnnotationHub, GenomeInfoDb, BiocFileCache) to automatically
#' populate metadata for the user, without additional effort from the user.
@@ -66,11 +66,11 @@ NULL
#' \code{tximeta} can be used with any quantification \code{type} that is supported
#' by \code{\link{tximport}}, where it will return an non-ranged SummarizedExperiment.
#'
#' \code{tximeta} performs a lookup of the hashed checksum of the index
#' \code{tximeta} performs a lookup of the hashed digest of the index
#' (stored in an auxilary information directory of the Salmon output)
#' against a database of known transcriptomes, which lives within the tximeta
#' package and is continually updated on Bioconductor's release schedule.
#' In addition, \code{tximeta} performs a lookup of the checksum against a
#' In addition, \code{tximeta} performs a lookup of the digest against a
#' locally stored table of \code{linkedTxome}'s (see \code{link{makeLinkedTxome}}).
#' If \code{tximeta} detects a match, it will automatically populate,
#' e.g. the transcript locations, the transcriptome release,
@@ -142,7 +142,7 @@ NULL
#' @param ... arguments passed to \code{tximport}
#'
#' @return a SummarizedExperiment with metadata on the \code{rowRanges}.
#' (if the hashed checksum in the Salmon or Sailfish index does not match
#' (if the hashed digest in the Salmon or Sailfish index does not match
#' any known transcriptomes, or any locally saved \code{linkedTxome},
#' \code{tximeta} will just return a non-ranged SummarizedExperiment)
#'
@@ -486,7 +486,7 @@ may lead to errors in object construction, unless 'dropInfReps=TRUE'")
missingMetadata <- function(se, summarize=FALSE) {
msg <- "use of this function requires transcriptome metadata which is missing.
either: (1) the object was not produced by tximeta, or
(2) tximeta could not recognize the checksum of the transcriptome.
(2) tximeta could not recognize the digest of the transcriptome.
If (2), use a linkedTxome to provide the missing metadata and rerun tximeta"
if (summarize) {
msg <- paste0(msg, "
5 changes: 3 additions & 2 deletions README.md
@@ -26,8 +26,9 @@ This metadata is attached to the *SummarizedExperiment* in the
`metadata()` and `rowRanges()` slots.

For a list of the reference transcriptomes supported by `tximeta()`,
see the "Pre-computed checksums" section of the vignette in the
`Get started` tab.
see the "Pre-computed digests" section of the vignette in the
`Get started` tab. We call the computed identifier for the reference
transcriptome a "digest" or sometimes a "checksum".

Further steps are also facilitated, e.g. `summarizeToGene()`, `addIds()`,
or even `retrieveCDNA()` (the transcripts used for quantification) or
Binary file modified man/figures/diagram.png
4 changes: 2 additions & 2 deletions man/linkedTxome.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

8 changes: 4 additions & 4 deletions man/tximeta.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

Binary file modified vignettes/images/diagram.png
32 changes: 19 additions & 13 deletions vignettes/tximeta.Rmd
@@ -125,10 +125,16 @@ se <- tximeta(coldata)

# What happened?

`tximeta` recognized the hashed checksum of the transcriptome that the files
were quantified against, it accessed the GTF file of the transcriptome
source, found and attached the transcript ranges, and added the
appropriate transcriptome and genome metadata. A remote GTF is only
`tximeta` recognized the computed *digest* of the transcriptome that
the files were quantified against, it accessed the GTF file of the
transcriptome source, found and attached the transcript ranges, and
added the appropriate transcriptome and genome metadata.
A *digest* is a small string of alphanumeric characters that uniquely
identifies the collection of sequences that were used for
quantification (it is the application of a hash function). We
sometimes also call this value a "checksum" (in the tximeta paper).

A remote GTF is only
downloaded once, and a local or remote GTF is only parsed to build a
*TxDb* or *EnsDb* once: if `tximeta` recognizes that it has seen this *Salmon*
index before, it will use a cached version of the metadata and
@@ -158,13 +164,13 @@ downloading the GTF file. Again, the download/construction of a
transcript database occurs only once, and upon subsequent usage of
*tximeta* functions, the cached version will be used.

# Pre-computed checksums
# Pre-computed digests

We plan to support a wide variety of sources and organisms for
transcriptomes with pre-computed checksums, though for now the
transcriptomes with pre-computed digests, though for now the
software focuses on predominantly human and mouse transcriptomes

The following checksums are supported in this version of `tximeta`:
The following digests are supported in this version of `tximeta`:

```{r echo=FALSE}
dir2 <- system.file("extdata", package="tximeta")
@@ -492,11 +498,11 @@ e.g. if R gave an error when trying to connect to the TxDb associated
with GENCODE v99 human transcripts, you should look for the `rid` of
the entry associated with the human v99 GTF from GENCODE.

# What if checksum isn't known?
# What if digest isn't known?

`tximeta` automatically imports relevant metadata when the
transcriptome matches a known source -- *known* in the sense that it
is in the set of pre-computed hashed checksums in `tximeta` (GENCODE,
is in the set of pre-computed hashed digests in `tximeta` (GENCODE,
Ensembl, and RefSeq for human and mouse). `tximeta` also facilitates the
linking of transcriptomes used in building the *Salmon* index with
relevant public sources, in the case that these are not part of this
@@ -511,7 +517,7 @@ automatically recognized by `tximeta` and does not require making a
out support for all common transcriptomes, from all sources.

**Note:** if you are using Salmon in alignment mode, then there is no
Salmon index, and without the Salmon index, there is no checksum. We
Salmon index, and without the Salmon index, there is no digest. We
don't have a perfect solution for this yet, but you can still
summarize transcript counts to gene with a `tx2gene` table that you
construct manually (see `tximport` vignette for example code).
@@ -562,7 +568,7 @@ of these cases.
To make this quantification reproducible, we make a `linkedTxome`
which records key information about the sources of the transcript
FASTA files, and the location of the relevant GTF file. It also
records the checksum of the transcriptome that was computed by
records the digest of the transcriptome that was computed by
*Salmon* during the `index` step.

**Source:** when creating the `linkedTxome` one must specify the
@@ -595,7 +601,7 @@ the gene identifier to an underscore. See


By default, `linkedTxome` will write out a JSON file which can be
shared with others, linking the checksum of the index with the other
shared with others, linking the digest of the index with the other
metadata, including FASTA and GTF sources. By default, it will write
out to a file with the same name as the `indexDir`, but with a `.json`
extension added. This can be prevented with `write=FALSE`, and the
@@ -669,7 +675,7 @@ makeLinkedTxome(indexDir=indexDir,
```

After running `makeLinkedTxome`, the connection between this *Salmon*
index (and its checksum) with the sources is saved for persistent
index (and its digest) with the sources is saved for persistent
usage. Note that because we added a single transcript of 960bp to the
FASTA file used for quantification, `tximeta` could tell that this was
not quantified against release 98 of the Ensembl transcripts for
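
Note on the "digest" terminology introduced by this commit: the vignette text describes a digest as the result of applying a hash function to the sequences used for quantification. The R sketch below illustrates that idea only. `digest_of` is a hypothetical helper, the sequences are made up, and MD5 over a temporary file (via base R's `tools::md5sum`) is a stand-in assumption for whatever hash Salmon applies when building its index.

```r
# Toy illustration only: MD5 over a temporary file stands in for the
# digest that Salmon computes over the indexed sequences (assumption).
digest_of <- function(seqs) {
  f <- tempfile(fileext = ".txt")
  on.exit(unlink(f))            # remove the temporary file afterwards
  writeLines(seqs, f)           # one sequence per line
  unname(tools::md5sum(f))      # digest as a character string
}

ref      <- c("ACGTACGT", "GGGTTTAA")  # hypothetical reference transcripts
modified <- c(ref, "TTTTCCCC")          # same set plus one extra transcript

d_ref <- digest_of(ref)
d_mod <- digest_of(modified)

identical(d_ref, digest_of(ref))  # same sequences give the same digest
identical(d_ref, d_mod)           # an added transcript changes the digest
```

Because the digest changes when any sequence is added, a modified transcriptome no longer matches a pre-computed digest, which is the situation the `linkedTxome` text in the vignette addresses.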
