No explanation of the y-axis scale bar when visualizing a quantitative track from BigWig files #4748
Comments
there are a couple different factors at play
1. General concept of y-axis labels

This could be something we allow users to configure or something, similar to ylab('...') in R, etc. However, automatically generating a y-axis label is sort of complex due to (2) and (3).

2. General concept of what bigwig files represent

BigWig files represent whatever quantitative signal a user originally created and converted into a bigwig format. For example...you can convert the below bedgraph into a bigwig file with bedGraphToBigWig example.bg
JBrowse doesn't have any idea what this data intrinsically represents. It could be sequencing coverage per 100bp, it could be gene density per 100bp, etc. But JBrowse doesn't know anything about what it truly "means" (e.g. gene density, sequencing coverage, GWAS p-values, etc).

3. General concept of bigwig zoom levels

Possibly relevant to the discussion here, there is the special behavior of BigWig files where software that creates bigwig files automatically calculates a summarization of the user's data, e.g. it automatically 'bins' the data. This is sometimes referred to as zoom levels or bin sizes. People who create bigwig files have very little control over what the zoom levels/bin sizes are, but they essentially calculate the average (mean), min, and max over a series of genomic windows. A single BigWig file often contains multiple zoom levels/bin sizes.

So for example, with the above bedgraph, a bin might summarize the three rows of that bedgraph into a single entry with mean 200, min 100, max 300.

JBrowse will try to auto-pick the most appropriate 'zoom level' based on how far zoomed out you are. The reason to choose a zoom level is to reduce the amount of data downloaded, because downloading the raw data is expensive, especially for dense (e.g. per-base) data when you are zoomed out to the level of the whole genome.

JBrowse DOES allow users to manually change the bin size/zoom level being accessed from the bigwig file, via the track menu, using the 'Increase resolution' and 'Decrease resolution' options. JBrowse 2 does NOT currently report the raw zoom level/bin size that was chosen, but we could consider doing that. Even then, the challenge from (1) still applies: we would still struggle to automatically generate a y-axis label.
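To make the bin summary above concrete, here is a minimal Python sketch. The three rows are hypothetical (three 100bp intervals with values 100, 200, 300, chosen only to match the mean/min/max quoted above), and `Interval` and `summarize_bin` are illustrative names, not part of JBrowse, bbi-js, or the UCSC tools. It assumes the mean is weighted by the number of bases each interval covers inside the bin; a real bigwig writer may differ in detail.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    start: int    # 0-based, inclusive
    end: int      # exclusive
    value: float  # the signal, in whatever units the user's data had


def summarize_bin(intervals, bin_start, bin_end):
    """Summarize intervals overlapping [bin_start, bin_end).

    Returns a base-weighted mean plus min and max, or None if the
    bin contains no data. Hypothetical helper for illustration only.
    """
    covered = 0
    total = 0.0
    lo = None
    hi = None
    for iv in intervals:
        # Number of bases of this interval that fall inside the bin
        overlap = min(iv.end, bin_end) - max(iv.start, bin_start)
        if overlap <= 0:
            continue
        covered += overlap
        total += iv.value * overlap
        lo = iv.value if lo is None else min(lo, iv.value)
        hi = iv.value if hi is None else max(hi, iv.value)
    if covered == 0:
        return None
    return {"mean": total / covered, "min": lo, "max": hi, "bases_covered": covered}


# Hypothetical bedgraph-like rows: three 100bp intervals
rows = [
    Interval(0, 100, 100.0),
    Interval(100, 200, 200.0),
    Interval(200, 300, 300.0),
]

summary = summarize_bin(rows, 0, 300)
print(summary)
```

Note that the summary stays in the same units as the input values: binning changes the resolution along the genome, not what the y-axis numbers mean.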
Note 1: The bin sizes often look quite random, they are basically powers of 2 from some base level. You can see an actual list of these here: https://github.com/GMOD/bbi-js?tab=readme-ov-file#understanding-scale-and-reductionlevel

Note 2: JBrowse allows plotting just the mean, just the max, just the min, or the "whiskers" mode which plots mean, min, and max at the same time in different shades of color. The whiskers will only be shown if you are zoomed out far enough to access a 'binned' zoom level.

Note 3: As you can infer from the above, the notion of autogenerated zoom levels can combine with the notion of bin sizes that were used as input for the bigwig file itself in potentially odd ways. E.g. in our (2) example there is a bin size of 100, but those will then be further stretched to whatever autogenerated bin size from above.

Some conclusions

All three of these issues sort of combine to make it difficult for JBrowse 2 to automatically report a 'y-axis label'. Ultimately, JBrowse 2 itself does not compute bin sizes. If your major concern is about bigwig file zoom levels, you can

a) not use bigwig files. One alternative that JBrowse 2 supports natively is bedgraph, for example. bedgraph just contains the raw data, no zoom levels. bedgraph files are larger on disk though, and accessing/drawing dense quantitative data from bedgraph is more cpu/memory/network intensive than bigwig

b) use the "increase resolution" feature of JBrowse 2 to access the lowest zoom level/bin size, which will display the raw unbinned data. Again, for dense quantitative data, accessing the lowest zoom level will be more cpu/memory/network intensive though.

I know that's a lot of info but welcome any ideas :)

Some references

[1] https://github.com/deeptools/pyBigWig/blob/master/README.md#a-note-on-statistics-and-zoom-levels
So, taking your response and boiling it down to its key point: JBrowse2 reads whatever data went into the BigWig and will scale according to genomic densities implicit in the primary data which were reformatted into BigWig. In the case of gene densities and repetitive element densities, the clear implication is that the y-axes really do represent average density of genes or repeats per nt, averaged over the automatic binning function graphically invoked by JBrowse2. For mapped Illumina reads (in RNA-seq or SNP mapping) one would expect to see scale bars going up much higher than "1". That's fine because it makes intuitive sense given the data that were put in.

I just wanted to make sure that my guess wasn't obviously wildly off. Thanks for your fast and detailed explanation!
that sounds about right. to make sure it's clear
the 'primary data' (e.g. the bedgraph or wiggle file that is the source of the bigwig) is put more or less unaltered into the 'lowest' zoom level of the bigwig, and then all the other binning functions build on top of that
and then, indeed, jbrowse just picks the data and displays that data as is, making the range equivalent to whatever is in the file, so the y-axis data range for deep rna-seq coverage tracks can be like 0-10,000x coverage, or you can have the y-axis range like 0.0-1.0 from your screenshot, or whatever else
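The layering described above (raw data at the lowest zoom level, coarser bins built on top, and a viewer picking one level to draw) can be sketched roughly like this in Python. This is only an illustration under stated assumptions: `build_zoom_levels` and `pick_level` are hypothetical names, the merge factor of 2 and the 100bp base bin are made up, and the selection rule (coarsest level whose bin size does not exceed the bases per pixel) is an assumption, not the actual logic of bedGraphToBigWig, bbi-js, or JBrowse.

```python
def build_zoom_levels(base_values, factor=2, levels=2):
    """Level 0 holds the raw values; each further level merges
    `factor` adjacent bins of the previous level (hypothetical scheme)."""
    pyramid = [[{"mean": v, "min": v, "max": v, "n": 1} for v in base_values]]
    for _ in range(levels):
        prev = pyramid[-1]
        merged = []
        for i in range(0, len(prev), factor):
            group = prev[i:i + factor]
            n = sum(b["n"] for b in group)
            merged.append({
                # Weight each child mean by how many raw bins it covers
                "mean": sum(b["mean"] * b["n"] for b in group) / n,
                "min": min(b["min"] for b in group),
                "max": max(b["max"] for b in group),
                "n": n,
            })
        pyramid.append(merged)
    return pyramid


def pick_level(bin_sizes, bp_per_pixel):
    """Return the index of the coarsest level whose bin size does not
    exceed bp_per_pixel, falling back to level 0 (the raw data).
    Assumes bin_sizes is sorted ascending."""
    chosen = 0
    for i, size in enumerate(bin_sizes):
        if size <= bp_per_pixel:
            chosen = i
    return chosen


# Hypothetical raw signal: four 100bp bins
pyramid = build_zoom_levels([100.0, 200.0, 300.0, 400.0])
bin_sizes = [100, 200, 400]  # bp per bin at levels 0, 1, 2

level = pick_level(bin_sizes, bp_per_pixel=250)
print(level, pyramid[level])
```

At every level the mean, min, and max stay within the range of the raw values, which is why the y-axis range follows whatever units the original file used rather than anything the viewer computes.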
I am using JBrowse2 to visualize chromosomal densities of genes and repetitive DNA regions from BigWig files. The visualization works nicely, and there is a y-axis with numbers given. However, I can find no explanation anywhere in the documentation for JBrowse2 about what the y-axis numbers mean. This is bad; I would like to be able to explain the y-axis units in the figure legend of my manuscript/future paper, but I am reduced to sheer guesswork. I believe the y-axis units are supposed to mean "density of the relevant feature per nucleotide"; but even if that guess is luckily correct, I have no idea what function (if any) JBrowse2 uses to compute the bin sizes over which it computes such a density in a chromosomal-scale view.
I am attaching a screenshot of the sort of JBrowse2 quantitative track visualization that I am working with, which I hope will clarify the problem and inspire somebody to explain the y-axis units. Thank you.