
No explanation of the y-axis scale bar when visualizing a quantitative track from BigWig files #4748

Open
SchwarzEM opened this issue Jan 3, 2025 · 4 comments
Labels
bug Something isn't working

Comments

@SchwarzEM

I am using JBrowse2 to visualize chromosomal densities of genes and repetitive DNA regions from BigWig files. The visualization works nicely, and there is a y-axis with numbers given. However, I can find no explanation anywhere in the documentation for JBrowse2 about what the y-axis numbers mean. This is bad; I would like to be able to explain the y-axis units in the figure legend of my manuscript/future paper, but I am reduced to sheer guesswork. I believe the y-axis units are supposed to mean "density of the relevant feature per nucleotide"; but even if that guess is luckily correct, I have no idea what function (if any) JBrowse2 uses to compute the bin sizes by which it computes such a density in a chromosomal-scale view.

I am attaching a screenshot of the sort of JBrowse2 quantitative track visualization that I am working with, which I hope will clarify the problem and inspire somebody to explain the y-axis units. Thank you.

[Screenshot attached: Sherm_chrI]

@SchwarzEM SchwarzEM added the bug Something isn't working label Jan 3, 2025
@cmdcolin
Collaborator

cmdcolin commented Jan 3, 2025

there are a couple different factors at play

  1. the general concept of y-axis labels
  2. the general concept of what bigwig files represent
  3. the general concept of "zoom levels" in bigwig files

1. General concept of y-axis labels

This could be something we allow users to configure, similar to ylab('...') in R, etc.

However, automatically generating a y axis label is sort of complex due to (2) and (3)

2. General concept of what bigwig files represent

BigWig files represent whatever quantitative signal a user originally created and converted into a bigwig format. For example...you can convert the below bedgraph into a bigwig file with bedGraphToBigWig

example.bg

#chrom start end score
chr1 1 100 100
chr1 101 200 200
chr1 201 300 300

JBrowse doesn't have any idea what this data intrinsically represents. It could be sequencing coverage per 100bp, it could be gene density per 100bp, etc. But JBrowse doesn't know anything about what it truly "means" (e.g. gene density, sequencing coverage, GWAS p-values, etc)

3. General concept of bigwig zoom levels

Possibly relevant to the discussion here, there is the special behavior of BigWig files where software that creates bigwig files automatically calculates a summarization of the user's data, e.g. it automatically 'bins' the data. This is sometimes referred to as zoom levels or bin sizes.

People who create bigwig files have very little control over what the zoom levels/bin sizes are, but the software essentially calculates the average (mean), min, and max over a series of genomic windows. A single bigwig file often contains multiple zoom levels/bin sizes.

So for example, with the above bedgraph, a bin might summarize the three rows of that bedgraph into a single entry with mean 200, min 100, max 300.
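As a rough illustration of that summarization, here is a minimal Python sketch that collapses the three bedgraph rows into one bin. The `Interval` class and `summarize` function are hypothetical names made up for this example, and the length-weighted mean is an assumption about how a bin would be averaged; this is not code from any bigwig tool.

```python
from dataclasses import dataclass


@dataclass
class Interval:
    chrom: str
    start: int
    end: int
    score: float


def summarize(intervals):
    """Collapse several intervals into one bin: length-weighted mean, min, max."""
    total_bases = sum(iv.end - iv.start for iv in intervals)
    weighted_sum = sum(iv.score * (iv.end - iv.start) for iv in intervals)
    return {
        "mean": weighted_sum / total_bases,
        "min": min(iv.score for iv in intervals),
        "max": max(iv.score for iv in intervals),
    }


# The three rows of example.bg from above
rows = [
    Interval("chr1", 1, 100, 100.0),
    Interval("chr1", 101, 200, 200.0),
    Interval("chr1", 201, 300, 300.0),
]

summary = summarize(rows)
print(summary)  # {'mean': 200.0, 'min': 100.0, 'max': 300.0}
```

Because all three rows span the same number of bases, the weighted mean equals the plain average of the scores here.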

JBrowse will try to auto-pick the most appropriate 'zoom level' based on how far zoomed out you are. The reason to choose a zoom level is to reduce the amount of data downloaded, because downloading the raw data, especially dense e.g. per base data when you are zoomed out to the level of the whole genome, is expensive
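To make the idea of auto-picking concrete, here is a hedged Python sketch of one plausible selection rule: take the coarsest bin size that is still no larger than the number of bases drawn per pixel. The function name, the rule itself, and the list of bin sizes are all assumptions for illustration; this is not the actual logic in JBrowse or bbi-js.

```python
def pick_reduction_level(reduction_levels, bases_per_pixel):
    """Return the coarsest bin size that does not exceed bases_per_pixel.

    Returns None when even the smallest bin is coarser than one pixel,
    meaning the raw (unbinned) data should be used instead.
    """
    candidates = [level for level in reduction_levels if level <= bases_per_pixel]
    return max(candidates) if candidates else None


# Hypothetical bin sizes (in bp) stored in a bigwig file
levels = [32, 128, 512, 2048]

print(pick_reduction_level(levels, 1))       # None: zoomed in, use raw data
print(pick_reduction_level(levels, 600))     # 512
print(pick_reduction_level(levels, 100000))  # 2048: whole-genome view
```

The point of a rule like this is that a bin finer than one pixel would be downloaded but could not be drawn at full detail anyway.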

JBrowse DOES allow users to manually change the bin size/zoom level being accessed from the bigwig file, via the track menu, using the 'Increase resolution' and 'Decrease resolution' options.

JBrowse 2 does NOT currently report the raw zoom level/bin size that was chosen, but we could consider doing that. Even then, this combines with (1): we would still be challenged to automatically generate a y-axis label.

Note 1: The bin sizes often look quite random; they are basically powers of 2 from some base level. You can see an actual list of these here https://github.com/GMOD/bbi-js?tab=readme-ov-file#understanding-scale-and-reductionlevel

Note 2: JBrowse allows plotting just the mean, just the max, just the min, or the "whiskers" mode which plots mean, min, and max at the same time in different shades of color. The whiskers will only be shown if you are zoomed out far enough to access a 'binned' zoom level.

Note 3: As you can infer from the above, the notion of autogenerated zoom levels can combine with the notion of bin sizes that were used as input for the bigwig file itself in potentially odd ways. E.g. in our (2) example there is a bin size of 100, but those will then be further stretched to whatever autogenerated bin size from above
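The interaction in Note 3 can be sketched in Python as follows. This assumes, for simplicity, contiguous half-open source bins of 100 bp and a hypothetical autogenerated bin size of 128 bp; the `rebin` function is made up for this example and is not how any bigwig writer is actually implemented.

```python
def rebin(intervals, bin_size, region_start, region_end):
    """Summarize (start, end, score) intervals into fixed-width bins.

    Returns a list of (bin_start, bin_end, mean, min, max), where the mean
    is weighted by how many bases of each source interval fall in the bin.
    """
    out = []
    for bin_start in range(region_start, region_end, bin_size):
        bin_end = min(bin_start + bin_size, region_end)
        covered = 0
        total = 0.0
        scores = []
        for start, end, score in intervals:
            overlap = min(end, bin_end) - max(start, bin_start)
            if overlap > 0:
                covered += overlap
                total += score * overlap
                scores.append(score)
        if covered:
            out.append((bin_start, bin_end, total / covered, min(scores), max(scores)))
    return out


# Source data already binned at 100 bp, as in the example from (2)
source = [(0, 100, 100.0), (100, 200, 200.0), (200, 300, 300.0)]

for row in rebin(source, 128, 0, 300):
    print(row)
# (0, 128, 121.875, 100.0, 200.0)
# (128, 256, 243.75, 200.0, 300.0)
# (256, 300, 300.0, 300.0, 300.0)
```

Since the 128 bp bins do not line up with the 100 bp source bins, each binned mean blends neighbouring source values, which is one way the two binnings can combine oddly.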

Some conclusions

All three of these issues sort of combine to make it difficult for JBrowse 2 to automatically report a 'y-axis label'. Ultimately, JBrowse 2 itself does not compute bin sizes.

If your major concern is about bigwig file zoom levels, you can

a) not use bigwig files. one alternative that jbrowse 2 supports natively is bedgraph for example. bedgraph just contains the raw data, no zoom levels. bedgraph files are larger on disk though, and accessing/drawing dense quantitative data from bedgraph is more cpu/memory/network intensive than bigwig

b) use the "increase resolution" feature of JBrowse 2 to access the lowest zoom level/bin size, which will display the raw unbinned data. again, for dense quantitative data, accessing the lowest zoom level will be more cpu/memory/network intensive though.

I know that's a lot of info but welcome any ideas :)

Some references

[1] https://github.com/deeptools/pyBigWig/blob/master/README.md#a-note-on-statistics-and-zoom-levels
[2] https://github.com/GMOD/bbi-js
[3] GMOD/jbrowse#1654 (comment)

@SchwarzEM
Author

So, taking your response and boiling it down to its key point: JBrowse2 reads whatever data went into the BigWig and will scale according to genomic densities implicit in the primary data which were reformatted into BigWig. In the case of gene densities and repetitive element densities, the clear implication is that the y-axes really do represent average density of genes or repeats per nt, averaged over the automatic binning function graphically invoked by JBrowse2. For mapped Illumina reads (in RNA-seq or SNP mapping) one would expect to see scale bars going up much higher than "1".

That's fine because it makes intuitive sense given the data that were put in. I just wanted to make sure that my guess wasn't obviously wildly off.

Thanks for your fast and detailed explanation!

@cmdcolin
Collaborator

cmdcolin commented Jan 4, 2025

that sounds about right. to make sure it's clear

"implicit in the primary data which were reformatted into BigWig"

the 'primary data' (e.g. like the bedgraph or wiggle file that is the source of the bigwig) is put more or less unaltered into the 'lowest' zoom level of the bigwig, and then all the other binning functions build on top of that

@cmdcolin
Collaborator

cmdcolin commented Jan 4, 2025

and then, indeed, jbrowse just picks the data and displays that data as is, making the range equivalent to whatever is in the file, so the y-axis data range for deep rna-seq coverage tracks can be like 0-10,000x coverage, or you can have the y-axis range like 0.0-1.0 from your screenshot, or whatever else
