Create visualization for haploid variant analysis workflow results #155

nekrut · 2024-10-31T15:47:08Z

Given these data https://usegalaxy.org/api/datasets/f9cad7b01a472135d0cbdeeffd6c9a1e/display?to_ext=tabular create an Observable notebook that displays various graphs, including:

variants versus chromosomal positions
variants versus geographic locations

The notebook should allow dynamic filtering by attributes such as variant effect.

d-callan · 2024-11-20T15:42:16Z

d-callan · 2024-11-25T21:39:15Z

Took a quick look at the attached example data today and noticed there doesn't appear to be any run info in there. How do we mean to grab things like geographic location?

nekrut · 2024-12-05T14:57:25Z

SRR IDs

d-callan · 2024-12-05T15:34:56Z

well.. yes. i was more interested in whether i should assume one data input or two; whether, if the run info contains something like country names, we should attempt to turn those into GIS coords; and what we'd like to do if there isn't anything relating to location at all. my understanding is that run info isn't very controlled in terms of allowed values etc.

or i suppose, whether we should even assume the ids will be SRR ids?
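
A minimal sketch of how the "are these even SRR ids?" question could be checked up front, before any run info lookup is attempted. It assumes run accessions follow the usual SRR/ERR/DRR-plus-digits shape; the function name `split_sample_ids` is hypothetical and not part of any existing notebook.

```python
import re

# Assumption: run accessions look like SRR/ERR/DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")


def split_sample_ids(sample_ids):
    """Split sample ids into those that look like run accessions and the rest."""
    recognized, unrecognized = [], []
    for raw in sample_ids:
        sample_id = raw.strip()
        if RUN_ACCESSION.fullmatch(sample_id):
            recognized.append(sample_id)
        else:
            unrecognized.append(sample_id)
    return recognized, unrecognized
```

Ids in the second list would need some other source of location data, or would simply be shown without a geographic view.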

d-callan · 2024-12-06T17:18:36Z

mmkay so i quickly put together a pretty rough demo of an idea i had, and will include a screenshot here and some of my rationale and thoughts on possible directions this could go. feel free to throw tomatoes hahaha

rationale

i find most 'traditional' views of large numbers of variants across large numbers of samples kind of visually overwhelming, like they're throwing data at you rather than presenting information, if that makes sense. i wanted to think of something that might make the information more digestible and make high-level patterns easier to discover. if we prefer traditional, on the grounds that it's maybe what people will be expecting, let me know.

caveats/ growth areas

this, being a rough example i made quickly, has a lot of room for growth. some things that would make it a lot better:

actually labelling things lol, general cleaning up
make it interactive/ request user inputs where i've currently set some arbitrary defaults
grabbing run info for samples and sorting/ faceting samples by those data
being able to click a legend entry to toggle visibility of those values
being able to toggle on/off the presence of loci without variants
figuring out how to accessibly color edges where there are a large number of categorical values (it's hard to make color-deficiency-friendly palettes with more than ~10 distinct colors)
allowing the selection of a particular locus and generating summary views for that position alone (like a geographic map, when possible)
EDIT: i left out a few points here
possibly adding a user input for selecting genes by name or something and showing only relevant loci
adding tooltips or something that provides further info about edges/ variants like ref/ alt, gene
trying to grab annotation for genes from somewhere and adding that any place we display gene identifier
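
For the palette limit noted in the list above, one possible approach (an assumption, not something the demo does) is to keep only the most frequent categories and bucket everything else under a single label, so the number of distinct colors never exceeds the palette size. The function name `cap_categories` and the `"other"` label are hypothetical.

```python
from collections import Counter


def cap_categories(values, max_colors=10, other_label="other"):
    """Keep the most frequent categories; relabel the rest as `other_label`.

    The result uses at most `max_colors` distinct labels, counting the
    catch-all label as one of them.
    """
    values = list(values)
    counts = Counter(values)
    if len(counts) <= max_colors:
        return values  # already fits in the palette
    # Reserve one color slot for the catch-all label.
    keep = {name for name, _ in counts.most_common(max_colors - 1)}
    return [v if v in keep else other_label for v in values]
```

The trade-off is that rare categories become indistinguishable by color alone, so this would pair naturally with the tooltips mentioned in the list.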

explainer/ demo

the left column nodes are loci that have variants. from the data @nekrut links, i arbitrarily chose two samples and a chromosome, and included the first 10,000 positions from it for now as a demo. choosing chromosomes and positions to view would need inputs, probably with some reasonable restrictions. right column nodes are samples. edges are variants, which can be colored by various attributes.
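
The layout described above can be sketched as a bipartite edge list: loci on one side, samples on the other, one edge per variant. This is a rough illustration only, not the code behind the demo; the row keys `sample`, `chrom`, `pos`, and `effect` are assumed names and may not match the columns in the linked dataset.

```python
def build_bipartite_edges(rows, chrom, max_pos, samples=None, color_by="effect"):
    """Build locus nodes, sample nodes, and variant edges for one chromosome.

    rows: iterable of dicts with hypothetical keys "sample", "chrom", "pos",
          plus whatever attribute is named by `color_by`.
    samples: optional collection of sample ids to keep (None keeps all).
    """
    loci, sample_nodes, edges = set(), set(), []
    for row in rows:
        pos = int(row["pos"])
        # Keep only the chosen chromosome and positions up to max_pos.
        if row["chrom"] != chrom or pos > max_pos:
            continue
        if samples is not None and row["sample"] not in samples:
            continue
        loci.add(pos)
        sample_nodes.add(row["sample"])
        # One edge per variant, carrying the attribute used for coloring.
        edges.append({"locus": pos, "sample": row["sample"], "color": row.get(color_by)})
    return sorted(loci), sorted(sample_nodes), edges
```

Only loci that actually carry a variant end up as nodes here, which matches the left column in the demo; showing loci without variants would need the position range added separately.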

realistic example

this is the same data as the mini demo but without subsetting to just 2 samples

d-callan · 2024-12-06T17:53:24Z

link to live demo
link to code

d-callan · 2024-12-09T15:03:59Z

looked at this again just now to see how i felt about it after having walked away a bit. i still think i personally like it, but realize i didn't really finish my rationale before. so... color and position are preattentive attributes, and given the density of data here i thought relying on them as heavily as possible would help people identify patterns more quickly and effectively. representing this data as a network diagram lets us do things like 'physically' co-locate data from all samples for a particular locus.

nekrut added this to BRC development tasks Oct 31, 2024

nekrut converted this from a draft issue Oct 31, 2024

nekrut mentioned this issue Nov 14, 2024

Workflows for "Analysis" page #144

Open

d-callan self-assigned this Dec 6, 2024
