Create visualization for haploid variant analysis workflow results #155

Open
nekrut opened this issue Oct 31, 2024 · 7 comments

@nekrut
Contributor

nekrut commented Oct 31, 2024

Using the data at https://usegalaxy.org/api/datasets/f9cad7b01a472135d0cbdeeffd6c9a1e/display?to_ext=tabular, create an Observable notebook that displays various graphs, including:

  • variants versus chromosomal positions
  • variants versus geographic locations

The notebook should allow dynamic filtering by various attributes, such as variant effect.
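A minimal sketch of the data preparation behind these plots and the effect filter, written in Python for illustration (the notebook itself would be JavaScript cells in Observable). It assumes the linked tabular export has a header row with columns named `Sample`, `CHROM`, `POS` and `EFFECT`; those names are hypothetical, so check them against the actual file. `location_of` is likewise a hypothetical sample-to-location lookup that would have to be built from run info, since the table itself doesn't appear to carry location.

```python
import csv
import io
from collections import Counter
from typing import Iterable

# Hypothetical column names: the real Galaxy tabular export may label these differently.
SAMPLE, CHROM, POS, EFFECT = "Sample", "CHROM", "POS", "EFFECT"


def load_rows(tsv_text: str) -> list[dict[str, str]]:
    """Parse tab-separated text with a header row into a list of dicts."""
    return list(csv.DictReader(io.StringIO(tsv_text), delimiter="\t"))


def filter_by_effect(rows: list[dict[str, str]], effects: Iterable[str]) -> list[dict[str, str]]:
    """Keep rows whose effect is in `effects`; an empty selection keeps everything."""
    wanted = set(effects)
    return [r for r in rows if not wanted or r[EFFECT] in wanted]


def variants_per_bin(rows: list[dict[str, str]], bin_size: int = 1000) -> Counter:
    """Count variants per (chromosome, bin start) window, e.g. for a position histogram."""
    counts: Counter = Counter()
    for r in rows:
        start = int(r[POS]) // bin_size * bin_size  # bins are [start, start + bin_size)
        counts[(r[CHROM], start)] += 1
    return counts


def variants_per_location(rows: list[dict[str, str]], location_of: dict[str, str]) -> Counter:
    """Count variants per location via a (hypothetical) sample -> location lookup."""
    return Counter(location_of.get(r[SAMPLE], "unknown") for r in rows)
```

Re-running `filter_by_effect` whenever the user changes an effect selection, then re-binning, is the "dynamic filtering" loop; in Observable that selection would come from an input cell.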

@nekrut nekrut converted this from a draft issue Oct 31, 2024
@d-callan
Collaborator

[image attachment]

@d-callan
Collaborator

I took a quick look at the attached example data today and noticed there doesn't appear to be any run info in it. How do we plan to get things like geographic location?

@nekrut
Contributor Author

nekrut commented Dec 5, 2024

SRR IDs

@d-callan
Collaborator

d-callan commented Dec 5, 2024

Well... yes. I was more interested in whether I should assume one data input or two; whether, if the run info contains something like country names, we should attempt to turn those into GIS coordinates; or, if there isn't anything relating to location at all, what we'd like to do then. My understanding is that run info isn't very controlled in terms of allowed values, etc.

Or, I suppose, should we even assume the IDs will be SRR IDs?

@d-callan d-callan self-assigned this Dec 6, 2024
@d-callan
Collaborator

d-callan commented Dec 6, 2024

Mmkay, so I quickly put together a pretty rough demo of an idea I had. I'll include a screenshot here, along with some of my rationale and thoughts on possible directions this could go. Feel free to throw tomatoes hahaha

rationale

I find most 'traditional' views of large numbers of variants across large numbers of samples kind of visually overwhelming, like they're throwing data at you rather than presenting information, if that makes sense. I wanted to think of something that might make the information more digestible and make high-level patterns easier to discover. If we prefer the traditional views, on the grounds that they're maybe what people will be expecting or something, let me know.

caveats / growth areas

Since this is a rough example I made quickly, it has a lot of room for growth. Some things that would make it a lot better:

  1. actually labeling things lol, general cleanup
  2. making it interactive / requesting user inputs where I've currently set some arbitrary defaults
  3. grabbing run info for samples and sorting / faceting samples by those data
  4. being able to click a legend entry to toggle visibility of those values
  5. being able to toggle on/off the presence of loci without variants
  6. figuring out how to accessibly color edges when there are a large number of categorical values (it's hard to make colorblind-friendly palettes with more than ~10 distinct colors)
  7. allowing the selection of a particular locus and generating summary views for that position alone (like a geographic map, when possible)
    EDIT: I left out a few points here
  8. possibly adding a user input for selecting genes by name or something, and showing only the relevant loci
  9. adding tooltips or something that provides further info about edges/variants, like ref/alt and gene
  10. trying to grab gene annotations from somewhere and adding them anywhere we display a gene identifier
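For item 6, one common workaround is to give distinct colors only to the most frequent categories and fold the rest into a single shared "other" color. A sketch, assuming the 8-color Okabe-Ito palette (a widely used color-vision-deficiency-friendly set); the palette choice and the `assign_colors` helper are assumptions, not something the demo uses.

```python
from collections import Counter
from typing import Iterable

# Okabe-Ito palette: 8 colors commonly recommended for color-vision deficiency.
OKABE_ITO = ["#E69F00", "#56B4E9", "#009E73", "#F0E442",
             "#0072B2", "#D55E00", "#CC79A7", "#000000"]


def assign_colors(values: Iterable[str], palette: list[str] = OKABE_ITO) -> dict[str, str]:
    """Map each categorical value to a color, sharing one 'other' color among rare values."""
    counts = Counter(values)
    if len(counts) <= len(palette):
        # Everything fits: one color per category, most frequent first.
        keep = [v for v, _ in counts.most_common()]
    else:
        # Too many categories: reserve the last palette slot for the catch-all bucket.
        keep = [v for v, _ in counts.most_common(len(palette) - 1)]
    color_of = {v: palette[i] for i, v in enumerate(keep)}
    other_color = palette[len(keep)] if len(keep) < len(counts) else None
    # Folded (rare) values all receive the shared catch-all color.
    return {v: color_of.get(v, other_color) for v in counts}
```

Folded categories lose their distinct colors, so something like the tooltips in item 9 would be needed to tell them apart.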

explainer/ demo

The left column contains loci that have variants. From the data @nekrut linked, I arbitrarily chose two samples and a chromosome, and included the first 10,000 positions from it as a demo for now. Choosing chromosomes and positions to view would need user inputs, probably with some reasonable restrictions. The right-column nodes are samples. Edges are variants, which can be colored by various attributes.
[screenshot: variants_viz_idea]
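One way to sketch the network described above, in illustrative Python rather than the code behind the screenshot. It assumes the same hypothetical column names (`Sample`, `CHROM`, `POS`, `EFFECT`) and reads "the first 10,000 positions" as `POS <= 10000`; `build_bipartite` and `Edge` are hypothetical names. Each variant row becomes an edge from a `CHROM:POS` locus node to a sample node.

```python
import csv
import io
from dataclasses import dataclass


@dataclass(frozen=True)
class Edge:
    locus: str   # left-column node id, "CHROM:POS"
    sample: str  # right-column node id
    color: str   # attribute value used to color the edge (e.g. variant effect)


def build_bipartite(tsv_text: str, samples: list[str], chrom: str,
                    max_pos: int = 10_000, color_attr: str = "EFFECT"):
    """Return (locus nodes, sample nodes, edges) for one chromosome and a sample subset.

    Only loci with at least one variant become nodes, matching the left column in the demo.
    Column names are hypothetical and may differ in the real export.
    """
    wanted = set(samples)
    edges = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        pos = int(row["POS"])
        if row["Sample"] in wanted and row["CHROM"] == chrom and pos <= max_pos:
            edges.append(Edge(f"{chrom}:{pos}", row["Sample"], row[color_attr]))
    # Order loci by position so the left column reads along the chromosome.
    loci = sorted({e.locus for e in edges}, key=lambda n: int(n.rsplit(":", 1)[1]))
    # Keep samples in the order given (deduplicated), even if a sample has no edges.
    sample_nodes = list(dict.fromkeys(samples))
    return loci, sample_nodes, edges
```

Because sample nodes keep the order they were passed in, sorting or faceting them by run info later would only mean reordering that input list.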

realistic example

This is the same data as the mini demo, but without subsetting to just two samples.
[screenshot: variants_viz_busy]
[screenshot: variants_viz_busy2]

@d-callan
Collaborator

d-callan commented Dec 6, 2024

@d-callan
Collaborator

d-callan commented Dec 9, 2024

I looked at this again just now to see how I felt about it after walking away for a bit. I still think I personally like it, but I realize I didn't really finish my rationale before. So... color and position are preattentive attributes, and given the density of the data here, I thought relying on them as heavily as possible would help people identify patterns more quickly and effectively. Representing this data as a network diagram lets us do things like 'physically' co-locate data from all samples for a particular locus.

Labels
None yet
Projects
Archived in project
Development

No branches or pull requests

2 participants