-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reliable timepoint multiplicity calculation #130
Comments
To be clear here, after digging into the mechanics of how Vega does these pie chart transforms, we realized that if even one the nodes has timepoint multiplicities which don't add up to the total multiplicity, it could potentially mess up all the other pie charts. So it's worth making sure that the numbers going in here are correct, or doing some kind of validation along the way. |
@eharkins I think this has become a little more important now that we are generalizing things. Are we still taking totals at their face value, or are we dynamically adding them up based on the individuals timepoints data? |
@metasoarous
you mean dividing each timepoint multiplicity by the The code that creates these values (https://github.com/matsengrp/cft/blob/master/bin/process_partis.py#L122-L135) seems to support the assumption that these parts add up to the whole in the cft case. Because of the filtering done here: https://github.com/matsengrp/olmsted/blob/master/src/components/explorer/vega/clonal_family_details.js#L218-L223 (see comment explaining the filter), I don't think it is the case anymore that
as you say above. Will all this in mind, is it worth doing this anyway or somehow requiring in the schema (if possible) or in the Olmsted code that the individual timepoint multiplicities add up to the total for each sequence? |
I don't see how the filtering you mention implies that we don't have to worry about the cluster multiplicities adding up anymore. Can you explain that please? Assuming there's still any way a user could put inconsistent data into Olmsted, we should either have a check that the numbers match up, or compute the totals ourselves, so nothing ever ends up being off. |
I think you are right that this is does not affect that issue. It just means we can accept some leaves not having multiplicity data.
Sounds good. One way to do things is to not include total multiplicity values and just
by summing the timepoint multiplicities in Olmsted front end code somewhere or in bin/process_data.py. This makes me wonder what other derived data we have in the schema that are liable to be inconsistent with their sources and may need a similar check. |
That's not a bad idea; However, we should not require that information be passed in this way. If folks don't have/want to specify timepoints, they should be able to just specify the regular/aggregated multiplicity values verbatim. We should only compute/check them when needed. |
Right now we assume that a given sequence's multiplicity represents the sum of the timepoint multiplicities by calculating their percentages for pie charts like this https://github.com/matsengrp/olmsted/blob/master/src/components/explorer/vega/clonal_family_details.js#L232.
As is noted above that line, we thought it might be better to sum over the total timepoint multiplicities because of how the pie chart marks work, so that we would have certainty that our percentages were accurate. The pie charts are displaying just fine for now so we have closed the issue for that, but this could be important depending on what data looks like.
The text was updated successfully, but these errors were encountered: