-
Notifications
You must be signed in to change notification settings - Fork 184
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Corrected cubical cover computation #242
base: master
Are you sure you want to change the base?
Conversation
Codecov Report
@@ Coverage Diff @@
## master #242 +/- ##
==========================================
- Coverage 80.32% 80.18% -0.14%
==========================================
Files 11 11
Lines 864 858 -6
Branches 189 195 +6
==========================================
- Hits 694 688 -6
Misses 138 138
Partials 32 32
📣 Codecov can now indicate which changes are the most critical in Pull Requests. Learn more |
On a side note, I'm not sure what the correct procedure is for sending in a fix like this. Should I be opening issues before sending the pull request? |
@sauln can you review this? |
I am not a smart man, let alone an expert with TDA. So this question comes from ignorance -- is your definition/expected outcome literature-based? As far as I can tell, kmapper's implementation does work, given its approach to calculating perc_overlap and the like. Whether its approach is contrary to the field's definition is worth discussing though. The kmapper approach -- roughly bc I'm on mobile but I worked this out again on a napkin just now -- is to first lay out centers without regard to any cube overlap. The centers initially have radii from them that make them completely cover the range, without overlap, and up to the range bounds. After the radius is calculated, perc_overlap is applied to the diameter to see how much further to extend each radius. The idea being, "a perc_overlap of .50 means that the radius of cube_1 extends halfway into the next (unextended) cube." That may be the crux of the problem you are calling out -- that the approach should consider that the other cubes are also being extended. Is that right? But that's why the cube in that test extends into -.25 -- because the cube starts at center .25 with an unexpected radius of .25, and 50% overlap applied to its diameter means it gets another .25 on each end. That it goes beyond the bound is ignored, since centers never move w.r.t overlap, as kmapper does it. But @sauln will have better insight. |
Sorry saying "implemented incorrectly" might have been a bit strong on my part. The implementation is correct in the sense that it gives you a cover of your range with n-cubes which do overlap by the specified percentage. I would argue the current behavior is confusing for a handful of reasons. First it seems to disagree with the documentation. from kmapper import Cover
import numpy as np
limits = np.array([[0,1]])
data = np.array([[0,0]])
cover = Cover(n_cubes=2, perc_overlap=0.5, limits = limits)
cover.fit(data) This will again create the cover Second, it disagrees with every other mapper software I've ever used:
Third, it half disagrees with the original mapper paper "Topological Methods for the Analysis of High Dimensional Data Sets and 3D Object Recognition". Although the paper makes no explicit mention of constructing a cover in this manner example 3.1 constructs a cover of the range [0,2] with length one intervals and 2/3 overlap. The cover produced is { [0,1], [1/3, 4/3], [2/3, 5/3], [1, 2] }. This is the closest I have to a literature based reason for expecting the outcome I do. If you were to try and reproduce this cover in kepler with Fourth, as a user I would not expect the percent overlap to impact a single interval cover. I would expect the So at the very least I believe if this behavior is to be kept it needs to be explicitly outlined in the documentation as it is not what I assumed kepler mapper was doing and I suspect others familiar with the space would be equally confused. |
Thank you for the very, detailed, well-reasoned, and well-sourced answer! I'll let @sauln respond next. And I don't think we have a formal policy yet about "open an issue first, and then a PR." |
Previously cubical cover over counted the number of intersections. This commit corrects the overcounting and updates tests to check for the correct behavior.
The cubical cover computation is implemented incorrectly. It starts with the tests having the wrong expected behavior here:
kepler-mapper/test/test_coverer.py
Lines 61 to 66 in 05adb8f
If you cover the interval
[0,1]
with 2 cubes with 50% overlape the cover you should get is[0,2/3]
and[1/3, 1]
. The cover[0, .75]
[0.25, 1]
has a percent overlap of(0.75 - 0.25)/0.75 = 0.5/0.75 = 1/3
. It is worth noting that this test actually generates the cover[-0.25, 0.75]
[0.25, 1.25]
which does have the correct overlap but has increased the range of the cover.The source of the error is this line:
kepler-mapper/kmapper/cover.py
Lines 190 to 192 in 05adb8f
There is an off-by-one error causing kmapper to count an intersection which doesn't exist. The computation should be
ranges / ( 2 * (n _cubes - (n_cubes - 1) * perc_overlap)
.There are other tests which enforce this incorrect behavior as well.
kepler-mapper/test/test_coverer.py
Lines 124 to 133 in 05adb8f
This test in particular asserts that the cover should behave strangely. The first two test cases imply that a single element cover should be impacted by the overlap percentage chosen. While the last test is basically catching the fact that the current implementation has a division by 0.
This pull request updates the tests to check for the correct behavior and updates the implementation to pass those tests. It also skips this test
kepler-mapper/test/test_coverer.py
Lines 143 to 171 in 05adb8f
as what it is expecting doesn't always happen with a correct implementation. I skip it instead of deleting the test outright as the core idea behind the test may be salvageable.