-
Notifications
You must be signed in to change notification settings - Fork 234
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating the test data for tools/cellpose #1495
base: master
Are you sure you want to change the base?
Conversation
For the tests here, should we keep the |
Lets ask @kostrykin - my guess is we should switch to asserts. This would hopefully in less work in the next bot PR. |
Yes, using I'd suggest to use the For the other images, using assertions should be possible, but since you have the expected data generated already, I think using |
@kostrykin Thanks for the pointers. I will replace the tests with |
Using |
As I see from the test results https://github.com/bgruening/galaxytools/actions/runs/10681249964?pr=1495, the issue is that the difference between the images generated by Cellpose on the one hand, and the expected images on the other hand, is simply too large. This is not an issue of testing, but of Cellpose generating unexpected results. The question is, why is that happening? I'm trying to inspect the generated data, but it gets deleted even though I've used |
@kostrykin this behaviour is quite strange. Are you using |
It takes forever to run with Interestingly, Test 1 fails when running with
The output of Cellpose for So we have 3 different outputs, one for the expected data, one for without |
@kostrykin Ideally the tools needs to be tested with I updated the test data with the following command |
I did Test 1 fail to run with the above mentioned error. I attached the other generated results. test-data.tgz Do you obtain the same results when you re-generate them? |
The *.tif files are similar, but the generated PNGs are different from yours. |
I just tested the base branch on my computer. Using For the In consequence, since the base branch version is faster, does not exhibit failing tests, and to better understand what is going on, I suggest the following strategy:
|
@kostrykin I will first migrate the changes of |
I think we keep the |
830bcf8
to
cdd5be1
Compare
@kostrykin I was testing the base branch of cellpose and migrated the You are correct that all the tests of the tool pass when |
Nope, unfortunately, I haven't seen similar issues before. Basically there are two options, I guess: Either a dependency of the Cellpose package is resolved to different versions and this induces the diverging behavior. Or it is some internal switch of Cellpose, that causes the divergence. Are the results consistent when running locally with CPU vs. GPU utilization? Maybe those who implemented/updated the Cellpose wrapper have an idea? @sunyi000 @pavanvidem |
Here are the updated test-data with docker: #1505 |
sorry i have no idea why it's producing different results. |
As we are now using conda requirements, differences maybe due to some dependency. Let me test this. |
What should we do here? Do we assume cellpose knows what they are doing and accept that the outputs have changed so dramatically? We all use biocontainers, hopefully, we should not use conda for testing, locally or remote only biocontainers should be used. |
ping @kostrykin |
@bgruening If I understood this comment correctly, then the problem here is that the Cellpose wrapper for one and the same version of Cellpose produces very different results. This makes me concerned regarding reproducibility. If the wrapper now yields different results when being run locally vs. in CI, it might as well yield inconsistent results in the future when run in Galaxy, as long as we do not understand why the inconsistency happens. @SaimMomin12 You reported the diverging results when (i) using IMO, if and only if the answer to this last question is "yes", we can ignore the inconsistency. |
@pavanvidem Did you find out something? |
I'm not sure we should assume that. The new tests are stricter and we don't know for sure how it was with the old version + old tests, correct? To make it easier for all of us, we should always use
|
I could try rework on this tool.. I was trying to use this in one of my testing workflow, it didn't work. |
After trying out many possible combinations of dependencies, I found out that the discrepancies are most likely coming from Surprisingly, the results from |
Hi, for deploying it in Galaxy, does it have to reproducible in terms of rebuilding the container image, or would be it possible to build the image once, and then never touch it again (assuming that at least the results when running from the same image are reproducible)? |
@tischi that would be possible. But how do you decide which container is "correct"? |
Same result as the CellPose in BAND and BARD? |
As far as I remember from talking to @pavanvidem, the issue is that the results aren't even reproducible when using the same image between the runs (if opencv is installed via Conda, but it works if it is installed via pip). So the assumption is unfortunately wrong. What we need to do here is to build a container for Cellpose where opencv is installed via pip. |
... or find out which compile flag is used for the pip binaries that is not used for conda and produces randomness ... |
This pull request updates the test-data for cellpose tool for an existing PR #1494 and fixes #1494