Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ignore_nan argument to concordance_cc() #43

Merged
merged 15 commits into from
May 22, 2023
Merged

Conversation

hagenw
Copy link
Member

@hagenw hagenw commented May 3, 2023

Closes #41

Relates to #14

Adds ignore_nan=False argument to audmetric.concordance_cc(). If True all samples are ignored that contain NaN as part of truth or prediction.

It further uses the proposed implementation from #41 to speed up the calculation of CCC compared to the current main branch. Using the code mention at the end of this repo we get:

branch ignore_nan execution time
main - 0.96 s
ccc-ignore-nan False 0.32 s
ccc-ignore-nan True 0.65 s

image


import audmetric
import numpy as np
import time

np.random.seed(1)

samples = 10000000
repetitions = 100

x = np.random.randn(samples)
y = np.random.randn(samples)

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y)
end = time.time()
print(f'audmetric: {(end - start) / repetitions:.2f} s')

and for ignore_nan=True:

y[0:20000] = np.NaN

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y, ignore_nan=True)
end = time.time()
print(f'audmetric: {(end - start) / repetitions:.2f} s')

@hagenw hagenw marked this pull request as draft May 3, 2023 08:52
@codecov
Copy link

codecov bot commented May 3, 2023

Codecov Report

Merging #43 (bf55c58) into main (70e5362) will not change coverage.
The diff coverage is 100.0%.

Impacted Files Coverage Δ
audmetric/core/api.py 100.0% <100.0%> (ø)

@hagenw hagenw marked this pull request as ready for review May 3, 2023 13:45
@hagenw
Copy link
Member Author

hagenw commented May 3, 2023

@dkounadis your implementation proposed in #41 is indeed faster and returns the same results. What I did not completely understood yet is, why could you simply skip the pearson_cc(truth, prediction) step?

@dkounadis
Copy link
Member

dkounadis commented May 4, 2023

@hagenw
Copy link
Member Author

hagenw commented May 4, 2023

Cool thanks for that information. This should be documentation enough, if we need to figure this out again.

@hagenw hagenw requested a review from frankenjoe May 4, 2023 13:51
@frankenjoe
Copy link
Collaborator

So after this PR, we should probably have another one that adds ignore_nan to pearson(), right?

@hagenw
Copy link
Member Author

hagenw commented May 5, 2023

Don't know how urgent it is. I think we have to revisit the handling of NaN in all our functions besides concordance_cc().
As this might take some time, I'm also fine with just doing a new release after this merge request.

audmetric/core/api.py Outdated Show resolved Hide resolved
audmetric/core/api.py Outdated Show resolved Hide resolved
Co-authored-by: Johannes Wagner <[email protected]>
@frankenjoe
Copy link
Collaborator

As this might take some time, I'm also fine with just doing a new release after this merge request.

Seems a bit strange to me to only support it with one particular function.

@hagenw
Copy link
Member Author

hagenw commented May 5, 2023

As this might take some time, I'm also fine with just doing a new release after this merge request.

Seems a bit strange to me to only support it with one particular function.

Feel free to implement it ;)

I just had the impression that nobody was asking for it, whereas for concordance_cc() there was a request to support it and make it as fast as possible.

@hagenw
Copy link
Member Author

hagenw commented May 5, 2023

As this might take some time, I'm also fine with just doing a new release after this merge request.

Seems a bit strange to me to only support it with one particular function.

There is also still a corresponding issue with #14, so it's a known fact. If you like we can extend that issue or open another one for pearson_cc()

@frankenjoe
Copy link
Collaborator

frankenjoe commented May 5, 2023

Feel free to implement it ;)

What I would propose is to simply add the following to all our functions:

if ignore_nan:
    mask = ~(np.isnan(truth) + np.isnan(prediction))
    truth = truth[mask]
    prediction = prediction[mask]

@hagenw hagenw mentioned this pull request May 8, 2023
tests/test_api.py Outdated Show resolved Hide resolved
tests/test_api.py Outdated Show resolved Hide resolved
False,
),
(
[0, 1, 2, 3, 4, 5, 6, np.NaN],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think in addition we should also add cases where np.NaN is in either truth or prediction and in both, but different locations.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated the tests and added now an additional test for different np.NaN locations and the possibility to specify the expected truth and prediction values after the mask is applied to avoid using the same code for masking in the test and the implementation.

prediction = np.array(list(prediction))
truth = np.array(list(truth))

if len(prediction) < 2:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we actually need those special cases where we return np.NaN or can we simplify the function now?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, forgot to remove this. We don't need this and it is now removed.

@frankenjoe frankenjoe merged commit 8a016ae into main May 22, 2023
@frankenjoe frankenjoe deleted the ccc-ignore-nan branch May 22, 2023 13:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Simpler CCC
3 participants