Add ignore_nan argument to concordance_cc() #43

hagenw · 2023-05-03T08:51:51Z

Closes #41

Relates to #14

Adds an ignore_nan=False argument to audmetric.concordance_cc(). If True, all samples that contain NaN in either truth or prediction are ignored.

It further uses the implementation proposed in #41 to speed up the calculation of CCC compared to the current main branch. Using the code at the end of this comment, we get:

branch	ignore_nan	execution time
main	-	0.96 s
ccc-ignore-nan	False	0.32 s
ccc-ignore-nan	True	0.65 s

import audmetric
import numpy as np
import time

np.random.seed(1)

samples = 10000000
repetitions = 100

x = np.random.randn(samples)
y = np.random.randn(samples)

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y)
end = time.time()
print(f'audmetric: {(end - start) / repetitions:.2f} s')

and for ignore_nan=True:

y[0:20000] = np.nan

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y, ignore_nan=True)
end = time.time()
print(f'audmetric: {(end - start) / repetitions:.2f} s')

codecov · 2023-05-03T11:46:04Z

Codecov Report

Merging #43 (bf55c58) into main (70e5362) will not change coverage.
The diff coverage is 100.0%.

Impacted Files	Coverage Δ
audmetric/core/api.py	`100.0% <100.0%> (ø)`

hagenw · 2023-05-03T13:46:18Z

@dkounadis your implementation proposed in #41 is indeed faster and returns the same results. What I do not completely understand yet: why could you simply skip the pearson_cc(truth, prediction) step?

dkounadis · 2023-05-04T13:44:32Z

I used the expression from Wikipedia: When the correlation coefficient is computed on a N-length data set (i.e, ...
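For reference, the Wikipedia expression for an N-length data set computes CCC as twice the covariance divided by the sum of both variances and the squared difference of the means, with covariance and variances normalized by 1/N. Since the covariance equals the Pearson correlation times both standard deviations, the separate correlation step drops out. A minimal sketch of that equivalence follows; it is not the audmetric implementation, and the function names `ccc_direct` and `ccc_via_pearson` are made up for illustration.

```python
import numpy as np


def ccc_direct(truth, prediction):
    """CCC from means, variances, and covariance (all with 1/N normalization)."""
    truth = np.asarray(truth, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    mean_t = truth.mean()
    mean_p = prediction.mean()
    # Population covariance; no Pearson correlation is computed
    covariance = np.mean((truth - mean_t) * (prediction - mean_p))
    denominator = truth.var() + prediction.var() + (mean_t - mean_p) ** 2
    return 2 * covariance / denominator


def ccc_via_pearson(truth, prediction):
    """CCC written with an explicit Pearson correlation step."""
    truth = np.asarray(truth, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    r = np.corrcoef(truth, prediction)[0, 1]
    # r * std_t * std_p is the same population covariance as above
    numerator = 2 * r * truth.std() * prediction.std()
    denominator = (
        truth.var()
        + prediction.var()
        + (truth.mean() - prediction.mean()) ** 2
    )
    return numerator / denominator


rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
y = 0.5 * x + rng.standard_normal(1000)
print(np.isclose(ccc_direct(x, y), ccc_via_pearson(x, y)))
```

Both functions rely on NumPy's default `ddof=0` for `var()` and `std()`, which matches the 1/N normalization in the expression above.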

hagenw · 2023-05-04T13:51:25Z

Cool, thanks for that information. This should be documentation enough if we need to figure this out again.

frankenjoe · 2023-05-05T12:56:35Z

So after this PR, we should probably have another one that adds ignore_nan to pearson(), right?

hagenw · 2023-05-05T12:59:06Z

Don't know how urgent it is. I think we have to revisit the handling of NaN in all our functions besides concordance_cc().
As this might take some time, I'm also fine with just doing a new release after this merge request.

audmetric/core/api.py

Co-authored-by: Johannes Wagner <[email protected]>

frankenjoe · 2023-05-05T13:27:23Z

> As this might take some time, I'm also fine with just doing a new release after this merge request.

Seems a bit strange to me to only support it with one particular function.

hagenw · 2023-05-05T13:29:02Z

> > As this might take some time, I'm also fine with just doing a new release after this merge request.
>
> Seems a bit strange to me to only support it with one particular function.

Feel free to implement it ;)

I just had the impression that nobody was asking for it, whereas for concordance_cc() there was a request to support it and make it as fast as possible.

hagenw · 2023-05-05T13:30:23Z

> > As this might take some time, I'm also fine with just doing a new release after this merge request.
>
> Seems a bit strange to me to only support it with one particular function.

There is also still the corresponding issue #14, so it's a known fact. If you like, we can extend that issue or open another one for pearson_cc().

frankenjoe · 2023-05-05T13:31:50Z

> Feel free to implement it ;)

What I would propose is to simply add the following to all our functions:

if ignore_nan:
    mask = ~(np.isnan(truth) + np.isnan(prediction))
    truth = truth[mask]
    prediction = prediction[mask]
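A self-contained sketch of this masking step, assuming the inputs can be converted to float arrays. The helper name `drop_nan_pairs` is hypothetical and not part of audmetric; it uses `|` instead of `+`, which gives the same result on boolean arrays but states the logical OR explicitly.

```python
import numpy as np


def drop_nan_pairs(truth, prediction):
    """Drop every position where truth or prediction is NaN."""
    truth = np.asarray(truth, dtype=float)
    prediction = np.asarray(prediction, dtype=float)
    # True where both values are valid numbers
    mask = ~(np.isnan(truth) | np.isnan(prediction))
    return truth[mask], prediction[mask]


# NaN at different positions in truth and prediction
truth = [0.0, np.nan, 2.0, 3.0, 4.0]
prediction = [0.5, 1.5, np.nan, 3.5, 4.5]

kept_truth, kept_prediction = drop_nan_pairs(truth, prediction)
print(kept_truth)
print(kept_prediction)
```

Positions 1 and 2 are removed from both arrays, so the remaining pairs stay aligned.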

tests/test_api.py

frankenjoe · 2023-05-11T12:07:54Z

tests/test_concordance_cc.py

+            False,
+        ),
+        (
+            [0, 1, 2, 3, 4, 5, 6, np.NaN],


I think in addition we should also add cases where np.NaN is in either truth or prediction and in both, but different locations.

I updated the tests and added an additional test for different np.NaN locations. It is now also possible to specify the expected truth and prediction values after the mask is applied, which avoids using the same masking code in the test and the implementation.

tests/test_concordance_cc.py

frankenjoe · 2023-05-22T13:14:04Z

tests/test_concordance_cc.py

+    prediction = np.array(list(prediction))
+    truth = np.array(list(truth))
+
+    if len(prediction) < 2:


Do we actually need those special cases where we return np.NaN or can we simplify the function now?

Sorry, forgot to remove this. We don't need this and it is now removed.

hagenw added 3 commits May 3, 2023 09:57

Add ignore_nan argument to concordance_cc()

e260b6c

Update implementation

e6319d6

Add tests

294abb5

hagenw marked this pull request as draft May 3, 2023 08:52

Replace NaN values by 0

f157132

hagenw added 3 commits May 3, 2023 14:16

Make implementation faster

380377f

Fix error

6ade721

Consider also truth

bbf53f8

hagenw marked this pull request as ready for review May 3, 2023 13:45

hagenw requested a review from frankenjoe May 4, 2023 13:51

frankenjoe reviewed May 5, 2023

audmetric/core/api.py (outdated)

frankenjoe reviewed May 5, 2023

audmetric/core/api.py (outdated)

Update audmetric/core/api.py

7e07356

Co-authored-by: Johannes Wagner <[email protected]>

hagenw mentioned this pull request May 8, 2023

Release 1.2.0 #44

Merged

frankenjoe reviewed May 11, 2023

tests/test_api.py (outdated)

frankenjoe reviewed May 11, 2023

tests/test_api.py (outdated)

hagenw added 2 commits May 11, 2023 11:32

Add extra test function

86de2d4

Update handling of NaN

5e170d8

frankenjoe reviewed May 11, 2023

Add more tests

8136e95

frankenjoe reviewed May 11, 2023

tests/test_concordance_cc.py (outdated)

hagenw added 2 commits May 22, 2023 14:47

Use one expected

81b7cb5

Remove extra NaN test

f21d879

frankenjoe reviewed May 22, 2023

hagenw added 2 commits May 22, 2023 15:15

Simplify expected function

1f57ef0

Adjust comment

bf55c58

frankenjoe merged commit 8a016ae into main May 22, 2023

frankenjoe deleted the ccc-ignore-nan branch May 22, 2023 13:30
