Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Simpler CCC #41

Closed
dkounadis opened this issue Apr 18, 2023 · 7 comments · Fixed by #43
Closed

Simpler CCC #41

dkounadis opened this issue Apr 18, 2023 · 7 comments · Fixed by #43

Comments

@dkounadis
Copy link
Member

dkounadis commented Apr 18, 2023

I have been interested in a fast implementation of CCC where one can ignore values from x, y without deleting elements or modifying the dimension of x, y .

import audmetric
import numpy as np

def simple_cc(x, y, ignore=-1):
    '''concordance correlation coefficient'''
    mask = (y != ignore)
    N = mask.sum()
    mean_y = np.dot(mask, y) / N
    mean_x = np.dot(mask, x) / N
    a = mask * (x - mean_x)
    b = mask * (y - mean_y)
    numerator = 2 * np.dot(a, b)
    denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y)**2 * N
    return numerator / denominator
     
x = np.random.randn(456)  # predictions
y = np.random.randn(456)  # ground_truth
print(audmetric.concordance_cc(x, y) - simple_cc(x, y))  # 1e-17

ignore = -1  # equivalent to deleting elems from x and y
y[10:400] = ignore
print(audmetric.concordance_cc(x[y != ignore], y[y != ignore]) - simple_cc(x, y))  # 1e-17
@hagenw
Copy link
Member

hagenw commented Apr 18, 2023

You are right the current implementation is slower:

import audmetric
import numpy as np
import time

np.random.seed(1)

samples = 10000000
repetitions = 100

def simple_cc(x, y, ignore=-100):
    '''concordance correlation coefficient'''
    mask = (y != ignore)
    N = mask.sum()
    mean_y = np.dot(mask, y) / N
    mean_x = np.dot(mask, x) / N
    a = mask * (x - mean_x)
    b = mask * (y - mean_y)
    numerator = 2 * np.dot(a, b)
    denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y)**2 * N
    return numerator / denominator

x = np.random.randn(samples)
y = np.random.randn(samples)

start = time.time()
for n in range(repetitions):
    simple_cc(x, y)
end = time.time()
print(f'simple_cc: {(end - start) / repetitions:.2f} s')

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y)
end = time.time()
print(f'audmetric: {(end - start) / repetitions:.2f} s')

returns

simple_cc: 0.28 s
audmetric: 3.88 s

The problematic part are the lines in which we do

prediction = np.array(list(prediction))
truth = np.array(list(truth))

If we replace them by

if not isinstance(prediction, np.ndarray):
    prediction = np.array(list(prediction))
if not isinstance(truth, np.ndarray):
    truth = np.array(list(truth))

the execution time reduces to

audmetric: 0.32 s

When then considering a mask

x[200:20000] = -100
mask = (x != -100)

start = time.time()
for n in range(repetitions):
    simple_cc(x, y)
end = time.time()
print(f'simple_cc masked: {(end - start) / repetitions:.2f} s')

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x[mask], y[mask])
end = time.time()
print(f'audmetric masked: {(end - start) / repetitions:.2f} s')

we are getting

simple_cc masked: 0.26 s
audmetric masked: 0.54 s

where audmetric needs twice as long.

What you can do with audmetric to get the same speed is to use np.NaN instead of a mask:

x[200:20000] = np.NaN
y[200:20000] = np.NaN

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y)
end = time.time()
print(f'audmetric masked: {(end - start) / repetitions:.2f} s')

which runs in

audmetric masked: 0.22 s

Summary

  • We should definitely update the code to do the conversion to numpy array only if really needed
  • The ignore argument feels a little bit risky to me as in your example the original data could contain already values at -1, so maybe setting it to NaN might be the better choice instead of applying a mask

@frankenjoe
Copy link
Collaborator

frankenjoe commented Apr 18, 2023

  • The ignore argument feels a little bit risky to me as in your example the original data could contain already values at -1, so maybe setting it to NaN might be the better choice instead of applying a mask

Yes, that's what also bugs me. Since -1 might be a valid value, we should use NaN to mask. Then we can also remove the ignore argument.

@hagenw
Copy link
Member

hagenw commented May 2, 2023

I was wrong in stating above that you can use NaN to mask values as the returned CCC is simply NaN in that case.

Which means audmetric.concordance_cc() can indeed be twice as fast when the masking is applied inside the function instead of outside. The question is if this feature is needed that often that we should consider adding a mask argument so you can pass on a mask or if we simply stay with the current approach. When adding a mask, we might also have to consider adding it to other functions of audmetric as a user expects that they behavior similar.

@dkounadis what is the use case for having a mask applied before calculating the CCC?

@frankenjoe
Copy link
Collaborator

The question is if this feature is needed that often that we should consider adding a mask argument so you can pass on a mask or if we simply stay with the current approach

I don't see why we need a mask argument. Can't we simply ignore entries where either truth or prediction is NaN when calculating the CCC? I also don't think we should introduce some complicated solution just because of speed - calculating CCC should not be a bottleneck.

@hagenw
Copy link
Member

hagenw commented May 2, 2023

OK, it also seems a little bit risky by just ignoring NaN values instead of returning NaN as result in this case, but we could add an ignore_nan argument to audmetric.concordance_cc() and expand it also to other functions in order to tackle #14.

@frankenjoe
Copy link
Collaborator

Yes, that sounds like a good solution to me

@dkounadis
Copy link
Member Author

True, a mask argument seems confusing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants