Simpler CCC #41

dkounadis · 2023-04-18T11:20:36Z

I have been interested in a fast implementation of CCC where one can ignore values from x, y without deleting elements or modifying the dimension of x, y .

import audmetric
import numpy as np

def simple_cc(x, y, ignore=-1):
    '''concordance correlation coefficient'''
    mask = (y != ignore)
    N = mask.sum()
    mean_y = np.dot(mask, y) / N
    mean_x = np.dot(mask, x) / N
    a = mask * (x - mean_x)
    b = mask * (y - mean_y)
    numerator = 2 * np.dot(a, b)
    denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y)**2 * N
    return numerator / denominator
     
x = np.random.randn(456)  # predictions
y = np.random.randn(456)  # ground_truth
print(audmetric.concordance_cc(x, y) - simple_cc(x, y))  # 1e-17

ignore = -1  # equivalent to deleting elems from x and y
y[10:400] = ignore
print(audmetric.concordance_cc(x[y != ignore], y[y != ignore]) - simple_cc(x, y))  # 1e-17

The text was updated successfully, but these errors were encountered:

hagenw · 2023-04-18T12:42:11Z

You are right the current implementation is slower:

import audmetric
import numpy as np
import time

np.random.seed(1)

samples = 10000000
repetitions = 100

def simple_cc(x, y, ignore=-100):
    '''concordance correlation coefficient'''
    mask = (y != ignore)
    N = mask.sum()
    mean_y = np.dot(mask, y) / N
    mean_x = np.dot(mask, x) / N
    a = mask * (x - mean_x)
    b = mask * (y - mean_y)
    numerator = 2 * np.dot(a, b)
    denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y)**2 * N
    return numerator / denominator

x = np.random.randn(samples)
y = np.random.randn(samples)

start = time.time()
for n in range(repetitions):
    simple_cc(x, y)
end = time.time()
print(f'simple_cc: {(end - start) / repetitions:.2f} s')

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y)
end = time.time()
print(f'audmetric: {(end - start) / repetitions:.2f} s')

returns

simple_cc: 0.28 s
audmetric: 3.88 s

The problematic part are the lines in which we do

prediction = np.array(list(prediction))
truth = np.array(list(truth))

If we replace them by

if not isinstance(prediction, np.ndarray):
    prediction = np.array(list(prediction))
if not isinstance(truth, np.ndarray):
    truth = np.array(list(truth))

the execution time reduces to

audmetric: 0.32 s

When then considering a mask

x[200:20000] = -100
mask = (x != -100)

start = time.time()
for n in range(repetitions):
    simple_cc(x, y)
end = time.time()
print(f'simple_cc masked: {(end - start) / repetitions:.2f} s')

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x[mask], y[mask])
end = time.time()
print(f'audmetric masked: {(end - start) / repetitions:.2f} s')

we are getting

simple_cc masked: 0.26 s
audmetric masked: 0.54 s

where audmetric needs twice as long.

What you can do with audmetric to get the same speed is to use np.NaN instead of a mask:

x[200:20000] = np.NaN
y[200:20000] = np.NaN

start = time.time()
for n in range(repetitions):
    audmetric.concordance_cc(x, y)
end = time.time()
print(f'audmetric masked: {(end - start) / repetitions:.2f} s')

which runs in

audmetric masked: 0.22 s

Summary

We should definitely update the code to do the conversion to numpy array only if really needed
The ignore argument feels a little bit risky to me as in your example the original data could contain already values at -1, so maybe setting it to NaN might be the better choice instead of applying a mask

frankenjoe · 2023-04-18T12:44:36Z

The ignore argument feels a little bit risky to me as in your example the original data could contain already values at -1, so maybe setting it to NaN might be the better choice instead of applying a mask

Yes, that's what also bugs me. Since -1 might be a valid value, we should use NaN to mask. Then we can also remove the ignore argument.

hagenw · 2023-05-02T07:08:26Z

I was wrong in stating above that you can use NaN to mask values as the returned CCC is simply NaN in that case.

Which means audmetric.concordance_cc() can indeed be twice as fast when the masking is applied inside the function instead of outside. The question is if this feature is needed that often that we should consider adding a mask argument so you can pass on a mask or if we simply stay with the current approach. When adding a mask, we might also have to consider adding it to other functions of audmetric as a user expects that they behavior similar.

@dkounadis what is the use case for having a mask applied before calculating the CCC?

frankenjoe · 2023-05-02T07:18:05Z

The question is if this feature is needed that often that we should consider adding a mask argument so you can pass on a mask or if we simply stay with the current approach

I don't see why we need a mask argument. Can't we simply ignore entries where either truth or prediction is NaN when calculating the CCC? I also don't think we should introduce some complicated solution just because of speed - calculating CCC should not be a bottleneck.

hagenw · 2023-05-02T07:49:32Z

OK, it also seems a little bit risky by just ignoring NaN values instead of returning NaN as result in this case, but we could add an ignore_nan argument to audmetric.concordance_cc() and expand it also to other functions in order to tackle #14.

frankenjoe · 2023-05-02T07:58:42Z

Yes, that sounds like a good solution to me

dkounadis · 2023-05-02T09:45:54Z

True, a mask argument seems confusing.

hagenw mentioned this issue Apr 25, 2023

Speedup concordance_cc() and pearson_cc() #42

Merged

hagenw mentioned this issue May 3, 2023

Add ignore_nan argument to concordance_cc() #43

Merged

frankenjoe closed this as completed in #43 May 22, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Simpler CCC #41

Simpler CCC #41

dkounadis commented Apr 18, 2023 •

edited

Loading

hagenw commented Apr 18, 2023

frankenjoe commented Apr 18, 2023 •

edited

Loading

hagenw commented May 2, 2023

frankenjoe commented May 2, 2023

hagenw commented May 2, 2023

frankenjoe commented May 2, 2023

dkounadis commented May 2, 2023

Simpler CCC #41

Simpler CCC #41

Comments

dkounadis commented Apr 18, 2023 • edited Loading

hagenw commented Apr 18, 2023

frankenjoe commented Apr 18, 2023 • edited Loading

hagenw commented May 2, 2023

frankenjoe commented May 2, 2023

hagenw commented May 2, 2023

frankenjoe commented May 2, 2023

dkounadis commented May 2, 2023

dkounadis commented Apr 18, 2023 •

edited

Loading

frankenjoe commented Apr 18, 2023 •

edited

Loading