Simpler CCC #41
You are right, the current implementation is slower:

    import audmetric
    import numpy as np
    import time

    np.random.seed(1)
    samples = 10000000
    repetitions = 100


    def simple_cc(x, y, ignore=-100):
        '''concordance correlation coefficient'''
        # Entries where y equals the ignore value get weight zero
        mask = (y != ignore)
        N = mask.sum()
        mean_y = np.dot(mask, y) / N
        mean_x = np.dot(mask, x) / N
        a = mask * (x - mean_x)
        b = mask * (y - mean_y)
        numerator = 2 * np.dot(a, b)
        denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y)**2 * N
        return numerator / denominator


    x = np.random.randn(samples)
    y = np.random.randn(samples)

    # Average run time of simple_cc
    start = time.time()
    for n in range(repetitions):
        simple_cc(x, y)
    end = time.time()
    print(f'simple_cc: {(end - start) / repetitions:.2f} s')

    # Average run time of audmetric.concordance_cc
    start = time.time()
    for n in range(repetitions):
        audmetric.concordance_cc(x, y)
    end = time.time()
    print(f'audmetric: {(end - start) / repetitions:.2f} s')

returns
The problem lies in the lines in which we do

    prediction = np.array(list(prediction))
    truth = np.array(list(truth))

If we replace them by

    if not isinstance(prediction, np.ndarray):
        prediction = np.array(list(prediction))
    if not isinstance(truth, np.ndarray):
        truth = np.array(list(truth))

the execution time reduces to
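
To illustrate why this guard helps, here is a minimal sketch of the conversion step as a standalone helper. The name `to_array` is hypothetical and not part of audmetric; it only mirrors the `isinstance` check quoted above. An input that is already an `np.ndarray` is returned as is, whereas `np.array(list(...))` first builds a Python list holding one object per element, which is costly for large arrays.

```python
import numpy as np


def to_array(values):
    """Return values as np.ndarray, converting only when needed.

    Hypothetical helper mirroring the isinstance guard above,
    not a function provided by audmetric.
    """
    if not isinstance(values, np.ndarray):
        # Lists, tuples, and iterators go through a list first
        values = np.array(list(values))
    return values


x = np.random.randn(1000)
# An existing array passes through without a copy
print(to_array(x) is x)
# Other sequences are still converted
print(type(to_array([1.0, 2.0, 3.0])))
```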
When we then consider a mask

    x[200:20000] = -100
    mask = (x != -100)

    # simple_cc works on the full-length arrays
    start = time.time()
    for n in range(repetitions):
        simple_cc(x, y)
    end = time.time()
    print(f'simple_cc masked: {(end - start) / repetitions:.2f} s')

    # audmetric gets the arrays with the masked entries removed
    start = time.time()
    for n in range(repetitions):
        audmetric.concordance_cc(x[mask], y[mask])
    end = time.time()
    print(f'audmetric masked: {(end - start) / repetitions:.2f} s')

we are getting
where

What you can do with

    x[200:20000] = np.nan
    y[200:20000] = np.nan

    start = time.time()
    for n in range(repetitions):
        audmetric.concordance_cc(x, y)
    end = time.time()
    print(f'audmetric masked: {(end - start) / repetitions:.2f} s')

which runs in
Summary
Yes, that's what also bugs me. Since -1 might be a valid value, we should use
I was wrong in stating above that you can use Which means @dkounadis what is the use case for having a mask applied before calculating the CCC?
I don't see why we need a mask argument. Can't we simply ignore entries where either
OK, it also seems a little bit risky by just ignoring
Yes, that sounds like a good solution to me
True, a mask argument seems confusing.
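
One way to read the suggestion of ignoring entries instead of passing a mask is sketched below. It assumes NaN is the marker and that an entry is dropped when either input is NaN at that position; both are assumptions for illustration, since the thread does not spell out the final rule. The function name `ccc_ignore_nan` is hypothetical and this is not the audmetric implementation.

```python
import numpy as np


def ccc_ignore_nan(x, y):
    """CCC over the entries where neither x nor y is NaN.

    Hypothetical sketch, not the audmetric implementation.
    Assumes at least two valid, non-constant entries remain.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Keep a position only if both inputs are valid there
    keep = ~(np.isnan(x) | np.isnan(y))
    x, y = x[keep], y[keep]
    mean_x, mean_y = x.mean(), y.mean()
    a = x - mean_x
    b = y - mean_y
    numerator = 2 * np.dot(a, b)
    denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y) ** 2 * len(x)
    return numerator / denominator


x = np.array([1.0, 2.0, np.nan, 4.0])
y = np.array([1.0, 2.0, 3.0, np.nan])
# Only the first two positions are valid in both arrays
print(ccc_ignore_nan(x, y))
```

Because NaN cannot be a valid value, this avoids the ambiguity of a numeric marker such as -100. It does create shortened copies of the inputs through boolean indexing.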
I have been interested in a fast implementation of CCC where one can ignore values from x, y without deleting elements or modifying the dimension of x, y.
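
As a rough sketch of that request, the variant below takes an explicit boolean mask and gives masked-out entries zero weight, so `x` and `y` keep their shape. The name `masked_ccc` and its `mask` parameter are hypothetical. Unlike `simple_cc` from the comments, which derives the mask from `y` alone, the caller decides here which positions count. Ignored entries must hold finite values, because `0 * nan` is still `nan`.

```python
import numpy as np


def masked_ccc(x, y, mask):
    """CCC over entries where mask is True, keeping array shapes fixed.

    Hypothetical sketch: masked-out entries are weighted with zero
    instead of being removed from x and y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.asarray(mask, dtype=float)
    n = w.sum()
    mean_x = np.dot(w, x) / n
    mean_y = np.dot(w, y) / n
    # Zero weights remove ignored entries from every sum below
    a = w * (x - mean_x)
    b = w * (y - mean_y)
    numerator = 2 * np.dot(a, b)
    denominator = np.dot(a, a) + np.dot(b, b) + (mean_x - mean_y) ** 2 * n
    return numerator / denominator


x = np.array([1.0, 2.0, -100.0, 4.0])
y = np.array([1.0, 3.0, 5.0, -100.0])
# Ignore a position if either array holds the marker value there
mask = (x != -100) & (y != -100)
print(masked_ccc(x, y, mask))
```

The result should agree with computing the CCC on `x[mask]` and `y[mask]`, at the cost of a few extra multiplications over the full-length arrays.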