Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Kullback-Leibler divergence on multiple vectors produces a symmetric matrix #37

Open
kkondratieff opened this issue Dec 7, 2024 · 4 comments

Comments

@kkondratieff
Copy link

First, thank you for this fantastic package!

Second, when calculating KL divergence on multiple vectors using KL(), the resulting matrix is symmetric -- while the documentation correctly notes that KL divergences are non-symmetric. This appears to be an issue with the divergence calculation not being done in proper order for one of the triangulars, e.g. for a 3x3 calculation, [v1,v3] and [v3,v1] are both KL(v1||v3). Reversing the order of rows in the input matrix will result in the other set of KL divergences being appropriately calculated (but still in a symmetric matrix), so there's nothing wrong with the underlying algorithm, just how it handles matrix input.

I do not know if this is intended behavior -- I assume it might be, since KL is sensitive to order when you put in 2 input vectors and does not output KL divergences for both comparisons. It feels like if it is intended, it should be mentioned in the documentation; otherwise, the symmetric output is confusing in light of the nature of KL divergence since it feels like cell [v3,v1] should be KL(v3||v1), not KL(v1||v3).

@kkondratieff kkondratieff changed the title Kullback-Leiber divergence on multiple vectors produces a symmetric matrix Kullback-Leibler divergence on multiple vectors produces a symmetric matrix Dec 7, 2024
@jonnyschaefer
Copy link

It happens with other asymmetric distances aswell.
I was also confused I did not get an asymmetric matrix.

https://github.com/drostlab/philentropy/blob/master/R/DistMatrix_epsilon.R#L13
Always generates symmetric matrices.

@HajkD
Copy link
Member

HajkD commented Dec 29, 2024

Hi @kkondratieff and @jonnyschaefer

Thank you very much for making me aware of this confusing documentation.

I think you are absolutely right that this should be noted more clearly in the documentation.

Do you think dedicated functions or parameter settings would be useful to also allow users such as yourselves to run the asymmetric metrics or would this confuse even more? Any suggestions and feedback on your usability would be greatly appreciated.

With many thanks,
Hajk

@jonnyschaefer
Copy link

jonnyschaefer commented Dec 29, 2024

Hi,

what I expected when using assymetric distances was something like

df <- # my data frame with cols as variables
distMat <- philentropy::distance(t(df), method="some_asymmetric_distance")

that

distMat[i, j] == some_asymmetric_distance(df[,i], df[,j])

always holds.

In my view this the only way that would make sense, as this would be the same result as if I would do two (slow) for loops myself on some_asymmetric_distance.

With the current behavior I would not even know how to generate the other triangular matrix with the indices swapped.

@kkondratieff
Copy link
Author

My preference would be a parameter setting, and that its default is to calculate the asymmetric distance matrix when the chosen metric is asymmetric. Something like symmetric = FALSE.

Another option might be to have the existing functions default to returning only the upper triangular of the matrix for e.g. KL(x||y), so you would get the other triangular by calculating KL(y||x). (Which is technically the behavior now, I have noticed, but it's confusing because the whole matrix is filled out and implies symmetry for asymmetric metrics.)

In that case, perhaps you have a parameter called upper.tri and setting it FALSE has the function return KL(x||y) in the upper triangular and KL(y||x) in the lower.

In either case, I want a way to get the output that @jonnyschaefer is describing from the current functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants