-
Notifications
You must be signed in to change notification settings - Fork 19
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Kullback-Leibler divergence on multiple vectors produces a symmetric matrix #37
Comments
It happens with other asymmetric distances aswell. https://github.com/drostlab/philentropy/blob/master/R/DistMatrix_epsilon.R#L13 |
Hi @kkondratieff and @jonnyschaefer Thank you very much for making me aware of this confusing documentation. I think you are absolutely right that this should be noted more clearly in the documentation. Do you think dedicated functions or parameter settings would be useful to also allow users such as yourselves to run the asymmetric metrics or would this confuse even more? Any suggestions and feedback on your usability would be greatly appreciated. With many thanks, |
Hi, what I expected when using assymetric distances was something like df <- # my data frame with cols as variables
distMat <- philentropy::distance(t(df), method="some_asymmetric_distance") that distMat[i, j] == some_asymmetric_distance(df[,i], df[,j]) always holds. In my view this the only way that would make sense, as this would be the same result as if I would do two (slow) for loops myself on With the current behavior I would not even know how to generate the other triangular matrix with the indices swapped. |
My preference would be a parameter setting, and that its default is to calculate the asymmetric distance matrix when the chosen metric is asymmetric. Something like Another option might be to have the existing functions default to returning only the upper triangular of the matrix for e.g. KL(x||y), so you would get the other triangular by calculating KL(y||x). (Which is technically the behavior now, I have noticed, but it's confusing because the whole matrix is filled out and implies symmetry for asymmetric metrics.) In that case, perhaps you have a parameter called In either case, I want a way to get the output that @jonnyschaefer is describing from the current functions. |
First, thank you for this fantastic package!
Second, when calculating KL divergence on multiple vectors using
KL()
, the resulting matrix is symmetric -- while the documentation correctly notes that KL divergences are non-symmetric. This appears to be an issue with the divergence calculation not being done in proper order for one of the triangulars, e.g. for a 3x3 calculation, [v1,v3] and [v3,v1] are both KL(v1||v3). Reversing the order of rows in the input matrix will result in the other set of KL divergences being appropriately calculated (but still in a symmetric matrix), so there's nothing wrong with the underlying algorithm, just how it handles matrix input.I do not know if this is intended behavior -- I assume it might be, since KL is sensitive to order when you put in 2 input vectors and does not output KL divergences for both comparisons. It feels like if it is intended, it should be mentioned in the documentation; otherwise, the symmetric output is confusing in light of the nature of KL divergence since it feels like cell [v3,v1] should be KL(v3||v1), not KL(v1||v3).
The text was updated successfully, but these errors were encountered: