Usefulness of cur_covariance #611

Open
EricPgh opened this issue Aug 25, 2023 · 1 comment

Comments

@EricPgh

EricPgh commented Aug 25, 2023

I'm looking at the RMSE of a fit on the training dataset, and a few points stand out as outliers. Without sparsification, I'd expect the solution to have low error on the training data and higher error on the validation data. I assume the outliers come mostly from the sparsification, which builds a low-error simplification from a subset of the data rather than a representation of all of it. I was wondering whether cur_covariance offers an advantage here over cur_points, but it seems very slow, first to form the covariance matrix and then to decompose it. Is this sparsification method worth the effort? I see many posts using the uniform method, which I presume doesn't attempt to minimize the reconstruction error.
Thanks

Sorry, I'm not able to upload a picture, but this webpage depicts what I'm trying to describe.
https://www.researchgate.net/figure/Examples-of-various-outliers-found-in-regression-analysis-Case-1-is-an-outlier-with_fig2_50946372
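A minimal sketch of the trade-off in the question, in Python with NumPy. It assumes (from the method names and the slowness described, not from the GAP source) that cur_points applies a CUR-style row selection to the n × d descriptor matrix, while cur_covariance applies it to the n × n covariance (kernel) matrix. The functions `leverage_scores`, `cur_select`, `rbf_kernel` and `nystrom_error` are hypothetical; picking the top-leverage rows is a deterministic simplification of CUR, and the Nyström kernel-reconstruction error is only a proxy for how well a sparse set represents the data.

```python
import numpy as np


def leverage_scores(mat: np.ndarray, k: int) -> np.ndarray:
    """Rank-k leverage of each row: squared row norms of the top-k left singular vectors."""
    u, _, _ = np.linalg.svd(mat, full_matrices=False)
    return np.sum(u[:, :k] ** 2, axis=1)


def cur_select(mat: np.ndarray, n_sparse: int, k: int) -> np.ndarray:
    """Deterministic CUR-style row selection: keep the n_sparse highest-leverage rows."""
    scores = leverage_scores(mat, k)
    return np.argsort(scores)[::-1][:n_sparse]


def rbf_kernel(a: np.ndarray, b: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Squared-exponential kernel between rows of a and b (hypothetical stand-in for a GAP kernel)."""
    d2 = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))


def nystrom_error(x: np.ndarray, idx: np.ndarray, sigma: float = 1.0, tol: float = 1e-10) -> float:
    """Relative Frobenius error of K ~ K_nm K_mm^+ K_mn, using the rows in idx as the sparse set."""
    k_full = rbf_kernel(x, x, sigma)  # n x n: only affordable for small n
    k_nm = k_full[:, idx]
    w, v = np.linalg.eigh(k_full[np.ix_(idx, idx)])
    keep = w > tol * w.max()  # truncated pseudo-inverse of K_mm for numerical stability
    b = k_nm @ v[:, keep] / np.sqrt(w[keep])  # b @ b.T == K_nm K_mm^+ K_mn
    return float(np.linalg.norm(k_full - b @ b.T) / np.linalg.norm(k_full))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: a large cluster plus a small, distinct group of 10 points.
    bulk = rng.normal(scale=0.5, size=(400, 5))
    rare = rng.normal(loc=3.0, scale=0.2, size=(10, 5))
    x = np.vstack([bulk, rare])
    n_sparse = 30

    choices = {
        # SVD of the n x d descriptor matrix: roughly O(n d^2) work.
        "descriptor CUR (cur_points-like)": cur_select(x, n_sparse, k=x.shape[1]),
        # Must first build the n x n kernel, then SVD it: O(n^2) memory, O(n^3) work.
        "kernel CUR (cur_covariance-like)": cur_select(rbf_kernel(x, x), n_sparse, k=n_sparse),
        "random": rng.choice(len(x), size=n_sparse, replace=False),
    }
    for name, idx in choices.items():
        n_rare = int(np.sum(idx >= len(bulk)))
        print(f"{name}: Nystrom error {nystrom_error(x, idx):.3f}, rare points picked {n_rare}")
```

The comments mark where the cost difference comes from: the kernel-based variant has to materialize and decompose an n × n matrix, which matches the slowness reported above. On real data, the more meaningful check is still the fitted model's error on held-out configurations rather than this proxy.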

@gabor1
Contributor

gabor1 commented Aug 30, 2023

I'm not aware of anyone using it, mostly because, as you say, it is very slow. The "uniform" method would be very inefficient in high dimensions; we use it for low-dimensional descriptors (2-body and 3-body descriptors). I'm not sure what your data looks like, but people often split their data into different configuration types (e.g. solid, liquid, dimer, etc.), and you can separately control how many sparse points are selected within each config type. This is a way to ensure that an important config type with few configurations is not entirely missed because the sparse points are drawn from a config type with much more, and more diverse, data (e.g. liquid).
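A rough sketch of both ideas in the reply, in Python with NumPy: a grid-based "uniform" selection over a one-dimensional (2-body, e.g. interatomic distance) descriptor, applied separately inside each config type with its own budget. `uniform_grid_select`, `select_per_config_type`, the distance data and the budgets are all hypothetical; this is not gap_fit's implementation or input syntax. It also hints at why a grid-based uniform scheme scales badly: n grid points per axis become n**d grid points in d dimensions.

```python
import numpy as np


def uniform_grid_select(values: np.ndarray, n: int) -> np.ndarray:
    """Pick the samples nearest to n evenly spaced grid points spanning a 1-D descriptor.

    Grid points that share a nearest sample are merged, so fewer than n indices may come back.
    A d-dimensional version would need n**d grid points, so this only suits low-dimensional descriptors.
    """
    if n >= len(values):
        return np.arange(len(values))
    grid = np.linspace(values.min(), values.max(), n)
    nearest = np.abs(values[None, :] - grid[:, None]).argmin(axis=1)
    return np.unique(nearest)


def select_per_config_type(values, config_types, budgets):
    """Run the selection separately inside each config type; types absent from budgets get no points."""
    selected = {}
    for config_type, n in budgets.items():
        members = np.flatnonzero(config_types == config_type)  # global indices of this type
        if members.size == 0:
            continue
        selected[config_type] = members[uniform_grid_select(values[members], n)]
    return selected


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical 2-body data: many liquid-like distances, only a few dimer distances.
    distances = np.concatenate([rng.uniform(2.0, 6.0, 1000), rng.uniform(1.8, 2.6, 10)])
    types = np.array(["liquid"] * 1000 + ["dimer"] * 10)

    per_type = select_per_config_type(distances, types, {"liquid": 20, "dimer": 5})
    for config_type, idx in per_type.items():
        print(config_type, len(idx), "sparse points")

    pooled = uniform_grid_select(distances, 25)  # same total budget, no per-type control
    print("dimer points in pooled selection:", int(np.sum(types[pooled] == "dimer")))
```

With a per-type budget the dimer type is guaranteed its own sparse points, whereas the pooled selection may end up with few or none of them, which is the failure mode the reply describes.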
