Usefulness of cur_covariance #611

EricPgh · 2023-08-25T12:21:10Z

I'm looking at the RMSE of a fit evaluated on its own training dataset, and I see a few points that become outliers. Without sparsification, I would expect low error on the training data and higher error on the validation data. I assume the outliers come mostly from the sparsification step, which builds a low-error approximation from a subset of the data rather than representing all of it. I was wondering whether cur_covariance has any advantage over cur_points here, but it seems really slow: first forming the covariance matrix, then decomposing it. Is this sparsification method worth the effort? I see many posts using the uniform method, which I presume doesn't attempt to minimize the reconstruction error.
Thanks

Sorry, I'm not able to upload a picture, but this webpage depicts what I'm trying to describe.
https://www.researchgate.net/figure/Examples-of-various-outliers-found-in-regression-analysis-Case-1-is-an-outlier-with_fig2_50946372

gabor1 · 2023-08-30T09:30:56Z

I'm not aware of anyone using it, mostly because, as you say, it is very slow. The "uniform" method would be very inefficient in high dimensions; we use it for low-dimensional descriptors (2-body and 3-body). I'm not sure what your data looks like, but people often split their data into different configuration types (e.g. solid, liquid, dimer) and separately control how many sparse points are selected within each config type. This ensures that an important config type with few configurations is not missed entirely because the sparse points were all drawn from a config type with much more, and more diverse, data (e.g. liquid).
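To make the per-config-type idea concrete, here is a minimal NumPy sketch. It is not the GAP implementation: the function names `cur_select` and `select_per_config_type`, the `n_per_type` dictionary, and the synthetic descriptor matrix are all hypothetical, and leverage-score row selection is only one common variant of CUR, used here as a stand-in for selecting points from the descriptor matrix.

```python
import numpy as np


def cur_select(X, n_select, rank=None):
    """Pick rows of X with the largest CUR leverage scores (hypothetical helper)."""
    X = np.asarray(X, dtype=float)
    n_select = min(n_select, X.shape[0])
    # Thin SVD of the descriptor matrix; rows of U index the data points.
    U, s, _ = np.linalg.svd(X, full_matrices=False)
    # Truncation rank: default to n_select, capped by the number of singular values.
    k = min(rank if rank is not None else n_select, len(s))
    # Leverage score of a row = squared norm of that row in the top-k left singular vectors.
    leverage = np.sum(U[:, :k] ** 2, axis=1)
    # Deterministic choice: keep the n_select highest-leverage rows.
    return np.argsort(leverage)[::-1][:n_select]


def select_per_config_type(X, config_types, n_per_type, rank=None):
    """Run the selection separately inside each config type (hypothetical helper)."""
    config_types = np.asarray(config_types)
    chosen = []
    for ctype, n in n_per_type.items():
        idx = np.flatnonzero(config_types == ctype)
        if idx.size == 0:
            continue  # no configurations of this type in the data
        local = cur_select(X[idx], n, rank)
        chosen.extend(idx[local].tolist())
    return np.array(sorted(chosen))


# Synthetic stand-in data: many "liquid" descriptors, very few "dimer" ones.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(500, 8)), rng.normal(loc=5.0, size=(5, 8))])
types = np.array(["liquid"] * 500 + ["dimer"] * 5)

sparse_idx = select_per_config_type(X, types, {"liquid": 20, "dimer": 3})
print(len(sparse_idx), int(np.sum(types[sparse_idx] == "dimer")))
```

Because each config type gets its own budget, the small "dimer" group is guaranteed its three sparse points no matter how large or diverse the "liquid" group is. A single selection over the pooled data gives no such guarantee.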
