Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a function to fill NaNs in grids by interpolation #440

Open
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

Phssilva
Copy link

It has been added the function that fills NaN data in a grid and a test has been performed for this function.

Please review, and I'm available for further revisions.

Relevant issues/PRs:

Copy link

welcome bot commented Mar 19, 2024

💖 Thank you for opening your first pull request in this repository! 💖

A few things to keep in mind:

No matter what, we are really grateful that you put in the effort to do this!

verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
@leouieda leouieda changed the title issue #439 Add a function to fill NaNs in grids by interpolation Mar 21, 2024
Copy link
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @Phssilva thank you for the implementation! I left some comments in the code with changes that are needed.

I really appreciate you taking the time to do this!

Comment on lines 34 to 43
def test_fill_nans():
"""
This function tests the fill_nans function.
"""

grid = np.array([[1, np.nan, 3],
[4, 5, np.nan],
[np.nan, 7, 8]])
filled_grid = fill_nans(grid)
assert np.isnan(filled_grid).sum() == 0
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The function is supposed to take an xarray.DataArray but the test gives it a numpy array. The test should match the expected use of the function. Please make the grid into a DataArray.

It would also be good to check if the final grid has the correct values in the NaNs. Right now, this only checks that the NaNs aren't there but the values could be completely wrong and we'd never know.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Leo!
A DataArray was added to the test function, and a check was made on the filled values in the DataArray. Could you please verify if the changes made are correct? Thank you for your attention.

verde/utils.py Outdated
@@ -14,6 +14,7 @@
import pandas as pd
import xarray as xr
from scipy.spatial import cKDTree
from sklearn.impute import KNNImputer
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use verde.KNeighbors instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used the KNeighbors class and used the predict and fit methods to fill in the values of the data array.

verde/utils.py Outdated

Parameters
----------
grid : :class:`xarray.Dataset` or :class:`xarray.DataArray`
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be only a DataArray and not a Dataset.

verde/utils.py Outdated
for i, idx in enumerate(unknown_indices):
grid[tuple(idx)] = predicted_values[i]

return grid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output grid should be a copy of the input grid. We want to avoid changing the input values in-place. The code above will actually alter the input and could cause problems for users since their original grid with NaNs is now gone.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I used a copy of the grid in the variable filled_grid, which is returned at the end of the function.

Copy link
Member

@leouieda leouieda left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for doing the work @Phssilva! I left some more comments to get this on its way.

Comment on lines +44 to +47
expected_values = xr.DataArray([[1, 1, 3],
[4, 5, 3],
[4, 7, 8]])

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The DataArrays should contain coordinates as well. They should be as close as possible to the format of a real dataset. That's how we make the tests more robust.

verde/tests/test_utils.py Outdated Show resolved Hide resolved
verde/tests/test_utils.py Outdated Show resolved Hide resolved
verde/tests/test_utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
unknown_indices = np.argwhere(np.isnan(grid.values))

knn_imputer = vd.KNeighbors()
easting, northing = not_nan_values[:, 0], not_nan_values[:, 1]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the actual coordinates of the grid instead of generating indices. This makes the interpolation work even if the grid is not uniform.

verde/utils.py Outdated Show resolved Hide resolved
verde/utils.py Outdated Show resolved Hide resolved
Comment on lines +702 to +703
not_nan_values = np.argwhere(~np.isnan(grid.values))
unknown_indices = np.argwhere(np.isnan(grid.values))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the verde.grid_to_table function and then drop the NaNs from it. It's easier and preserves the coordinates of the grid.

Comment on lines +709 to +712
predicted_values = knn_imputer.predict((easting, northing))

for i, idx in enumerate(unknown_indices):
filled_grid[tuple(idx)] = predicted_values[i]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of this, you could use knn.grid and pass in the coordinates of the original grid.

Phssilva and others added 6 commits April 11, 2024 19:57
Co-authored-by: Leonardo Uieda <[email protected]>
Co-authored-by: Leonardo Uieda <[email protected]>
Co-authored-by: Leonardo Uieda <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants