-
I have an input xarray dataset I want to regrid:

<xarray.Dataset>
Dimensions: (longitude: 1440, latitude: 721, time: 132)
Coordinates:
* longitude (longitude) float32 0.0 0.25 0.5 0.75 ... 359.0 359.2 359.5 359.8
* latitude (latitude) float32 90.0 89.75 89.5 89.25 ... -89.5 -89.75 -90.0
* time (time) datetime64[ns] 2000-01-01 2000-02-01 ... 2010-12-01
Data variables:
d2m (time, latitude, longitude) float32 dask.array<chunksize=(12, 721, 1440), meta=np.ndarray>

First I create the grid and then the regridder:

ds_out = xe.util.grid_global(2.5, 2.5)
regridder = xe.Regridder(
ds,
ds_out,
"bilinear",
periodic=True,
parallel=True,
)

If I have parallel set to True, I get an error message:

---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
File <timed exec>:1
File ~/python_file.py:41, in regrid_to_lat_lon(ds, target_res_lat, target_res_lon)
23 """
24 Regrid any grid to a res_lat x res_lon degree lat lon grid.
25
(...)
38 Regridded dataset.
39 """
40 ds_out = xe.util.grid_global(target_res_lat, target_res_lon)
---> 41 regridder = xe.Regridder(
42 ds,
43 ds_out,
44 "bilinear",
45 periodic=True,
46 parallel=True,
47 )
48 dr_out = regridder(ds)
File /[...]/python3.9/site-packages/xesmf/frontend.py:955, in Regridder.__init__(self, ds_in, ds_out, method, locstream_in, locstream_out, periodic, parallel, **kwargs)
952 self.out_coords = {lat_out.name: lat_out, lon_out.name: lon_out}
954 if parallel:
--> 955 self._init_para_regrid(ds_in, ds_out, kwargs)
File /[...]/python3.9/site-packages/xesmf/frontend.py:973, in Regridder._init_para_regrid(self, ds_in, ds_out, kwargs)
971 ds_out['mask'] = mask
972 else:
--> 973 ds_out_chunks = tuple([ds_out.chunksizes[i] for i in self.out_horiz_dims])
974 ds_out = ds_out.coords.to_dataset()
975 mask = da.ones(self.shape_out, dtype=bool, chunks=ds_out_chunks)
File /[...]/python3.9/site-packages/xesmf/frontend.py:973, in <listcomp>(.0)
971 ds_out['mask'] = mask
972 else:
--> 973 ds_out_chunks = tuple([ds_out.chunksizes[i] for i in self.out_horiz_dims])
974 ds_out = ds_out.coords.to_dataset()
975 mask = da.ones(self.shape_out, dtype=bool, chunks=ds_out_chunks)
File /[...]/python3.9/site-packages/xarray/core/utils.py:455, in Frozen.__getitem__(self, key)
454 def __getitem__(self, key: K) -> V:
--> 455 return self.mapping[key]
KeyError: 'y'

When I set parallel to False, everything works and the regridder is created as expected.
I am using Python 3.9.18, xarray 2023.8.0, and xESMF 0.8.1.
Replies: 8 comments
-
Thanks for the detailed report. This is a brand-new feature; we'll look into this and release a fix.
-
This is a confusing caveat of parallel=True, and we should raise a better error message. As the documentation of parallel says, the parallelization of the weight generation follows the chunks of the output grid. ds_out, the output of grid_global, has no data variable and thus no chunks (the error is that the dimension name isn't a key of the chunksizes dictionary), so xESMF doesn't know how to parallelize the weight generation.

An easy way out is to assign a chunked variable to ds_out, where you choose the chunk sizes NNN and MMM according to your situation. I'll open a PR that adds a meaningful error.
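To illustrate why the KeyError happens, here is a small self-contained sketch using plain xarray and dask. The grid construction only mimics the structure of xe.util.grid_global(2.5, 2.5), and the chunk sizes 36 and 48 are placeholder values chosen for illustration, not a recommendation:

```python
import numpy as np
import xarray as xr

# Mimic the structure of xe.util.grid_global(2.5, 2.5): a Dataset with
# 2-D lat/lon *coordinates* on dims ("y", "x") and no data variables.
lon, lat = np.meshgrid(np.arange(0, 360, 2.5), np.arange(-90, 90.1, 2.5))
ds_out = xr.Dataset(coords={"lat": (("y", "x"), lat), "lon": (("y", "x"), lon)})

# With no data variables there is nothing to chunk, so chunksizes has
# no "y" or "x" entry -- the source of the KeyError in the traceback.
assert "y" not in ds_out.chunksizes

# Workaround: add a variable (a mask) and chunk the dataset; now the
# "y" and "x" dimensions appear in chunksizes and xESMF can split the
# weight generation. The chunk sizes here are illustrative placeholders.
ds_out = ds_out.assign(mask=ds_out.lat.notnull() & ds_out.lon.notnull())
ds_out = ds_out.chunk({"y": 36, "x": 48})
print(dict(ds_out.chunksizes))
```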
-
I am not sure if raising an error message is an appropriate solution for the usage of parallel=True. But I followed your approach, which works perfectly:

ds_out = xe.util.grid_global(target_res_lat, target_res_lon, cf=True)
ds_out = ds_out.assign(mask=ds_out.lat.notnull() & ds_out.lon.notnull()).chunk()

Are there recommended chunk sizes for the output grid? Three further additions:
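On the chunk-size question, there is no single right answer, but the trade-off can be shown with a dask-only sketch. The grid shape is taken from the 0.25-degree dataset in the original post; the two chunk shapes are arbitrary examples, not recommendations:

```python
import dask.array as da

shape = (721, 1440)  # the 0.25-degree grid from the original question

# Many small chunks: more parallelism, but more scheduler overhead.
small = da.ones(shape, chunks=(90, 180))

# Few large chunks: little overhead, but little parallelism.
large = da.ones(shape, chunks=(721, 720))

print(small.npartitions, large.npartitions)  # 72 2
```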
-
Well, you are right, but I don't think so.

Speed

I did:

import xarray as xr
import xesmf as xe
ds = xr.tutorial.open_dataset('air_temperature')
ds_out = xe.util.grid_global(2.5, 2.5)
ds_out = ds_out.assign(mask = ds_out.lon.notnull() & ds_out.lat.notnull())
# Chunked, parallel
reg = xe.Regridder(ds, ds_out.chunk(), 'bilinear', periodic=True, parallel=True)
# took 2.94 s
# Chunked, not parallel
reg = xe.Regridder(ds, ds_out.chunk(), 'bilinear', periodic=True, parallel=False)
# took 476 ms
# Not chunked, not parallel
reg = xe.Regridder(ds, ds_out, 'bilinear', periodic=True, parallel=False)
# took 241 ms

I don't have the same input as you. As a rule of thumb, I would say:
-
Thank you for your explanation! I have to admit, I do not know a lot about how to optimize my functions using chunks yet. Until the final loading, everything runs using dask arrays in the background and is hopefully optimized for parallel computation. (That, at least, is what I was hoping to accomplish by not loading my datasets before saving the final result to a netCDF file.) All the xarray functions just work with and without dask, and there is no need to think about RAM and hardware specs or to add custom arguments for either case. Because I do not really know what the best option is, I will do further speed testing with my input dataset.
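The lazy workflow described above can be sketched with a toy dataset (the variable name, sizes, and chunking are made up for illustration):

```python
import numpy as np
import xarray as xr

# Toy stand-in for the real input, chunked so operations stay lazy.
ds = xr.Dataset(
    {"d2m": (("time", "latitude"), np.random.rand(12, 10))}
).chunk({"time": 4})

# Lazy arithmetic: this only builds a dask task graph, reads no data.
anom = ds.d2m - ds.d2m.mean("time")
assert anom.chunks is not None  # still backed by dask, nothing computed

# Work happens only at the very end, e.g. on .compute(), .load(),
# or .to_netcdf(...).
result = anom.compute()
```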
-
Just to clear some things up, xESMF works in two steps:

reg = xe.Regridder(...) # Generating the regridding function, the regridding weights
dsout = reg(ds_in) # Applying the function

Up until xESMF 0.7.1, we had two limitations. In xESMF 0.8, both have been fixed, with the limitation on the second one that I highlighted above. Thus, unless you have a very, very large output grid (you are performing upscaling), xESMF 0.8 should feel as plug-and-play as the other xarray tools. Going from 0.25° to 5°, you should not need parallel=True.
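Under the hood, the weights from step one are essentially a sparse matrix, and step two is a cheap sparse matrix multiply, which is why the regridder can be generated once and reused. A toy illustration of that idea (not the actual xESMF internals):

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy "weights": average pairs of a 4-point source grid onto a
# 2-point target grid. Generating this matrix is the expensive step.
weights = csr_matrix(np.array([[0.5, 0.5, 0.0, 0.0],
                               [0.0, 0.0, 0.5, 0.5]]))

# Applying it is cheap, so the same weights serve every time step
# and every variable on the same grid.
field = np.array([1.0, 3.0, 5.0, 7.0])
out = weights @ field
print(out)  # [2. 6.]
```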
-
That actually cleared up a lot of things for me! I am sorry for not reading the available information on https://xesmf.readthedocs.io/ before starting this issue / discussion.
-
I have done some quick speed testing for my function.
There is lots of other stuff going on in this function, but 5 seconds of saved time is pretty substantial anyway!