classify_variables() fails to detect 'tasmax' data var? #33
Comments
Hi @CloudNiner - thanks for getting in touch! Apologies for a very slow reply, shifting work pressures and then a global pandemic have rather limited the time I've had to spend working on this library...
Looks like you've successfully found the root cause of the issue you hit. Unfortunately due to the vagaries of the NetCDF spec it's very hard to find a fully-accurate test for "is this variable a data variable?" and the one I settled on is "this variable has a I'm not sure what I can do to change the classification code to help out with this. I can't say "multiple values in the I'm surprised that manually setting the data variables then calling
>>> print(data_model.domains)
('time', 'lat', 'lon')
>>> print(data_model.domain_varname_mapping)
{('time', 'lat', 'lon'): 'tasmax'}
Note the second one might actually have If you don't get this then there's still something awry in the metadata classification, which will be the next thing to run the magnifying glass over...
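For readers following along, here is a rough sketch of the kind of test being discussed, and of why `tasmax` slips through it. This is not the library's code: the `variables` dictionary layout and both function names are hypothetical stand-ins, and the only rule taken from the thread is "a data variable is one that has a `coordinates` attribute".

```python
from collections import defaultdict


def find_data_vars(variables):
    """Return names of variables that carry a `coordinates` attribute.

    `variables` is a hypothetical stand-in for file metadata:
    name -> {"dims": tuple of dimension names, "attrs": dict}.
    """
    return [name for name, var in variables.items()
            if "coordinates" in var["attrs"]]


def group_by_domain(variables, data_var_names):
    """Group data variable names by the tuple of dimensions they span."""
    mapping = defaultdict(list)
    for name in data_var_names:
        mapping[variables[name]["dims"]].append(name)
    return dict(mapping)


# Illustrative metadata shaped like the file in this issue:
# `tasmax` spans (time, lat, lon) but has no `coordinates` attribute.
loca = {
    "time": {"dims": ("time",), "attrs": {}},
    "lat": {"dims": ("lat",), "attrs": {}},
    "lon": {"dims": ("lon",), "attrs": {}},
    "tasmax": {"dims": ("time", "lat", "lon"), "attrs": {}},
}

detected = find_data_vars(loca)          # nothing is detected automatically
manual = ["tasmax"]                      # so the data variable is named by hand
domain_mapping = group_by_domain(loca, manual)
print(detected, domain_mapping)
```

With the strict test nothing is detected, which matches the failure reported in this issue; once the data variable is named by hand, grouping by dimensions gives a single `('time', 'lat', 'lon')` domain.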
I just had another thought about this - when you ran the classification on this file did you get a note that You could try this (assuming it has classified incorrectly as an aux coordinate):
input_netcdf_file = "path/to/file/on/disk/loca.nc"
data_var_name = 'tasmax'
data_model = NCDataModel(input_netcdf_file)
data_model.classify_variables()
data_model.data_var_names = [data_var_name]
data_model.aux_coord_names.pop(data_model.aux_coord_names.index(data_var_name))
data_model.get_metadata()
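The snippet above raises a `ValueError` if the variable was not actually classified as an aux coordinate. A slightly more defensive version might look like the sketch below. The helper name `promote_to_data_var` is hypothetical, and it relies only on the two list attributes used in the snippet (`data_var_names` and `aux_coord_names`); unlike the snippet, it appends to `data_var_names` rather than replacing it.

```python
from types import SimpleNamespace


def promote_to_data_var(data_model, name):
    """Mark `name` as a data variable on `data_model`.

    Removes it from `aux_coord_names` only if it is there, and adds it
    to `data_var_names` only if it is missing, so repeated calls are safe.
    """
    if name in data_model.aux_coord_names:
        data_model.aux_coord_names.remove(name)
    if name not in data_model.data_var_names:
        data_model.data_var_names.append(name)
    return data_model


# Stand-in object for illustration only; with the real library this would
# be the data model after classify_variables() has run.
model = SimpleNamespace(data_var_names=[], aux_coord_names=["tasmax"])
promote_to_data_var(model, "tasmax")
promote_to_data_var(model, "tasmax")  # second call is a no-op
print(model.data_var_names, model.aux_coord_names)
```

With the real data model, `get_metadata()` would still need to be called afterwards, as in the snippet above.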
Thanks for the detailed responses and no worries about the delay! First I updated to the latest master, ae98ce2. It looks like
Before the writer is created, the data model looks like this:
Unfortunately the writer crashes with this error:
I tossed some debug output in there and it looks like that crash happens when I attempt to write the following key + value:
Weird, because that's a number, not a tuple... in fact, it's a
This appears to be an issue with the underlying tiledbpy library -- I'd expect many NetCDF files to have attributes that are read as numpy types. If I update https://github.com/informatics-lab/tiledb_netcdf/blob/master/nctotdb/writers.py#L543 to
Looks like it's looking for the
So, again, it looks like we're back to the trappings of the wonderful self-describing NetCDF file format, in that a thing you're looking for doesn't exist in this particular file. If it's not necessary that the data var have a grid mapping attr, perhaps the exception can just be caught and writing can continue, but I'm not familiar enough with the tiledb format to make that call quite yet. Thoughts on that one?
If it seems to you that the metadata writing issue above is a problem with the underlying library, I'll open a separate issue over there, since I don't see any mention of a similar issue fixed in their changelog between 0.5.9 and 0.62.
In any case, I saw you updated the README with some notes about how to manually reclassify and use the TileDBWriter; that was super helpful after revisiting this some time later and getting back up to speed. Nice docs!
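To make the two workarounds in this comment concrete, here is a minimal sketch under stated assumptions. Both helper names are hypothetical and neither is part of the library or of TileDB-Py. The first converts numpy-style values to built-in Python types before they are written as metadata (numpy scalars and arrays both expose `tolist()`); turning sequences into tuples is an assumption about what the metadata writer accepts. The second treats a missing `grid_mapping` attribute as "no grid mapping" instead of letting the lookup raise.

```python
from types import SimpleNamespace


def to_native(value):
    """Convert a numpy-style scalar or array to built-in Python types.

    Values without a `tolist()` method are returned unchanged.
    Sequences become tuples (an assumption about the target format).
    """
    if hasattr(value, "tolist"):
        value = value.tolist()
    if isinstance(value, list):
        return tuple(value)
    return value


def get_grid_mapping(variable):
    """Return the variable's `grid_mapping` attribute, or None if absent."""
    return getattr(variable, "grid_mapping", None)


class FakeScalar:
    """Stand-in for a numpy scalar, used so the sketch needs no numpy."""

    def __init__(self, value):
        self._value = value

    def tolist(self):
        return self._value


fill_value = to_native(FakeScalar(1e20))       # plain Python float
no_mapping = get_grid_mapping(SimpleNamespace(units="K"))
print(type(fill_value).__name__, no_mapping)
```

Whether a data variable without a grid mapping can be written meaningfully is still the open question raised above; the sketch only shows how the lookup could be made non-fatal.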
Hi -- found your library and am looking to use it to expedite writing some netcdf files to tiledb arrays for testing.
I'm using the LOCA climate dataset as my test and started with this single file: s3://nasanex/LOCA/GFDL-ESM2G/16th/rcp85/r1i1p1/tasmax/tasmax_day_GFDL-ESM2G_rcp85_r1i1p1_20500101-20501231.LOCA_2016-04-02.16th.nc
When I load that file with this code:
the program crashes with the error:
For this file, I would expect the data var to be tasmax, which looks like this:
tasmax doesn't have a coordinates attr as read by the netCDF4 lib, so it fails the check here: https://github.com/informatics-lab/tiledb_netcdf/blob/master/nctotdb/data_model.py#L64
I attempted to hack my way past this, but that leads to another downstream error:
I've installed all the necessary libraries with conda; I'm currently using tiledb 1.7.7, tiledb-py 0.5.9, and iris 2.4.0 on macOS 10.15.4.