As per #74, you may wish to write a subset of a NetCDF file to a TileDB array rather than the whole file. Possible use-cases:
the NetCDF file is too large to fit into system memory as a whole
you only wish to store a subset of the NetCDF file's data variable in a TileDB array.
One way this could be achieved is by optionally indexing the NetCDF data variable with the write indices for writing the NetCDF data to the TileDB array in write_array. For example:
I expect there will be 🐉 with keeping track of the indices, particularly ensuring that the NetCDF coordinate variables are indexed in line with the indices used for the NetCDF data variable.
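A minimal sketch of the index bookkeeping described above, not the library's implementation. It assumes chunking along the leading axis only; `chunk_slices` is a hypothetical helper, and plain lists stand in for the NetCDF data and coordinate variables. The point is that one slice object is applied to both, so they cannot drift apart.

```python
from typing import Iterator


def chunk_slices(length: int, chunk_size: int) -> Iterator[slice]:
    """Yield consecutive slices that together cover range(length)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, length, chunk_size):
        yield slice(start, min(start + chunk_size, length))


# Stand-ins for a NetCDF coordinate variable and its data variable
# (hypothetical values; real code would index the NetCDF variables instead).
time_coord = [10 * i for i in range(7)]       # coordinate along axis 0
data_var = [[t, t + 1] for t in time_coord]   # data with a leading time axis

written = []
for sl in chunk_slices(len(time_coord), chunk_size=3):
    coord_chunk = time_coord[sl]  # the same slice indexes the coordinate...
    data_chunk = data_var[sl]     # ...and the data variable
    assert len(coord_chunk) == len(data_chunk)
    # A real writer would write this chunk to the TileDB array here.
    written.append((sl.start, sl.stop, coord_chunk, data_chunk))
```

The same slice objects should also work on NumPy arrays and on variables that support standard slicing, so each chunk can be read and written without loading the whole file.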
Thanks for your suggestion.
Indeed, I already tried something along these lines. I used xarray to read the NetCDF file, then got the data values for the variable 'tas' like this:
data = xr_ds['tas'][0:100].to_numpy()
data = data.flatten()
In my case I have the dimensions time, x and y. Here I take the first 100 time steps (indices 0 to 99) for all x and y (climate data).
For the dimension values I use np.tile() and flatten() to generate a huge 2D array with all the coordinates the data values should be written to. Finally I write it to TileDB using these lines:
with tiledb.open(tdb_name, mode='w') as write_array:
    write_array[tuple(dim_values)] = {'tas': data}
In my case, when I use most of my memory, the shapes of dim_values and data are (18272925, 3) and (18272925,).
The writing process alone takes roughly 50 s. This chunk is roughly 1/500 of the whole dataset I want to write to TileDB.
I also saw in your code for TileDBWriter that you are using the same line for the writing. Do you think there is a chance this can be sped up or done in a faster manner? Otherwise, adding the chunking to tiledb_netcdf wouldn't speed up the writing for me that much, I think.
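For reference, one way to build the flattened coordinates described above is with np.meshgrid rather than np.tile. This is only a sketch: `flat_coords` is a hypothetical helper, the small arrays stand in for the real time, x and y axes, and it makes no claim about write speed. It shows that the flattened coordinates line up element by element with the flattened data.

```python
import numpy as np


def flat_coords(*axes):
    """Return one flat coordinate array per axis, in C (row-major) order."""
    grids = np.meshgrid(*axes, indexing="ij")
    return tuple(g.ravel() for g in grids)


# Small stand-ins for the time, x and y axes (hypothetical sizes).
time = np.arange(4)
x = np.arange(3)
y = np.arange(2)
data = np.arange(time.size * x.size * y.size, dtype=np.float64).reshape(
    time.size, x.size, y.size
)

t_flat, x_flat, y_flat = flat_coords(time, x, y)
values = data.ravel()  # same row-major order as the coordinate arrays

# Because the stand-in coordinates equal their indices, this checks alignment.
assert np.array_equal(data[t_flat, x_flat, y_flat], values)
```

This yields one 1D array per dimension (three arrays of length N) rather than a single (N, 3) array, which is the layout I would expect a coordinate-based write to want, though that is an assumption worth checking against the TileDB-Py documentation.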