unlimited dimensions in NetCDF #53
In Julia generally only vectors can grow; multidimensional arrays can't, and even for a vector that growth goes through dedicated methods like `push!` and `resize!`. I would argue that if we want to grow AbstractArrays we should define another method that can be used by users and overridden by other packages, so the growing can be cleanly handled.
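To make the point above concrete, here is a minimal sketch in plain Base Julia showing that only one-dimensional `Vector`s have a growth API:

```julia
# Only Vectors can grow in place in Base Julia.
v = [1, 2, 3]
push!(v, 4)        # length is now 4
resize!(v, 6)      # length is now 6 (the new slots are uninitialized)

# Multidimensional arrays have no equivalent:
m = zeros(2, 2)
# push!(m, 1.0)    # MethodError: no push! method for Matrix
# resize!(m, 3, 3) # MethodError: no resize! method for Matrix
```

This is the asymmetry the comment refers to: growing a `Matrix` requires allocating a new array and copying, so there is no standard in-place API to hook into.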
I tend to agree in general, except for the first point. For example, in Zarr.jl one can resize every array along any dimension, but this does not happen through `setindex!`. For backwards compatibility I would make the fixed-size behavior the default.
Indeed, for a "plain" DiskArray the fixed-size behavior would be the default. It seems to me then that all packages would be able to retain backward compatibility. I think also that the compiler should be able to specialize for both cases.
@meggart I didn't mean that we should never grow arrays, but that it's not the standard API for `setindex!`. The problem with using `setindex!` for growing is that adding runtime conditionals to check for arbitrary behaviours is unusual in Julia, and something I would rather avoid.
You already need to do that with current NetCDF/DiskArrays:

```julia
using NetCDF
time_dim = NetCDF.NcDim("time", Int[], unlimited=true);
lon_dim = NetCDF.NcDim("lon", 1:3);
v = NcVar("obs", [lon_dim, time_dim]);
NetCDF.create("newfile.nc", v) do nc
    @show size(v)
    nc["obs"][:, 1:2] = rand(3, 2)
    @show size(v)
    nc["obs"][:, 1:3] = rand(3, 3)
    @show size(v)
    @show supertype(typeof(v))
end;
```

output:
I don't use NetCDF.jl and didn't realise that was possible. But it's also pretty bad: the average Julia user can't tell from reading your code that the size changes; you actually need to show the size, like you have here, to make what's happening clear. I'm arguing for an explicit method call instead.
@Alexander-Barth (after a productive discussion) I just want to point you to the chunking code in DiskArrays.jl. This is an immutable struct that has the size of the axis built into it, so the cached chunk grid no longer matches the file once the array grows.
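A minimal sketch of the problem being described (hypothetical names, not the actual DiskArrays.jl `GridChunks` code): an immutable chunk iterator that bakes the axis length in at construction time, which then goes stale if the file grows.

```julia
# Hypothetical stand-in for an immutable chunk iterator: the axis
# length is cached at construction and never re-read from the file.
struct RegularChunks
    chunksize::Int
    axislength::Int   # goes stale if another process appends data
end

Base.length(c::RegularChunks) = cld(c.axislength, c.chunksize)
chunk(c::RegularChunks, i) =
    ((i - 1) * c.chunksize + 1):min(i * c.chunksize, c.axislength)

c = RegularChunks(10, 25)
chunk(c, 3)   # 21:25, clipped to the cached length
# If the file grows to length 35, every chunk computed from `c`
# silently ignores the appended elements.
```

The design choice at issue: immutability makes the chunk grid cheap and safe to share, at the cost of having to rebuild it whenever the underlying array changes size.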
Hey @meggart, thanks for the nice presentation today :-) One use case that I have is that the NetCDF file grows while a numerical model is running. The runs can take hours (or days), and it is good to be able to check the results while the model is running. If we cache the size of an array on the Julia side, it might get outdated. Even if there is a special API to detect a growing file on the Julia side, there will still be the issue that the file is growing because a different process writes to it. In NCDatasets, I overload `size` so that the dimension lengths are queried from the file rather than cached.
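The "query the file every time" approach described above can be sketched like this (a toy mock, with hypothetical names; the real NCDatasets code asks the NetCDF C library for the dimension length):

```julia
# MockFile stands in for the on-disk metadata that another
# process may change at any time.
mutable struct MockFile
    len::Int
end

struct GrowingVariable <: AbstractVector{Float64}
    file::MockFile
end

# No cached size: every call re-reads the (mock) file metadata,
# so growth by another process is always visible.
Base.size(v::GrowingVariable) = (v.file.len,)
Base.getindex(v::GrowingVariable, i::Int) = 0.0  # dummy data

f = MockFile(5)
v = GrowingVariable(f)
size(v)     # (5,)
f.len = 8   # another "process" appends to the file
size(v)     # (8,) immediately, with no stale cache to invalidate
```

The trade-off is that anything computed from `size` earlier (such as a precomputed chunk grid) can still be stale, which is exactly the tension in this thread.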
@rafaqz The other day, you asked me how to query if a dimension is unlimited in NCDatasets. Here is basically how you can do that:

```julia
using NCDatasets
ds = NCDataset("/tmp/test3.nc", "c")
defDim(ds, "lon", 10)
defDim(ds, "lat", 10)
defDim(ds, "time", Inf)  # Inf marks the dimension as unlimited
unlimited(ds.dim)
# returns ["time"]
```

The function `unlimited` returns the names of all unlimited dimensions.
Thanks. Although I think it's not useful for Rasters.jl in the end, because of the chunking problem. We need a syntax like this that rebuilds everything (which has no real performance cost):

```julia
newraster = resize!(raster, i, j, k)
```

Then there is no problem with any of these packages and very little code has to change. We just handle the rebuild in that method.
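The rebuild-on-resize idea can be sketched as follows (all names hypothetical): rather than mutating the wrapper, return a fresh one with the size and chunk layout recomputed, so no cached metadata can go stale. Only small metadata structs are reconstructed, which is why the cost is negligible.

```julia
# Toy wrapper holding only metadata (the data stays on disk).
struct ChunkedVar
    size::NTuple{2,Int}
    chunksizes::NTuple{2,Int}
end

nchunks(v::ChunkedVar) = cld.(v.size, v.chunksizes)

# Rebuild instead of mutate: a new wrapper with consistent metadata.
resize_rebuild(v::ChunkedVar, newsize...) =
    ChunkedVar(newsize, v.chunksizes)

v = ChunkedVar((100, 10), (10, 5))
nchunks(v)                       # (10, 2)
v2 = resize_rebuild(v, 100, 20)
nchunks(v2)                      # (10, 4), chunk grid recomputed for free
```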
As I see the current state, I would like to have @meggart's opinion on whether the currently implemented support for unlimited dimensions is something that is intended to stay in DiskArrays (or whether we don't know yet). I would really like to have this cached-array functionality of DiskArrays in NCDatasets (even if I would need to make a breaking change for this), while keeping DiskArrays' current level of support for unlimited dimensions, and to reuse the code developed in DiskArrays for this.
This was never followed up, but to clarify: DiskArrays has never had support for growing arrays; it just didn't actively stop you (and still doesn't). But the chunking has always been broken after the array grows.
A summary of the problems: the chunk structure caches the array's axis sizes, so it is wrong after the array grows.

But we really need to resolve this problem for the ecosystem to move forwards; there is a lot of work hanging on this now. So I've been thinking about other ways to instead make this work from the NCDatasets.jl side.

This will break if there are outer wrappers. @meggart, can you see any problems with that? And @Alexander-Barth, would that work for you? I'm happy to implement it.
I don't know, this all sounds quite complicated. There is currently no special case for growing a NetCDF variable in NCDatasets (and presumably also not in NetCDF.jl), as everything is handled transparently by the C library. I am not familiar with DiskArrays' internals, but your proposal sounds like NCDatasets would need to modify internal (non-documented) fields of DiskArrays. Do you need other functionality besides the chunk iterator? Maybe a different approach would be to have an abstract type with these methods associated to it, defined in a minimal module. I already have some prototype code for this for NCDatasets. The advantage I can see is that I am the only one who has to deal with resizable arrays (and I am not forcing this complexity on anybody else).
This is maybe an order of magnitude less complicated than supporting this generically in DiskArrays! Hopefully you can understand that from our perspective too. If you are happy to have broken chunks after the array grows, that is up to you.

To help, we can write a method in DiskArrays.jl to rebuild the chunks with the new indices, so you will just have to update the chunks field of your variable with something like:

```julia
v.chunks = DiskArrays.resize_chunks(v.chunks, I...)
```

which could return a new chunk grid matching the resized array. This method is mostly used as a fallback, as the chunks are precomputed:

```julia
eachchunk(array, maximum_memory_for_each_chunk_in_bytes)
```

But in that case it's a good idea; maybe make an issue for tracking it?

I'm not totally sure in what context you mean here. But I think all that needs to be updated in your case is the cached chunks.
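A hedged sketch of what such a chunk-rebuilding helper could look like (`regular_chunks` and `resize_chunks` here are hypothetical names, not the actual DiskArrays.jl API): recompute a regular chunk grid for the new axis length while keeping the existing chunk size.

```julia
# Build a regular chunk grid: ranges of `csize` elements covering 1:len,
# with the last chunk clipped to the axis length.
regular_chunks(len, csize) =
    [i:min(i + csize - 1, len) for i in 1:csize:len]

# Rebuild the grid for a grown axis, reusing the old chunk size.
resize_chunks(chunks::Vector, newlen) =
    regular_chunks(newlen, length(first(chunks)))

old = regular_chunks(25, 10)          # [1:10, 11:20, 21:25]
newchunks = resize_chunks(old, 35)    # [1:10, 11:20, 21:30, 31:35]
```

Note that growing the axis can change the last chunk (21:25 becomes 21:30), which is why patching the cached grid in place is not enough and a full rebuild is needed.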
As a follow-up of the discussion (Alexander-Barth/NCDatasets.jl#79), I would like to know if unlimited dimensions (i.e. growable ones) could be supported by DiskArrays. I hope it is OK for you that I make a separate issue here; it is easier for me to track what has already been done and what still needs to be discussed.
The last proposition by Fabian has been:

> One could also annotate the type `DiskArray` with an additional parameter for whether its size is fixed or not. It is decided during creation of a DiskArray if the array can grow or not, and all subsequent calls can use `A[index] = value` rather than `setindex(A, value, index, checkbounds=true/false)`. The compiler should be able to specialize for fixed and growable arrays (there could even be two definitions of `setindex!` for the two cases).

I think this would be a good idea.

CC @rafaqz, @Balinus, @visr
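The type-parameter idea can be sketched like this (all names hypothetical, not the actual DiskArrays types): a Boolean type parameter fixed at creation lets the compiler pick the right method at compile time instead of branching at runtime.

```julia
# `G` records at the type level whether the variable may grow.
struct Var{T,N,G}
    data::Array{T,N}
end

can_grow(::Var{T,N,G}) where {T,N,G} = G

# Two compile-time specializations instead of a runtime conditional:
check_write(v::Var{T,N,false}, I...) where {T,N} =
    checkbounds(Bool, v.data, I...) ||
        error("out-of-bounds write to a fixed-size variable")
check_write(v::Var{T,N,true}, I...) where {T,N} = true  # growing allowed

fixed = Var{Float64,1,false}(zeros(3))
grow  = Var{Float64,1,true}(zeros(3))
can_grow(fixed)   # false
can_grow(grow)    # true
```

Because `G` is part of the type, each call site compiles down to exactly one of the two `check_write` methods, which is the specialization the proposition anticipates.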