Cube drops variables #202

Closed
gdkrmr opened this issue Dec 6, 2022 · 12 comments

Comments

@gdkrmr
Contributor

gdkrmr commented Dec 6, 2022

I have come across this issue several times now: Cube drops some variables.

julia> s3path = "http://data.rsc4earth.de:9000/earthsystemdatacube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr";

julia> c3 = Cube(s3path);

julia> z3 = Zarr.zopen(s3path, consolidated=true, fill_as_missing=false);

julia> symdiff(c3.axes[4].values, string.(keys(z3.arrays)))
9-element Vector{String}:
 "sensible_heat"
 "latent_energy"
 "time"
 "terrestrial_ecosystem_respiration"
 "lon"
 "net_radiation"
 "lat"
 "burnt_area"
 "net_ecosystem_exchange"

Moved over from here:
JuliaDataCubes/EarthDataLab.jl#292

@lazarusA
Collaborator

lazarusA commented Dec 7, 2022

Yes, this is also something I observe from time to time. Related issue: #47

@lazarusA
Collaborator

lazarusA commented Dec 7, 2022

In your example, Cube only keeps the variables that share the same dimensions, which makes sense (@meggart?); the others are discarded. The way to open this file is via open_dataset, as in

using YAXArrays, Zarr
g = open_dataset(zopen(s3path, consolidated=true, fill_as_missing=false))

and the resulting dataset contains all of the variables.

@gdkrmr
Contributor Author

gdkrmr commented Dec 7, 2022

The issue is a change in eltype: some of the datasets have an offset and scale factor and get wrapped into a DiskArrayTools.CFDiskArray, which changes the eltype from Float32 to Float64. Details in meggart/DiskArrayTools.jl#15 and meggart/DiskArrayTools.jl#16.
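To make the eltype change concrete, here is a minimal, package-free Julia sketch. It assumes CF-style decoding of the form `raw * scale_factor + add_offset`, and it assumes that Cube only concatenates variables that share one eltype. The helpers `decoded_eltype` and `same_eltype_vars` and the variable names `var_a`/`var_b` are hypothetical; this models the behaviour described above and is not the DiskArrayTools or YAXArrays code.

```julia
# Hypothetical helper: the element type produced by CF-style decoding
# `raw * scale_factor + add_offset` for a raw element type `T`.
decoded_eltype(::Type{T}, scale_factor, add_offset) where {T} =
    typeof(zero(T) * scale_factor + add_offset)

# Hypothetical helper: keep only the variables whose decoded eltype equals `T`,
# mimicking a concatenation step that needs a single common eltype.
same_eltype_vars(eltypes::Dict{String,Type}, ::Type{T}) where {T} =
    sort([name for (name, S) in eltypes if S == T])

# Made-up example: both variables are stored as Float32, but only `var_b`
# carries Float64 scale/offset attributes, so its decoded eltype is promoted.
eltypes = Dict{String,Type}(
    "var_a" => Float32,
    "var_b" => decoded_eltype(Float32, 0.1, 273.15),
)

println(eltypes["var_b"])                    # Float64
println(same_eltype_vars(eltypes, Float32))  # ["var_a"]
```

Storing the attributes as Float32 (for example `decoded_eltype(Float32, 1.0f0, 0.0f0)`) keeps the result at Float32, which is why only some variables are affected.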

@gdkrmr
Contributor Author

gdkrmr commented Dec 7, 2022

I have just checked and Cube is not fixed yet.

@gdkrmr
Contributor Author

gdkrmr commented Dec 7, 2022

fixed now ;-)

@lazarusA
Collaborator

lazarusA commented Dec 7, 2022

For your cube I still get the 9-element difference (lon, lat and time are axes, so those should not count):

using DiskArrayTools, YAXArrays, Zarr
s3path = "http://data.rsc4earth.de:9000/earthsystemdatacube/v3.0.1/esdc-8d-0.25deg-256x128x128-3.0.1.zarr"
c3 = Cube(s3path);
z3 = Zarr.zopen(s3path, consolidated=true, fill_as_missing=false);
symdiff(c3.axes[4].values, string.(keys(z3.arrays)))
9-element Vector{String}:
 "sensible_heat"
 "latent_energy"
 "time"
 "terrestrial_ecosystem_respiration"
 "lon"
 "net_radiation"
 "lat"
 "burnt_area"
 "net_ecosystem_exchange"

with these versions:

(tmp) pkg> st
Status `~/Documents/tmp/Project.toml`
  [fcd2136c] DiskArrayTools v0.1.6 `https://github.com/gdkrmr/DiskArrayTools.jl.git#offsetpromotion`
  [c21b50f5] YAXArrays v0.4.3 `https://github.com/JuliaDataCubes/YAXArrays.jl.git#master`
  [0a941bbe] Zarr v0.8.0

(tmp) pkg> 
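Because lon, lat and time are coordinate arrays rather than data variables, one way to list only the variables that were really dropped is to subtract the coordinate names as well as the cube's variables. A minimal sketch in plain Julia; `dropped_variables` is a hypothetical helper, and the made-up name lists stand in for `c3.axes[4].values` and `string.(keys(z3.arrays))`.

```julia
# Hypothetical helper: data variables present in the Zarr store but missing
# from the Cube, ignoring the coordinate arrays passed in `coords`.
function dropped_variables(cube_vars, store_arrays; coords = ("lon", "lat", "time"))
    # setdiff on a vector keeps its order; sort for a stable, readable result
    return sort(setdiff(store_arrays, cube_vars, coords))
end

# Made-up names standing in for the real cube and store contents
cube_vars    = ["var_a", "var_b"]
store_arrays = ["var_a", "var_b", "var_c", "lon", "lat", "time"]

println(dropped_variables(cube_vars, store_arrays))  # ["var_c"]
```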

@gdkrmr
Contributor Author

gdkrmr commented Dec 8, 2022

You are right, it seems I still need to fix that. It works when using fill_as_missing=true.

@gdkrmr
Contributor Author

gdkrmr commented Dec 8, 2022

I figured out the issue: a "_FillValue" becomes the default missing value for CFDiskArray and adds Missing to its eltype. I have added a commit but still need to test it.
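The `_FillValue` effect can be modelled in the same way: once fill values are decoded to `missing`, the eltype becomes `Union{Missing, T}` and no longer compares equal to plain `T`. A minimal sketch with a hypothetical `decoded_eltype_with_fill` helper (not the CFDiskArray code). `Base.nonmissingtype` strips the `Missing` part if a comparison should ignore it; that is shown as one possible approach, not as the fix that was committed.

```julia
# Hypothetical helper: eltype after decoding, optionally mapping values equal
# to a `_FillValue` attribute to `missing`.
function decoded_eltype_with_fill(::Type{T}, has_fillvalue::Bool) where {T}
    return has_fillvalue ? Union{Missing,T} : T
end

a = decoded_eltype_with_fill(Float32, false)  # Float32
b = decoded_eltype_with_fill(Float32, true)   # Union{Missing, Float32}

println(a == b)                  # false: the eltypes differ
println(nonmissingtype(b) == a)  # true: equal once Missing is stripped
```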

@lazarusA
Collaborator

lazarusA commented Dec 8, 2022

> I have added a commit but still need to test it.

Indeed. For your use case burnt_area is still missing.

4-element Vector{String}:
 "time"
 "lon"
 "lat"
 "burnt_area"

@gdkrmr
Contributor Author

gdkrmr commented Dec 8, 2022

Thanks for testing. burnt_area is Float64; this is a bug in the DataCube.

@felixcremer
Member

Is this fixed by your PR in DiskArrayTools, meggart/DiskArrayTools.jl#16?

@gdkrmr
Contributor Author

gdkrmr commented Dec 14, 2022

Yes, it should be.

@gdkrmr gdkrmr closed this as completed Dec 14, 2022