BUG: AssertionError when using rechunker #141

Closed
pagecp opened this issue Mar 17, 2022 · 2 comments · Fixed by #143

@pagecp
Collaborator

pagecp commented Mar 17, 2022

  • python: 3.9.7 | packaged by conda-forge | (default, Sep 23 2021, 07:28:37) [GCC 9.4.0]
  • icclim: 5.0.2 commit 97f2680
  • numpy: 1.21.2
  • pandas: 1.3.3
  • xclim: 0.34.0
  • dask: 2022.02.1
  • xarray: 0.19.0
  • cftime: 1.5.1

Description

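icclim.create_optimized_zarr_store raises an AssertionError from rechunker's dask executor (assert len(root_keys) == 1) while it prepares the rechunking plan.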

Minimal reproducible example

import icclim

import sys
import glob
import os
import datetime
import cftime

import numpy as np
import pandas as pd
import xarray as xr
import dask

import xclim
from distributed import Client
import logging

client = Client(memory_limit='16GB', n_workers=1, threads_per_worker=2, silence_logs=logging.WARNING)

dask.config.set({"array.slicing.split_large_chunks": False})
dask.config.set({"distributed.worker.memory.target": "0.8"})
dask.config.set({"distributed.worker.memory.spill": "0.9"})
dask.config.set({"distributed.worker.memory.pause": "0.95"})
dask.config.set({"distributed.worker.memory.terminate": "0.98"})

dask.config.set({"array.chunk-size": "500 MB"})

dt1 = datetime.datetime(2001,1,1)
dt2 = datetime.datetime(2010,12,31)

dt1r = datetime.datetime(1981,1,1)
dt2r = datetime.datetime(2000,12,31)

out_f = 'tx90p_icclim.nc'
filenames = glob.glob('./data/latest/tasmax_1d*ERA5.nc')

with icclim.create_optimized_zarr_store(
        in_files=filenames,
        var_names="tasmax",
        target_zarr_store_name="page_tasmax_day.zarr",
        dim="time",
        keep_target_store=True
) as tasmax_ds:
    icclim.index(
        index_name='TX90p',
        in_files=tasmax_ds,
        var_name='tasmax',
        slice_mode='JJA',
        base_period_time_range=[dt1r, dt2r],
        time_range=[dt1, dt2],
        out_unit='%',
        out_file=out_f,
        logs_verbosity='HIGH',
    )

Output received

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
/tmp/ipykernel_396831/531130476.py in <module>
     14 from icclim import create_optimized_zarr_store
     15 
---> 16 with create_optimized_zarr_store(
     17         in_files=filenames,
     18         var_names="tasmax",

/data/softs/anaconda3-2020.07/envs/gloenv_py3.9/lib/python3.9/contextlib.py in __enter__(self)
    117         del self.args, self.kwds, self.func
    118         try:
--> 119             return next(self.gen)
    120         except StopIteration:
    121             raise RuntimeError("generator didn't yield") from None

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/icclim/pre_processing/rechunk.py in create_optimized_zarr_store(in_files, var_names, target_zarr_store_name, dim, keep_target_store)
    105         shutil.rmtree(TMP_STORE_2, ignore_errors=True)
    106         shutil.rmtree(target_zarr_store_name, ignore_errors=True)
--> 107         yield _unsafe_create_optimized_zarr_store(
    108             in_files, var_names, target_zarr_store_name, dim, _get_mem_limit()
    109         )

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/icclim/pre_processing/rechunk.py in _unsafe_create_optimized_zarr_store(in_files, var_names, zarr_store_name, dim, max_mem)
    150             ds_zarr[c].encoding = {}
    151             target_chunks.update({c: None})
--> 152         rechunk(
    153             source=ds_zarr,
    154             target_chunks=target_chunks,

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/rechunker/api.py in rechunk(source, target_chunks, max_mem, target_store, target_options, temp_store, temp_options, executor)
    303         temp_options=temp_options,
    304     )
--> 305     plan = executor.prepare_plan(copy_spec)
    306     return Rechunked(executor, plan, source, intermediate, target)
    307 

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/rechunker/executors/dask.py in prepare_plan(self, specs)
     19 
     20     def prepare_plan(self, specs: Iterable[CopySpec]) -> Delayed:
---> 21         return _copy_all(specs)
     22 
     23     def execute_plan(self, plan: Delayed, **kwargs):

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/rechunker/executors/dask.py in _copy_all(specs)
     94 def _copy_all(specs: Iterable[CopySpec],) -> Delayed:
     95 
---> 96     stores_delayed = [_chunked_array_copy(spec) for spec in specs]
     97 
     98     if len(stores_delayed) == 1:

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/rechunker/executors/dask.py in <listcomp>(.0)
     94 def _copy_all(specs: Iterable[CopySpec],) -> Delayed:
     95 
---> 96     stores_delayed = [_chunked_array_copy(spec) for spec in specs]
     97 
     98     if len(stores_delayed) == 1:

/data/scratch/globc/page/envs/icclimv5/lib/python3.9/site-packages/rechunker/executors/dask.py in _chunked_array_copy(spec)
     72                 if key.startswith("from-zarr"):
     73                     root_keys.append(key)
---> 74         assert len(root_keys) == 1
     75         root_key = root_keys[0]
     76 

AssertionError: 
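
For readers unfamiliar with the failing check, here is a minimal sketch of what the assertion tests, based only on the lines visible in the traceback (rechunker/executors/dask.py, lines 72-74): the executor collects graph keys that start with "from-zarr" and requires exactly one of them. The function name find_root_key and the key names below are hypothetical and chosen for illustration; the actual graph keys produced under dask 2022.02.1 are not shown in this report.

```python
def find_root_key(graph_keys):
    """Mimic the check from the traceback: exactly one key must start with 'from-zarr'."""
    root_keys = [
        key for key in graph_keys
        if isinstance(key, str) and key.startswith("from-zarr")
    ]
    if len(root_keys) != 1:
        raise AssertionError(
            f"expected exactly one 'from-zarr' key, found {len(root_keys)}"
        )
    return root_keys[0]


# Hypothetical key names, not taken from a real dask graph.
keys_with_root = ["from-zarr-abc123", "store-map-def456"]
keys_without_root = ["array-abc123", "store-map-def456"]

print(find_root_key(keys_with_root))

try:
    find_root_key(keys_without_root)
except AssertionError as err:
    print("AssertionError:", err)
```

If no key (or more than one key) carries the expected prefix, the check fails before any data is copied, which matches the point in the traceback where the error is raised.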
@pagecp pagecp added the bug label Mar 17, 2022
@bzah
Member

bzah commented Mar 17, 2022

I think this is caused by pangeo-data/rechunker#110.

We have to pin rechunker to 0.3.3 because of pangeo-data/rechunker#92, but that version does not work with the latest dask releases (2022.02.1 in your case).

The only fixes for now are:

  • Downgrade your dask, for example to 2021.10.0 (the version I use locally)
  • Pin the dask dependency in icclim to the latest compatible version
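
Until one of these is in place, a guard like the following could flag the mismatch before create_optimized_zarr_store is called. This is a sketch, not part of icclim: the helper names are hypothetical, and 2021.10.0 is only the version reported to work in this thread, not a confirmed upper bound of compatibility. distributed is checked as well because the follow-up comment reports that it had to be downgraded together with dask.

```python
import re
from importlib import metadata

# Assumption: the only dask version reported as working in this thread.
KNOWN_GOOD_DASK = "2021.10.0"


def parse_version(version):
    """Turn a calendar version such as '2022.02.1' into (2022, 2, 1)."""
    return tuple(int(part) for part in re.findall(r"\d+", version)[:3])


def is_newer_than_known_good(installed, known_good=KNOWN_GOOD_DASK):
    """Return True when the installed version is newer than the known-good one."""
    return parse_version(installed) > parse_version(known_good)


def check_environment(packages=("dask", "distributed")):
    """Collect a warning for each installed package newer than the known-good version."""
    messages = []
    for name in packages:
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            continue  # package not installed, nothing to report
        if is_newer_than_known_good(installed):
            messages.append(
                f"{name} {installed} is newer than {KNOWN_GOOD_DASK}; "
                "rechunker 0.3.3 may fail with an AssertionError"
            )
    return messages


if __name__ == "__main__":
    for message in check_environment():
        print(message)
```

The comparison is deliberately simple (numeric components only), which is enough for dask's calendar-style version numbers but would not handle arbitrary version schemes.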

@pagecp
Collaborator Author

pagecp commented Mar 17, 2022

I had to downgrade both dask and distributed to 2021.10.0.
The final results will take some time, but it did not crash this time.

@bzah bzah linked a pull request Mar 22, 2022 that will close this issue
@bzah bzah closed this as completed in #143 Mar 22, 2022