Skip to content

Commit

Permalink
Improve SPI performance (#1311)
Browse files Browse the repository at this point in the history
<!--Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #1270 and fixes #1416 and fixes #1474
- [x] Tests for the changes have been added (for bug fixes / features)
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] HISTORY.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?

* Make SPI/SPEI faster
* fit params are now modular, can be computed before computing SPI/SPEI.
This allows more options to segment computations and allow to obtain the
fitting params if troubleshooting is needed.
* time indexing now possible
* `dist_method` now avoids `vectorize=True` in its `xr.apply_ufunc`.
This is the main improvement in SPI/SPEI.
* Better document the limits of usage of standardized indices. Now
standardized indices are capped at extreme values ±8.21. The upper bound
is a limit resulting of the use of float64.

### Does this PR introduce a breaking change?
Yes.

* `pr_cal` or `wb_cal` will not be input options in the future: 

> Inputing `pr_cal` will be deprecated in xclim==0.46.0. If `pr_cal` is
a subset of `pr`, then instead of:

`standardized_precipitation_index(pr=pr,pr_cal=pr.sel(time=slice(t0,t1)),...)`,
one can call:
`standardized_precipitation_index(pr=pr,cal_range=(t0,t1),...)`.
If for some reason `pr_cal` is not a subset of `pr`, then the following
approach will still be possible:
`params = standardized_index_fit_params(da=pr_cal, freq=freq,
window=window, dist=dist, method=method)`.
`spi = standardized_precipitation_index(pr=pr, params=params)`.
This approach can be used in both scenarios to break up the computations
in two, i.e. get params, then compute
standardized indice

I could revert this breaking change if we prefer. This was a first
attempt to make the computation faster, but the improvements are now
independent of this change. We could also keep the modular structure for
params, but revert to `pr_cal` instead of `cal_range`. It's a bit less
efficient when `pr_cal` is simply a subset of `pr`, because you end up
doing resampling/rolling two times on the calibration range for nothing.
When first computing `params`, then obtaining `spi` in two steps, then
it makes no difference
### Other information:
  • Loading branch information
coxipi authored Oct 23, 2023
2 parents 37a83bf + 59e7df6 commit dae1ffd
Show file tree
Hide file tree
Showing 4 changed files with 461 additions and 162 deletions.
6 changes: 6 additions & 0 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,11 @@ New features and enhancements
* Indicator ``generic.stats`` now accepts any frequency (previously only daily). (:pull:`1498`).
* Added argument ``out_units`` to ``select_resample_op`` to bypass limitations of ``to_agg_units`` in custom indicators. Add "var" to supported operations in ``to_agg_units``. (:pull:`1498`).
* `adapt_freq_thresh` argument added `sdba` training functions, allowing to perform frequency adaptation appropriately in each map block. (:pull:`1407`).
* Standardized indices (``xclim.indices.standardized_precipitation_index`` and ``xclim.indices.standardized_precipitation_evapotranspiration_index``) (:issue:`1270`, :issue:`1416`, :issue:`1474`, :pull:`1311`) were changed:
* Optimized and noticeably faster calculation.
* Can be computed in two steps: First compute fit parameters with ``xclim.indices.stats.standardized_index_fit_params``, then use the output in the standardized indices functions.
* The standardized index values are now clipped to ±8.21. This reflects the ``float64`` precision of the computation when cumulative distributed function values are inverted to a normal distribution and avoids returning infinite values.
* An offset parameter is now available to account for negative water balance values``xclim.indices.standardized_precipitation_evapotranspiration_index``.

Bug fixes
^^^^^^^^^
Expand All @@ -41,6 +46,7 @@ Breaking changes
* Default threshold in ``xclim.indices.snw_season_{start|length|end}`` changed form `20 kg m-2` to `4 kg m-2`. (:pull:`1505`).
* `xclim` development dependencies now include `ruff`. `pycodestyle` and `pydocstyle` have been replaced by `ruff` and removed from the `dev` installation recipe. (:pull:`1504`).
* The `mf_file` call signature found in ``xclim.ensembles.create_ensemble`` (and ``xclim.ensembles._ens_align_dataset``) has been removed (deprecated since `xclim` v0.43.0). (:pull:`1506`).
* ``xclim.indices.standardized_precipitation_index`` and ``xclim.indices.standardized_precipitation_evapotranspiration_index`` will no longer accept two datasets (data and calibration data). Instead, a single dataset covering both the calibration and evaluation periods is expected. (:issue:`1270`, :pull:`1311`).

Internal changes
^^^^^^^^^^^^^^^^
Expand Down
63 changes: 57 additions & 6 deletions tests/test_indices.py
Original file line number Diff line number Diff line change
Expand Up @@ -526,10 +526,10 @@ def test_standardized_precipitation_index(
ds = open_dataset("sdba/CanESM2_1950-2100.nc").isel(location=1)
pr = ds.pr.sel(time=slice("1998", "2000"))
pr_cal = ds.pr.sel(time=slice("1950", "1980"))
spi = xci.standardized_precipitation_index(
pr, pr_cal, freq, window, dist, method
params = xci.stats.standardized_index_fit_params(
pr_cal, freq=freq, window=window, dist=dist, method=method
)

spi = xci.standardized_precipitation_index(pr, params=params)
# Only a few moments before year 2000 are tested
spi = spi.isel(time=slice(-11, -1, 2))

Expand Down Expand Up @@ -610,11 +610,17 @@ def test_standardized_precipitation_evapotranspiration_index(
tas = tasmax - 2.5
tasmin = tasmax - 5
wb = xci.water_budget(pr, None, tasmin, tasmax, tas)
wb_cal = wb.sel(time=slice("1950", "1980"))
wb = wb.sel(time=slice("1998", "2000"))

params = xci.stats.standardized_index_fit_params(
wb.sel(time=slice("1950", "1980")),
freq=freq,
window=window,
dist=dist,
method=method,
offset="1 mm/d",
)
spei = xci.standardized_precipitation_evapotranspiration_index(
wb, wb_cal, freq, window, dist, method
wb.sel(time=slice("1998", "2000")), params=params
)

# Only a few moments before year 2000 are tested
Expand All @@ -625,6 +631,51 @@ def test_standardized_precipitation_evapotranspiration_index(

np.testing.assert_allclose(spei.values, values, rtol=0, atol=diff_tol)

def test_standardized_index_modularity(self, open_dataset):
freq, window, dist, method = "MS", 6, "gamma", "APP"
ds = (
open_dataset("sdba/CanESM2_1950-2100.nc")
.isel(location=1)
.sel(time=slice("1950", "2000"))
)
pr = ds.pr
# generate water budget
with xr.set_options(keep_attrs=True):
tasmax = ds.tasmax
tas = tasmax - 2.5
tasmin = tasmax - 5
wb = xci.water_budget(pr, None, tasmin, tasmax, tas)

params = xci.stats.standardized_index_fit_params(
wb.sel(time=slice("1950", "1980")),
freq=freq,
window=window,
dist=dist,
method=method,
offset="1 mm/d",
)
spei1 = xci.standardized_precipitation_evapotranspiration_index(
wb.sel(time=slice("1998", "2000")), params=params
)

spei2 = xci.standardized_precipitation_evapotranspiration_index(
wb,
freq=freq,
window=window,
dist=dist,
method=method,
offset="1 mm/d",
cal_start="1950",
cal_end="1980",
).sel(time=slice("1998", "2000"))

# In the previous computation, the first {window-1} values are NaN because the rolling is performed on the period [1998,2000].
# Here, the computation is performed on the period [1950,2000], *then* subsetted to [1998,2000], so it doesn't have NaNs
# for the first values
spei2[{"time": slice(0, window - 1)}] = np.nan

np.testing.assert_allclose(spei1.values, spei2.values, rtol=0, atol=1e-4)


class TestDailyFreezeThawCycles:
@pytest.mark.parametrize(
Expand Down
Loading

0 comments on commit dae1ffd

Please sign in to comment.