Improve SPI performance (#1311)

### Pull Request Checklist: - [x] This PR addresses an already opened issue (for bug fixes / features) - This PR fixes #1270 and fixes #1416 and fixes #1474 - [x] Tests for the changes have been added (for bug fixes / features) - [x] (If applicable) Documentation has been added / updated (for bug fixes / features) - [x] HISTORY.rst has been updated (with summary of main changes) - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added ### What kind of change does this PR introduce? * Make SPI/SPEI faster * fit params are now modular, can be computed before computing SPI/SPEI. This allows more options to segment computations and allow to obtain the fitting params if troubleshooting is needed. * time indexing now possible * `dist_method` now avoids `vectorize=True` in its `xr.apply_ufunc`. This is the main improvement in SPI/SPEI. * Better document the limits of usage of standardized indices. Now standardized indices are capped at extreme values ±8.21. The upper bound is a limit resulting of the use of float64. ### Does this PR introduce a breaking change? Yes. * `pr_cal` or `wb_cal` will not be input options in the future: > Inputing `pr_cal` will be deprecated in xclim==0.46.0. If `pr_cal` is a subset of `pr`, then instead of: `standardized_precipitation_index(pr=pr,pr_cal=pr.sel(time=slice(t0,t1)),...)`, one can call: `standardized_precipitation_index(pr=pr,cal_range=(t0,t1),...)`. If for some reason `pr_cal` is not a subset of `pr`, then the following approach will still be possible: `params = standardized_index_fit_params(da=pr_cal, freq=freq, window=window, dist=dist, method=method)`. `spi = standardized_precipitation_index(pr=pr, params=params)`. This approach can be used in both scenarios to break up the computations in two, i.e. get params, then compute standardized indice I could revert this breaking change if we prefer. This was a first attempt to make the computation faster, but the improvements are now independent of this change. We could also keep the modular structure for params, but revert to `pr_cal` instead of `cal_range`. It's a bit less efficient when `pr_cal` is simply a subset of `pr`, because you end up doing resampling/rolling two times on the calibration range for nothing. When first computing `params`, then obtaining `spi` in two steps, then it makes no difference ### Other information:
Ouranosinc · Oct 23, 2023 · dae1ffd · dae1ffd
2 parents 37a83bf + 59e7df6
commit dae1ffd
Show file tree

Hide file tree

Showing 4 changed files with 461 additions and 162 deletions.
diff --git a/CHANGES.rst b/CHANGES.rst
@@ -20,6 +20,11 @@ New features and enhancements
 * Indicator ``generic.stats`` now accepts any frequency (previously only daily). (:pull:`1498`).
 * Added argument ``out_units`` to ``select_resample_op`` to bypass limitations of ``to_agg_units`` in custom indicators. Add "var" to supported operations in ``to_agg_units``. (:pull:`1498`).
 * `adapt_freq_thresh` argument added `sdba` training functions, allowing to perform frequency adaptation appropriately in each map block. (:pull:`1407`).
+* Standardized indices (``xclim.indices.standardized_precipitation_index`` and ``xclim.indices.standardized_precipitation_evapotranspiration_index``)  (:issue:`1270`, :issue:`1416`, :issue:`1474`, :pull:`1311`) were changed:
+    * Optimized and noticeably faster calculation.
+    * Can be computed in two steps: First compute fit parameters with ``xclim.indices.stats.standardized_index_fit_params``, then use the output in the standardized indices functions.
+    * The standardized index values are now clipped to ±8.21. This reflects the ``float64`` precision of the computation when cumulative distributed function values are inverted to a normal distribution and avoids returning infinite values.
+    * An offset parameter is now available to account for negative water balance values``xclim.indices.standardized_precipitation_evapotranspiration_index``.
 
 Bug fixes
 ^^^^^^^^^
@@ -41,6 +46,7 @@ Breaking changes
 * Default threshold in ``xclim.indices.snw_season_{start|length|end}`` changed form `20 kg m-2` to `4 kg m-2`. (:pull:`1505`).
 * `xclim` development dependencies now include `ruff`. `pycodestyle` and `pydocstyle` have been replaced by `ruff` and removed from the `dev` installation recipe. (:pull:`1504`).
 * The `mf_file` call signature found in ``xclim.ensembles.create_ensemble`` (and ``xclim.ensembles._ens_align_dataset``) has been removed (deprecated since `xclim` v0.43.0). (:pull:`1506`).
+* ``xclim.indices.standardized_precipitation_index`` and ``xclim.indices.standardized_precipitation_evapotranspiration_index`` will no longer accept two datasets (data and calibration data). Instead, a single dataset covering both the calibration and evaluation periods is expected. (:issue:`1270`, :pull:`1311`).
 
 Internal changes
 ^^^^^^^^^^^^^^^^

diff --git a/tests/test_indices.py b/tests/test_indices.py
@@ -526,10 +526,10 @@ def test_standardized_precipitation_index(
         ds = open_dataset("sdba/CanESM2_1950-2100.nc").isel(location=1)
         pr = ds.pr.sel(time=slice("1998", "2000"))
         pr_cal = ds.pr.sel(time=slice("1950", "1980"))
-        spi = xci.standardized_precipitation_index(
-            pr, pr_cal, freq, window, dist, method
+        params = xci.stats.standardized_index_fit_params(
+            pr_cal, freq=freq, window=window, dist=dist, method=method
         )
-
+        spi = xci.standardized_precipitation_index(pr, params=params)
         # Only a few moments before year 2000 are tested
         spi = spi.isel(time=slice(-11, -1, 2))
 
@@ -610,11 +610,17 @@ def test_standardized_precipitation_evapotranspiration_index(
             tas = tasmax - 2.5
             tasmin = tasmax - 5
             wb = xci.water_budget(pr, None, tasmin, tasmax, tas)
-            wb_cal = wb.sel(time=slice("1950", "1980"))
-            wb = wb.sel(time=slice("1998", "2000"))
 
+        params = xci.stats.standardized_index_fit_params(
+            wb.sel(time=slice("1950", "1980")),
+            freq=freq,
+            window=window,
+            dist=dist,
+            method=method,
+            offset="1 mm/d",
+        )
         spei = xci.standardized_precipitation_evapotranspiration_index(
-            wb, wb_cal, freq, window, dist, method
+            wb.sel(time=slice("1998", "2000")), params=params
         )
 
         # Only a few moments before year 2000 are tested
@@ -625,6 +631,51 @@ def test_standardized_precipitation_evapotranspiration_index(
 
         np.testing.assert_allclose(spei.values, values, rtol=0, atol=diff_tol)
 
+    def test_standardized_index_modularity(self, open_dataset):
+        freq, window, dist, method = "MS", 6, "gamma", "APP"
+        ds = (
+            open_dataset("sdba/CanESM2_1950-2100.nc")
+            .isel(location=1)
+            .sel(time=slice("1950", "2000"))
+        )
+        pr = ds.pr
+        # generate water budget
+        with xr.set_options(keep_attrs=True):
+            tasmax = ds.tasmax
+            tas = tasmax - 2.5
+            tasmin = tasmax - 5
+            wb = xci.water_budget(pr, None, tasmin, tasmax, tas)
+
+        params = xci.stats.standardized_index_fit_params(
+            wb.sel(time=slice("1950", "1980")),
+            freq=freq,
+            window=window,
+            dist=dist,
+            method=method,
+            offset="1 mm/d",
+        )
+        spei1 = xci.standardized_precipitation_evapotranspiration_index(
+            wb.sel(time=slice("1998", "2000")), params=params
+        )
+
+        spei2 = xci.standardized_precipitation_evapotranspiration_index(
+            wb,
+            freq=freq,
+            window=window,
+            dist=dist,
+            method=method,
+            offset="1 mm/d",
+            cal_start="1950",
+            cal_end="1980",
+        ).sel(time=slice("1998", "2000"))
+
+        # In the previous computation, the first {window-1} values are NaN because the rolling is performed on the period [1998,2000].
+        # Here, the computation is performed on the period [1950,2000], *then* subsetted to [1998,2000], so it doesn't have NaNs
+        # for the first values
+        spei2[{"time": slice(0, window - 1)}] = np.nan
+
+        np.testing.assert_allclose(spei1.values, spei2.values, rtol=0, atol=1e-4)
+
 
 class TestDailyFreezeThawCycles:
     @pytest.mark.parametrize(