Skip to content

Commit

Permalink
Proper use of loc parameter in gamma,fisk dists (SPI/SPEI) (#1720)
Browse files Browse the repository at this point in the history
<!--Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the
AUTHORS.rst! -->
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] Tests for the changes have been added (for bug fixes / features)
- [ ] (If applicable) Documentation has been added / updated (for bug
fixes / features)
- [x] CHANGES.rst has been updated (with summary of main changes)
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added

### What kind of change does this PR introduce?


* Spei used to rely on an offset to ensure that distribution of values
are positive when using distributions such as `gamma, fisk`. Now,
instead, we properly use the `loc` parameter in those distributions that
generalize the distributions and allow negative values. For instance,
the `gamma` disitribution is a 2-parameter function, but in ``scipy``
it's generalized to 3 parameters, with a `loc` parameter. This extends
the support of `gamma` from $x \in [0,\infty]$ to $x \in [\text{loc},
\infty]$.

The code was already compatible with this idea, just had to change
``_fit_start`` so it can compute estimates of the parameters in the
cases with negatives values.


### Does this PR introduce a breaking change?

* Removing the offset from SPEI. Also, the `_fit_start` can now compute
an estimate of `loc` which can affect what value we obtain when
optimizing 3-parameters distributions. Even when the `loc` is fixed to 0
with `floc=0` in `fitkwargs`, the estimate of `fisk` was modified. The
previous formulation was obtained by applying more or less the idea in
the method of moments, but now I obtained an approximation that's more
rigorous.

* The case of zero inflated distribution has been changed: Now, it is
the calibration data that determines the probability of zero, as it
should.

* Method `APP` can now only be used if `fitkwargs['floc']` is given as
input by the user.

### Other information:
  • Loading branch information
coxipi authored May 2, 2024
2 parents 4fb1ddd + 1cc8372 commit 4f2e633
Show file tree
Hide file tree
Showing 9 changed files with 427 additions and 201 deletions.
9 changes: 7 additions & 2 deletions CHANGES.rst
Original file line number Diff line number Diff line change
Expand Up @@ -4,7 +4,7 @@ Changelog

v0.49.0 (unreleased)
--------------------
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Juliette Lavoie (:user:`juliettelavoie`), David Huard (:user:`huard`), Gabriel Rondeau-Genesse (:user:`RondeauG`), Javier Diez-Sierra (:user:`JavierDiezSierra`).
Contributors to this version: Trevor James Smith (:user:`Zeitsperre`), Pascal Bourgault (:user:`aulemahal`), Juliette Lavoie (:user:`juliettelavoie`), David Huard (:user:`huard`), Gabriel Rondeau-Genesse (:user:`RondeauG`), Javier Diez-Sierra (:user:`JavierDiezSierra`), Sarah Gammon (:user:`SarahG-579462`), Éric Dupuis (:user:`coxipi`).

Announcements
^^^^^^^^^^^^^
Expand All @@ -13,8 +13,10 @@ Announcements
New features and enhancements
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
* Indicator ``xclim.atmos.potential_evapotranspiration`` and indice ``xclim.indices.potential_evapotranspiration`` now accept a new value (`DA02`) for argument `method` implementing potential evapotranspiration based on Droogers and Allen (2002). (:issue:`1710`, :pull:`1723`).
* ``xclim.ensembles.ensemble_percentiles`` now takes a `method` argument, accepting one of: `'interpolated_inverted_cdf'`, `'hazen'`, `'weibull'`, `'linear'` (default), `'median_unbiased'`, or `'normal_unbiased'`. (:issue:`1694`, :pull:`1732`).
* The documentation now uses the `furo <https://github.com/pradyunsg/furo>`_ theme for Sphinx. This theme supports native "light" and "dark" modes, adaptive screen resolution, as well as provides a better navigation layout for pages housing long lists of entries (e.g. `indices`). (:issue:`1693`, :pull:`1731`).
* ``xclim.ensembles.ensemble_percentiles`` now takes a `method` argument, accepting one of: `'interpolated_inverted_cdf'`, `'hazen'`, `'weibull'`, `'linear'` (default), `'median_unbiased'`, or `'normal_unbiased'`. (:issue:`1694`, :pull:`1732`).
* Distributions with negative values are directly fitted without need for an offset for distributions such as `'gamma'` and `'fisk'` in ``xclim.indices.standardized_precipitation_evapotranspiration_index``. (:issue:`1477` :pull:`1720`).
* ``xclim.indices.stats_fit_start`` gives an estimate of the `loc` parameter for `'gamma'` and `'fisk'` distributions. (:issue:`1477` :pull:`1720`).

New indicators
^^^^^^^^^^^^^^
Expand All @@ -30,6 +32,9 @@ Breaking changes
* The `__getitem__` method of ``xclim.core.indicator.Parameter`` instances has been removed. Accessing members of ``Parameters`` now uniquely uses dot notation. (:pull:`1721`).
* The obsolete function wrapper for generating Indicators ``xclim.core.utils.wrapped_partial`` has been removed. (:pull:`1721`).
* The default documentation theme has changed from `sphinx-rtd-theme` to `furo`; Several modifications to the documentation configuration and CSS overrides have been made to accommodate the changes. `furo` is now a `docs` dependency. (:issue:`1693`, :pull:`1731`).
* Estimation of parameters using `_fit_start` for `gamma` and `fisk` has been changed and can affect the results obtained with full-fledged (e.g. "ML") methods. (:issue:`1477` :pull:`1720`).
* Method `APP` in ``xclim.indices.standardized_precipitation_index`` and ``xclim.indices.standardized_precipitation_evapotranspiration_index`` now requires the user to impose a `loc` parameter through `fitkwargs['floc']` (:issue:`1477` :pull:`1720`).
* Zero inflated distributions used in ``xclim.stats.standardized_index`` now appropriately use the probability of zeroes in the calibration data and not the entire dataset. (:issue:`1477` :pull:`1720`).

Bug fixes
^^^^^^^^^
Expand Down
25 changes: 25 additions & 0 deletions docs/references.bib
Original file line number Diff line number Diff line change
Expand Up @@ -2115,6 +2115,31 @@ @article{brode_utci_2012
url = {https://doi.org/10.1007/s00484-011-0454-1}
}

@article{muralidhar_1992,
title = {A {{Simple Minimum}}-{{Bias Percentile Estimator}} of the {{Location Parameter}} for the {{Gamma}}, {{Weibull}}, and {{Log}}-{{Normal Distributions}}},
author = {Muralidhar, Krishnamurty and Zanakis, Stelios H.},
year = {1992},
month = jul,
journal = {Decision Sciences},
volume = {23},
number = {4},
pages = {862--879},
issn = {0011-7315, 1540-5915},
doi = {10.1111/j.1540-5915.1992.tb00423.x}
}

@article{cooke_1979,
title = {Statistical Inference for Bounds of Random Variables},
author = {Cooke, Peter},
year = {1979},
journal = {Biometrika},
volume = {66},
number = {2},
pages = {367--374},
issn = {0006-3444, 1464-3510},
doi = {10.1093/biomet/66.2.367}
}

@article{droogers2002,
author = {Droogers, Peter and Allen, Richard G.},
title = {Estimating reference evapotranspiration under inaccurate data conditions},
Expand Down
3 changes: 3 additions & 0 deletions tests/test_indicators.py
Original file line number Diff line number Diff line change
Expand Up @@ -599,6 +599,9 @@ def test_parsed_doc():
assert params["season_method"].kind is InputKind.STRING
assert params["season_method"].choices == {"GFWED", None, "WF93", "LA08"}

params = xclim.atmos.standardized_precipitation_evapotranspiration_index.parameters
assert params["fitkwargs"].kind is InputKind.DICT


def test_default_formatter():
assert default_formatter.format("{freq}", freq="YS") == "annual"
Expand Down
117 changes: 84 additions & 33 deletions tests/test_indices.py
Original file line number Diff line number Diff line change
Expand Up @@ -456,6 +456,8 @@ def test_effective_growing_degree_days(

np.testing.assert_array_equal(out, np.array([np.NaN, expected]))


class TestStandardizedIndices:
# gamma/APP reference results: Obtained with `monocongo/climate_indices` library
# MS/fisk/ML reference results: Obtained with R package `SPEI`
# Using the method `APP` in XClim matches the method from monocongo, hence the very low
Expand All @@ -482,18 +484,15 @@ def test_effective_growing_degree_days(
[0.598209, 1.55976, 1.69309, 0.9964, 0.7028],
2e-2,
),
("MS", 1, "gamma", "ML", [1.3166, 1.4507, 1.9461, -3.09, 0.85068], 0.2),
("MS", 1, "gamma", "ML", [1.4586, 1.6028, 2.0668, -3.09, 0.84841], 2e-2),
("MS", 12, "gamma", "ML", [0.59821, 1.5598, 1.6931, 0.9964, 0.7028], 0.04),
(
"MS",
12,
1,
"gamma",
"ML",
[0.62578, 1.5417, 1.7164, 0.99776, 0.69889],
2e-2,
[1.460105, 1.602951, 2.072521, -3.09, 0.891468],
0.04,
),
("MS", 1, "fisk", "ML", [1.7296, 1.5135, 2.0072, -2.7331, 0.89206], 0.35),
("MS", 12, "gamma", "ML", [0.59821, 1.5598, 1.6931, 0.9964, 0.7028], 0.04),
(
"MS",
1,
Expand All @@ -502,14 +501,6 @@ def test_effective_growing_degree_days(
[1.41236, 1.51192, 1.93324, -2.74089, 0.932674],
2e-2,
),
(
"MS",
12,
"fisk",
"ML",
[0.683273, 1.51189, 1.61597, 1.03875, 0.72531],
0.035,
),
(
"MS",
12,
Expand Down Expand Up @@ -539,47 +530,47 @@ def test_effective_growing_degree_days(
1,
"gamma",
"ML",
[-0.03577971, 1.30589409, 0.8863447, 0.23906544, -0.05185997],
[-0.011698, 1.597031, 0.969714, 0.265561, -0.132654],
2e-2,
),
(
"D",
12,
"gamma",
"ML",
[-0.15846245, -0.04924534, 0.66299367, 1.09938471, 0.66095752],
[-0.158116, -0.049222, 0.672544, 1.08332, 0.660903],
2e-2,
),
(
"D",
1,
"fisk",
"APP",
[-1.26216389, 1.03096183, 0.62985354, -0.50335153, -1.32788296],
"ML",
[-0.158949, 1.308225, 0.449846, 0.146699, -0.502737],
2e-2,
),
(
"D",
12,
"fisk",
"APP",
[-0.57109258, -0.40657737, 0.55163493, 0.97381067, 0.55580649],
"ML",
[-0.14151269, -0.01914608, 0.7080277, 1.01510279, 0.6954002],
2e-2,
),
(
"D",
1,
"fisk",
"ML",
[-0.05562691, 1.30809152, 0.6954986, 0.33018744, -0.50258979],
"APP",
[-0.417288, 1.275686, 1.002566, 0.206854, -0.672117],
2e-2,
),
(
"D",
12,
"fisk",
"ML",
[-0.14151269, -0.01914608, 0.7080277, 1.01510279, 0.6954002],
"APP",
[-0.220136, -0.065782, 0.783319, 1.134428, 0.782857],
2e-2,
),
(
Expand Down Expand Up @@ -610,8 +601,17 @@ def test_standardized_precipitation_index(
) # to compare with ``climate_indices``
pr = ds.pr.sel(time=slice("1998", "2000"))
pr_cal = ds.pr.sel(time=slice("1950", "1980"))
fitkwargs = {}
if method == "APP":
fitkwargs["floc"] = 0
params = xci.stats.standardized_index_fit_params(
pr_cal, freq=freq, window=window, dist=dist, method=method
pr_cal,
freq=freq,
window=window,
dist=dist,
method=method,
fitkwargs=fitkwargs,
zero_inflated=True,
)
spi = xci.standardized_precipitation_index(pr, params=params)
# Only a few moments before year 2000 are tested
Expand Down Expand Up @@ -650,7 +650,14 @@ def test_standardized_precipitation_index(
2e-2,
),
("MS", 1, "gamma", "ML", [1.38, 1.58, 2.1, -3.09, 0.868], 1e-1),
("MS", 1, "gamma", "ML", [1.4631, 1.6053, 2.1625, -3.09, 0.87996], 2e-2),
(
"MS",
1,
"gamma",
"ML",
[1.467832, 1.605313, 2.137688, -3.09, 0.878549],
5e-2,
),
(
"MS",
12,
Expand All @@ -664,8 +671,8 @@ def test_standardized_precipitation_index(
12,
"gamma",
"ML",
[0.6511, 1.6146, 1.8580, 1.0140, 0.6878],
2e-2,
[0.651093, 1.614638, 1.83526, 1.014005, 0.69868],
5e-2,
),
("MS", 1, "fisk", "ML", [1.73, 1.51, 2.05, -3.09, 0.892], 3.5e-1),
("MS", 1, "fisk", "ML", [1.4167, 1.5117, 2.0562, -3.09, 0.9422], 2e-2),
Expand All @@ -680,6 +687,10 @@ def test_standardized_precipitation_index(
),
],
)
# Eventually, this should also be compared to monocongo
# there are some issues where the data below can still have negative values
# after the ``climate_indices`` 1000.0 offset, so it's a problem with ``climate_indices``
# library. Address this in the future.
def test_standardized_precipitation_evapotranspiration_index(
self, open_dataset, freq, window, dist, method, values, diff_tol
):
Expand All @@ -695,14 +706,19 @@ def test_standardized_precipitation_evapotranspiration_index(
tas = tasmax - 2.5
tasmin = tasmax - 5
wb = xci.water_budget(pr, None, tasmin, tasmax, tas)

if method == "APP":
offset = convert_units_to("1 mm/d", wb, context="hydro")
fitkwargs = {"floc": -offset}
else:
fitkwargs = {}
params = xci.stats.standardized_index_fit_params(
wb.sel(time=slice("1950", "1980")),
freq=freq,
window=window,
dist=dist,
method=method,
offset="1 mm/d",
fitkwargs=fitkwargs,
zero_inflated=False,
)
spei = xci.standardized_precipitation_evapotranspiration_index(
wb.sel(time=slice("1998", "2000")), params=params
Expand Down Expand Up @@ -731,13 +747,16 @@ def test_standardized_index_modularity(self, open_dataset):
tasmin = tasmax - 5
wb = xci.water_budget(pr, None, tasmin, tasmax, tas)

offset = convert_units_to("1 mm/d", wb, context="hydro")
fitkwargs = {"floc": -offset}

params = xci.stats.standardized_index_fit_params(
wb.sel(time=slice("1950", "1980")),
freq=freq,
window=window,
dist=dist,
method=method,
offset="1 mm/d",
fitkwargs=fitkwargs,
month=[2, 3],
)
spei1 = xci.standardized_precipitation_evapotranspiration_index(
Expand All @@ -750,7 +769,7 @@ def test_standardized_index_modularity(self, open_dataset):
window=window,
dist=dist,
method=method,
offset="1 mm/d",
fitkwargs=fitkwargs,
cal_start="1950",
cal_end="1980",
month=[2, 3],
Expand All @@ -763,6 +782,38 @@ def test_standardized_index_modularity(self, open_dataset):

np.testing.assert_allclose(spei1.values, spei2.values, rtol=0, atol=1e-4)

def test_zero_inflated(self, open_dataset):
# This tests that the zero_inflated option makes a difference with zero inflated data
pr = (
open_dataset("sdba/CanESM2_1950-2100.nc")
.isel(location=1)
.sel(time=slice("1950", "1980"))
).pr
# july 1st (doy=180) with 10 years with zero precipitation
pr[{"time": slice(179, 365 * 11, 365)}] = 0
spid = {}
input_params = dict(
freq=None,
window=1,
dist="gamma",
method="ML",
fitkwargs={},
doy_bounds=(180, 180),
)
for zero_inflated in [False, True]:
input_params["zero_inflated"] = zero_inflated
params = xci.stats.standardized_index_fit_params(pr, **input_params)
spid[zero_inflated] = xci.stats.standardized_index(
pr, params=params, cal_start=None, cal_end=None, **input_params
)
# drop doys other than 180 that will be NaNs
spid[zero_inflated] = spid[zero_inflated].where(
spid[zero_inflated].notnull(), drop=True
)
np.testing.assert_equal(
np.all(np.not_equal(spid[False].values, spid[True].values)), True
)


class TestDailyFreezeThawCycles:
@pytest.mark.parametrize(
Expand Down
20 changes: 4 additions & 16 deletions tests/test_precip.py
Original file line number Diff line number Diff line change
Expand Up @@ -191,20 +191,10 @@ def test_3d_data_with_nans(self, open_dataset):
pr.values[10] = np.nan

out1 = atmos.standardized_precipitation_index(
pr,
pr_cal=pr,
freq="MS",
window=1,
dist="gamma",
method="APP",
pr, freq="MS", window=1, dist="gamma", method="APP", fitkwargs={"floc": 0}
)
out2 = atmos.standardized_precipitation_index(
prMM,
pr_cal=prMM,
freq="MS",
window=1,
dist="gamma",
method="APP",
prMM, freq="MS", window=1, dist="gamma", method="APP", fitkwargs={"floc": 0}
)
np.testing.assert_array_almost_equal(out1, out2, 3)

Expand All @@ -218,21 +208,19 @@ def test_3d_data_with_nans(self, open_dataset):

out3 = atmos.standardized_precipitation_evapotranspiration_index(
wb,
wb_cal=wb,
freq="MS",
window=1,
dist="gamma",
method="APP",
# method="ML",
fitkwargs={"floc": 0},
)
out4 = atmos.standardized_precipitation_evapotranspiration_index(
wbMM,
wb_cal=wbMM,
freq="MS",
window=1,
dist="gamma",
method="APP",
# method="ML",
fitkwargs={"floc": 0},
)

np.testing.assert_array_almost_equal(out3, out4, 3)
Expand Down
1 change: 1 addition & 0 deletions xclim/core/formatting.py
Original file line number Diff line number Diff line change
Expand Up @@ -556,6 +556,7 @@ def unprefix_attrs(source: dict, keys: Sequence, prefix: str) -> dict:
InputKind.DAY_OF_YEAR: "date (string, MM-DD)",
InputKind.DATE: "date (string, YYYY-MM-DD)",
InputKind.BOOL: "boolean",
InputKind.DICT: "dict",
InputKind.DATASET: "Dataset, optional",
InputKind.KWARGS: "",
InputKind.OTHER_PARAMETER: "Any",
Expand Down
8 changes: 8 additions & 0 deletions xclim/core/utils.py
Original file line number Diff line number Diff line change
Expand Up @@ -590,6 +590,11 @@ class InputKind(IntEnum):
Annotation : ``bool``, may be optional.
"""
DICT = 10
"""A dictionary.
Annotation : ``dict`` or ``dict | None``, may be optional.
"""
KWARGS = 50
"""A mapping from argument name to value.
Expand Down Expand Up @@ -660,6 +665,9 @@ def infer_kind_from_parameter(param) -> InputKind:
if annot == {"bool"}:
return InputKind.BOOL

if annot == {"dict"}:
return InputKind.DICT

if annot == {"Dataset"}:
return InputKind.DATASET

Expand Down
Loading

0 comments on commit 4f2e633

Please sign in to comment.