Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SeasonGrouper, SeasonResampler #9524

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

dcherian
Copy link
Contributor

@dcherian dcherian commented Sep 20, 2024

These two groupers allow defining custom seasons, and dropping incomplete seasons from the output. Both cases are treated by adjusting the factorization -- conversion from group labels to integer codes -- appropriately.

The last piece from #8509

Example:

import xarray as xr
from xarray.groupers import SeasonGrouper, SeasonResampler

ds = xr.tutorial.open_dataset("air_temperature")

# custom seasons! 
ds.air.groupby(time=SeasonGrouper(["JF", "MAM", "JJAS", "OND"])).mean()

ds.air.resample(time=SeasonResampler(["DJF", "MAM", "JJAS", "ON"])).count()

TODO:

  • Needs a boatload of tests
  • narrative docs
  • support drop_incomplete in SeasonGrouper
  • more cftime calendar support

cc @tomvothecoder do you have time to contribute some tests? I bet we'll simplify a bunch of xcdat this way, and you probably already have tests :)

@oliviermarti
Copy link

First comment, but I have performed only quick test

1 -

In your short example, it's probably :
from xarray.groupers import SeasonGrouper, SeasonResampler
and not :
from xarray.core.groupers import SeasonGrouper, SeasonResampler

Or my Github knowledge is too limited, and I'm not testing the right branch.

2 - Season grouper

Seems OK for all I have tested. In particular I can :

  • Use it for one season only (subsampling):
    ds.air.groupby(time=SeasonGrouper(["MAM"]))['MAM']

  • Use it for 4 months season, as we often use in paleoclimate with calendar shifting (oversampling)
    ds.air.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()

3 - Season resampler

Works as expected from the example.
I would really appreciate to have access to subsampling :
ds.air.resample(time=SeasonResampler(["DJF", "MAM", "JJA", "ON"]))
or :
ds.air.resample(time=SeasonResampler(["JJAS"]))
and to oversampling :
ds.air.resample(time=SeasonResampler(["DJFM", "MAMJ", "JJAS", "SOND"]))

It could be useful to have a NaN value for an incomplete season : the first DJF cannot not be computed, and is not. This mean that the first value is not a DJF one, but a MAM value. Could be a bit misleading.

4 - cftime

I have tested it with cftime calendars instead of datetime. It works with the traditional calendar (gregorian, standard). But not with others like 360_day, 365_day, julian., proleptic_gregorian :
TypeError: cannot compute the time difference between dates with different calendars

5 - Simple data

I've build a dataset with the number ot the month as a variable. So I'm sure that the computation is correct.

Thanks' for these features. They are quit easy and straigthforward to use.

In particular, it allows to work on variables, as xcdat features work on Dataset only, which yields a more complicated syntax.

I'm gonna try to imagine further tests.

Olivier

@dcherian
Copy link
Contributor Author

dcherian commented Sep 20, 2024

Thanks @oliviermarti ! this is incredibly helpful

In your short example, it's probably : from xarray.groupers import SeasonGrouper, SeasonResampler

Yes, my mistake. I fixed the snippet.

ds.air.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()

This should not work, did you really get correct results.

It could be useful to have a NaN value for an incomplete season

The drop_incomplete option should let you control this.

@oliviermarti
Copy link

ds.air.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()

This should not work, did you really get correct results.

In fact not ! Only the first value is correct. A bit dangerous that it returns a result and not an error.

It could be useful to have a NaN value for an incomplete season
The drop_incomplete option should let you control this.

drop_incomplete compute a value, using less month. This is the behaviour of xcdat. It would like to get a nan (I should ask that to xcdat too).

Olivier

@dcherian
Copy link
Contributor Author

dcherian commented Sep 20, 2024

ds.air.groupby(time=SeasonGrouper(["DJFM", "MAMJ", "JJAS", "SOND"])).mean()

I went back to my dev notebook, turns out I figured this out already!

image

@tomvothecoder
Copy link
Contributor

cc @tomvothecoder do you have time to contribute some tests? I bet we'll simplify a bunch of xcdat this way, and you probably already have tests :)

Hi @dcherian, thank you for this PR! I've been looking forward to having this feature in Xarray. No guarantees on a timeline, but I plan to start looking at this PR this week. I'll experiment with this feature and see how I can leverage it to simplify xCDAT PR #423 for custom seasons. I'll also try to contribute any useful tests.

These two groupers allow defining custom seasons, and dropping
incomplete seasons from the output. Both cases are treated by adjusting
the factorization -- conversion from group labels to integer codes -- appropriately.
@tomvothecoder
Copy link
Contributor

tomvothecoder commented Nov 13, 2024

Hey @dcherian, quick question. Will this PR add support for using SeasonGrouper along with Datetime components (e.g., ds.time.dt.year)?

For example, if I wanted to perform grouped averaging on year and custom seasons it might look like:

ds.air.groupby(time=[ds.time.dt.year, SeasonGrouper(["JF", "MAM", "JJAS", "OND"])]).mean()

@dcherian
Copy link
Contributor Author

dcherian commented Nov 13, 2024

Yes, I think so this is conceptually similar to groupby(["time.month", "time.year"]) but we require dict inputs to be explicit

ds.air.groupby(
    {"time.year": UniqueGrouper(), "time": SeasonGrouper(["JF", "MAM", "JJAS", "OND"])}
).mean()

Here's an example, that is hopefully easy to interpret
image

Is this right?

One trouble I'm having here is defining good tests for arbitrary seasons. Do you have suggestions or existing ones in xcdat that i can generalize?

@tomvothecoder
Copy link
Contributor

Yes, I think so this is conceptually similar to groupby(["time.month", "time.year"]) but we require dict inputs to be explicit

ds.coords["year"] = ds.time.dt.year
ds.air.groupby(year=UniqueGrouper, time=SeasonGrouper(["JF", "MAM", "JJAS", "OND"])).mean()

Here's an example, that is hopefully easy to interpret image

Is this right?

Yeah that's exactly what I was looking for. Thanks!

One trouble I'm having here is defining good tests for arbitrary seasons. Do you have suggestions or existing ones in xcdat that i can generalize?

Sure I have some in xcdat that you can adapt for this PR. I will share them with you shortly.

@tomvothecoder
Copy link
Contributor

Another question: If we're defining custom seasons with months that span the calendar year, those months are from the previous year correct?

For example for "NDJFM", "ND" should be from the previous year.

air.groupby(year=UniqueGrouper(), time=SeasonGrouper(["NDJFM"])) 

@dcherian
Copy link
Contributor Author

Yes it tried to be that smart

@dcherian
Copy link
Contributor Author

@tomvothecoder @oliviermarti i fixed the existing tests now, please try it out!

FWIW the need to support seasons=["JJAS"] is adding quite some complexity. We should consider not supporting it.

@tomvothecoder
Copy link
Contributor

tomvothecoder commented Nov 15, 2024

I'm writing a few tests right now. How do you want me to add them to your fork branch?

Another question: If we're defining custom seasons with months that span the calendar year, those months are from the previous year correct?

For example for "NDJFM", "ND" should be from the previous year.

air.groupby(year=UniqueGrouper(), time=SeasonGrouper(["NDJFM"])) 

I noticed in a test I'm writing for the above code that "ND" is being taken from the same year, not the previous year. I think we expect the previous year "ND" to be used instead. I will show a clear example once I add the test.

@dcherian
Copy link
Contributor Author

Ah nice find. A PR to this branch should be the easiest

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Ordered Groupby Keys
4 participants