[BUG] Error handling timezones #305

Open
Jmbols opened this issue May 3, 2024 · 7 comments
Labels
bug Something isn't working

@Jmbols

Jmbols commented May 3, 2024

Description
An error is raised when constructing an update patch if the x-axis values are dates with a specified timezone.
The error occurs when the timezones are compared. By default, pandas' pd.to_datetime() converts the timezone of an ISO string such as "+08:00" to a fixed offset, whereas the x-axis data uses a named timezone (e.g. "Asia/Taipei"). The offsets are the same, because both are derived from the same timezone, but their string representations differ, so the check fails.

/.pyenv/versions/3.11.1/envs/clearview-dash-311/lib/python3.11/site-packages/plotly_resampler/aggregation/plotly_aggregator_parser.py", line 41, in to_same_tz
assert ts.tz.__str__() == reference_tz.__str__()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
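The failing comparison can be isolated with pandas alone. A minimal sketch, assuming x-data localized to Asia/Taipei (as in the reproduction snippet) and a range bound in the form Dash sends it, i.e. an ISO string with a fixed offset. The exact text printed for the fixed-offset timezone depends on the pandas version, so it is not shown:

```python
import pandas as pd

# x-axis data localized to a named (IANA) timezone
x = pd.date_range("2024-04-01", periods=3, freq="h").tz_localize("Asia/Taipei")

# a relayout range bound, parsed the way pd.to_datetime() parses an ISO string
start = pd.to_datetime("2024-04-27T08:00:00+08:00")

print(str(x.tz))      # Asia/Taipei
print(str(start.tz))  # a fixed-offset timezone, not "Asia/Taipei"

# the names differ, so a `ts.tz.__str__() == reference_tz.__str__()` check fails,
# even though both represent the same UTC+08:00 offset at this instant
print(str(start.tz) == str(x.tz))                               # False
print(start.utcoffset() == start.tz_convert(x.tz).utcoffset())  # True
```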

Reproducing the bug 🔍
This code snippet reproduces the bug

import pandas as pd
import numpy as np
import plotly.graph_objects as go

from plotly_resampler import FigureResampler


fig = FigureResampler()

x = pd.date_range("2024-04-01T00:00:00", "2025-01-01T00:00:00", freq="H")
x = x.tz_localize("Asia/Taipei")
y = np.random.randn(len(x))

fig.add_trace(
    go.Scattergl(x=x, y=y, name="demo", mode="lines+markers"),
    max_n_samples=int(len(x) * 0.2),
)

relayout_data = {
    "xaxis.range[0]": "2024-04-27T08:00:00+08:00",
    "xaxis.range[1]": "2024-05-04T17:15:39.491031+08:00",
}

fig.construct_update_data_patch(relayout_data)

Environment information

  • OS: Ubuntu 22.04
  • Python version: 3.11
  • plotly-resampler environment: python and dash
  • plotly-resampler version: 0.9.2
@Jmbols Jmbols added the bug Something isn't working label May 3, 2024
@Jmbols
Author

Jmbols commented May 3, 2024

This can be worked around by calling tz_convert on the range values before passing relayout_data to fig.construct_update_data_patch(relayout_data), but when interacting through Dash the default behaviour is still this error.
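A minimal sketch of that workaround, assuming the target timezone is known up front (here "Asia/Taipei", the timezone of the trace's x-data). `localize_relayout_range` is a hypothetical helper, not part of plotly-resampler; it only handles keys of the form "xaxis….range[i]" and leaves everything else (e.g. autorange flags) untouched:

```python
import pandas as pd


def localize_relayout_range(relayout_data: dict, tz: str) -> dict:
    """Hypothetical helper: turn x-axis range bounds into tz-aware
    Timestamps expressed in `tz` (the timezone of the trace's x-data)."""
    converted = dict(relayout_data)
    for key, value in relayout_data.items():
        # only touch keys such as "xaxis.range[0]" or "xaxis2.range[1]"
        if key.startswith("xaxis") and ".range[" in key:
            ts = pd.Timestamp(value)
            # assumption: naive bounds are interpreted as local time in `tz`
            converted[key] = ts.tz_localize(tz) if ts.tz is None else ts.tz_convert(tz)
    return converted


relayout_data = {
    "xaxis.range[0]": "2024-04-27T08:00:00+08:00",
    "xaxis.range[1]": "2024-05-04T17:15:39.491031+08:00",
}
fixed = localize_relayout_range(relayout_data, "Asia/Taipei")
# fig.construct_update_data_patch(fixed)  # `fig` from the reproduction snippet
```

The bounds are kept as Timestamps on purpose: serializing them back to ISO strings would make pd.to_datetime() produce fixed offsets again.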

@Jmbols
Author

Jmbols commented May 7, 2024

However, this fix only works when there is no DST transition. Canada/Pacific, for example, switches from PST (UTC-08:00) to PDT (UTC-07:00) when DST starts, and back when it ends, so if the code above is run as follows

import pandas as pd
import numpy as np
import plotly.graph_objects as go

from plotly_resampler import FigureResampler


fig = FigureResampler()

x = pd.date_range("2024-04-01T00:00:00", "2025-01-01T00:00:00", freq="H")
x = x.tz_localize("UTC")
x = x.tz_convert("Canada/Pacific")
y = np.random.randn(len(x))

fig.add_trace(
    go.Scattergl(x=x, y=y, name="demo", mode="lines+markers"),
    max_n_samples=int(len(x) * 0.2),
)

relayout_data = {
    "xaxis.range[0]": pd.Timestamp("2024-03-01T00:00:00").tz_localize("Canada/Pacific"),
    "xaxis.range[1]": pd.Timestamp("2024-03-31T00:00:00").tz_localize("Canada/Pacific"),
}

fig.construct_update_data_patch(relayout_data)

you get the error:
site-packages/plotly_resampler/aggregation/plotly_aggregator_parser.py", line 81, in get_start_end_indices
assert start.tz == end.tz
^^^^^^^^^^^^^^^^^^

Is there a reason not to use assert start.tz.__str__() == end.tz.__str__()? That would at least fix the assertion error across DST transitions, as long as both timestamps use the same timezone name.
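A sketch of the name-based comparison suggested above. In pandas 2.x, where timezone strings map to pytz, the tzinfo objects attached to timestamps on either side of a DST transition are distinct objects, which is why `start.tz == end.tz` can fail; their names still agree. `same_tz_name` is a hypothetical helper for illustration:

```python
import pandas as pd


def same_tz_name(a: pd.Timestamp, b: pd.Timestamp) -> bool:
    # compare timezones by name rather than by tzinfo object identity/equality
    return str(a.tz) == str(b.tz)


start = pd.Timestamp("2024-03-01T00:00:00").tz_localize("Canada/Pacific")  # PST
end = pd.Timestamp("2024-03-31T00:00:00").tz_localize("Canada/Pacific")    # PDT

print(start.utcoffset(), end.utcoffset())  # UTC-08:00 vs UTC-07:00
print(same_tz_name(start, end))            # True
```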

@DHRUVCHARNE

You can also try using the pytz library to handle timezone conversions and DST transitions. Here's an example:

import pytz

...

x = pd.date_range("2024-04-01T00:00:00", "2025-01-01T00:00:00", freq="H")
x = x.tz_localize("UTC")
x = x.tz_convert(pytz.timezone("Canada/Pacific"))

...

relayout_data = {
    "xaxis.range[0]": pd.Timestamp("2024-03-01T00:00:00").tz_localize(pytz.timezone("Canada/Pacific")),
    "xaxis.range[1]": pd.Timestamp("2024-03-31T00:00:00").tz_localize(pytz.timezone("Canada/Pacific")),
}

@jonasvdd jonasvdd self-assigned this Sep 5, 2024
@jonasvdd
Member

jonasvdd commented Sep 9, 2024

@Jmbols, @DHRUVCHARNE,

I tried to fix this behavior in #318 by catching the legacy tz-string assert (see ⬇️) and then comparing the offsets instead (see ⬇️ ⬇️).

However, this introduces possibly unwanted behavior: different timezones with the same offset are considered valid (e.g. "Europe/Brussels" and "Europe/Amsterdam" are two different timezone objects/strings, but they share the same offset, so they are considered equal).
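The offset-based comparison described above can be illustrated with pandas alone. `same_utc_offset` is a hypothetical helper, not the code in #318: it checks whether a timestamp has the same UTC offset as a reference timezone at that same instant.

```python
import pandas as pd


def same_utc_offset(ts: pd.Timestamp, reference_tz: str) -> bool:
    # hypothetical check (not the #318 implementation): compare the UTC offset
    # of `ts` with the offset of `reference_tz` at the same instant
    return ts.utcoffset() == ts.tz_convert(reference_tz).utcoffset()


t = pd.Timestamp("2022-02-14 12:00", tz="Europe/Brussels")
print(same_utc_offset(t, "Europe/Amsterdam"))  # True: both UTC+01:00 in February
print(same_utc_offset(t, "Europe/Lisbon"))     # False: Lisbon is UTC+00:00 in February
```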

This is also expressed in the following tests:

import numpy as np
import pandas as pd
import pytest

from plotly_resampler.aggregation.plotly_aggregator_parser import PlotlyAggregatorParser

# construct_hf_data_dict: helper available in the project's test module (not shown here)


def test_time_tz_slicing_different_timestamp():
    # construct a time indexed series with UTC timezone
    n = 60 * 60 * 24 * 3
    dr = pd.Series(
        index=pd.date_range("2022-02-14", freq="s", periods=n, tz="UTC"),
        data=np.random.randn(n),
    )
    # create multiple other time zones
    cs = [
        dr,
        dr.tz_localize(None).tz_localize("Europe/Amsterdam"),
        dr.tz_convert("Europe/Lisbon"),
        dr.tz_convert("Australia/Perth"),
        dr.tz_convert("Australia/Canberra"),
    ]
    for i, s in enumerate(cs):
        t_start, t_stop = sorted(s.iloc[np.random.randint(0, n, 2)].index)
        t_start = t_start.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        t_stop = t_stop.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        # As each series in cs is tz-aware, using a different timezone
        # (with a different offset) for `t_start` & `t_stop` will raise an AssertionError
        with pytest.raises(AssertionError):
            hf_data_dict = construct_hf_data_dict(s.index, s.values)
            start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
                hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
            )
    # THESE have the same timezone offset -> no AssertionError should be raised
    cs = [
        dr.tz_localize(None).tz_localize("Europe/Amsterdam"),
        dr.tz_convert("Europe/Brussels"),
        dr.tz_convert("Europe/Oslo"),
        dr.tz_convert("Europe/Paris"),
        dr.tz_convert("Europe/Rome"),
    ]
    for i, s in enumerate(cs):
        t_start, t_stop = sorted(s.iloc[np.random.randint(0, n, 2)].index)
        t_start = t_start.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        t_stop = t_stop.tz_convert(cs[(i + 1) % len(cs)].index.tz)
        hf_data_dict = construct_hf_data_dict(s.index, s.values)
        start_idx, end_idx = PlotlyAggregatorParser.get_start_end_indices(
            hf_data_dict, hf_data_dict["axis_type"], t_start, t_stop
        )

I would like to hear your opinion on this matter before continuing on this PR.

@jonasvdd
Member

@Jmbols @DHRUVCHARNE, any thoughts/remarks on my above comment?

@Jmbols
Author

Jmbols commented Oct 24, 2024

This is the behaviour I would expect. "Europe/Brussels" and "Europe/Amsterdam" are equivalent for all intents and purposes. The main issue I can see is if two such timezones switch to and from DST on different dates, but presumably this would be caught by the offset check?
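Whether an offset check catches differing DST rules depends on the instant at which the offsets are compared. A sketch with a hypothetical helper `same_utc_offset` (comparing offsets at the timestamp's own instant), using Africa/Algiers, which stays at UTC+01:00 all year, against Europe/Brussels:

```python
import pandas as pd


def same_utc_offset(ts: pd.Timestamp, reference_tz: str) -> bool:
    # hypothetical check: compare the UTC offset of `ts` with that of
    # `reference_tz` at the same instant
    return ts.utcoffset() == ts.tz_convert(reference_tz).utcoffset()


winter = pd.Timestamp("2024-01-15 12:00", tz="Europe/Brussels")
summer = pd.Timestamp("2024-07-15 12:00", tz="Europe/Brussels")

print(same_utc_offset(winter, "Africa/Algiers"))  # True: both UTC+01:00
print(same_utc_offset(summer, "Africa/Algiers"))  # False: Brussels is on DST (UTC+02:00)
```

So a mismatch in DST rules is only flagged for timestamps that fall in a period where the two rule sets actually give different offsets.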

@mhangaard

Would it be easier to convert everything to UTC first, do the calculations, and then convert back to the original timezone at the last moment, just before returning a Patch?
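A sketch of the UTC-first idea for the index lookup step, assuming tz-aware range bounds and a sorted, tz-aware x-index. `start_end_indices_utc` is a hypothetical helper, not plotly-resampler's `get_start_end_indices`; because every comparison happens in UTC, timezone names or tzinfo objects never need to match:

```python
import pandas as pd


def start_end_indices_utc(index: pd.DatetimeIndex, start, end):
    """Hypothetical helper: locate [start, end] in a sorted tz-aware index
    by comparing everything in UTC. `start`/`end` must be tz-aware."""
    utc_index = index.tz_convert("UTC")
    start_utc = pd.Timestamp(start).tz_convert("UTC")
    end_utc = pd.Timestamp(end).tz_convert("UTC")
    start_idx = utc_index.searchsorted(start_utc, side="left")
    end_idx = utc_index.searchsorted(end_utc, side="right")
    return start_idx, end_idx


# hourly data in Canada/Pacific, spanning the 2024-03-10 DST switch
idx = pd.date_range("2024-03-01", "2024-04-01", freq="h", tz="UTC").tz_convert("Canada/Pacific")

# bounds on either side of the switch, with different fixed offsets
print(start_end_indices_utc(idx, "2024-03-09T00:00:00-08:00", "2024-03-11T00:00:00-07:00"))
```

Because the returned positions refer to the original index, slicing `idx[start_idx:end_idx]` already yields values in the original timezone, so under this sketch nothing has to be converted back.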

Projects
None yet
Development

No branches or pull requests

4 participants