View in reticulate shows incorrect time in Data Viewer #1269

Closed
ddelzell opened this issue Aug 30, 2022 · 3 comments

Comments

@ddelzell

ddelzell commented Aug 30, 2022

If you create a pandas DataFrame in reticulate with no time zone information, the Data Viewer shifts the times to the machine's local time zone.

import pandas as pd

s = pd.to_datetime(pd.Series(['2022-04-26 12:00:00', '2022-06-14 08:30'])).to_frame()
View(s) # on my machine the Data Viewer shows 07:00:00 and 03:30:00, 5 hours behind.

Why is the Data Viewer changing the time? Also, I did install the latest version of reticulate from GitHub before running this code.

Most disturbingly, when the Python data frame is accessed from R and saved as an R object, the R object is populated with the incorrect times.

py$s shows the incorrect times.

[Screenshot: Screen Shot 2022-08-30 at 1 44 27 PM]

@t-kalinowski
Member

t-kalinowski commented Aug 30, 2022

This is quite interesting. Unfortunately, I'm not sure what we could do in reticulate.

It looks like pd.to_datetime() returns a tz-naive object in this example, but parses the timestamp as UTC, not local time. Consider this sequence:

>>> pd.to_datetime('2022-04-26 12:00:00').timestamp()
1650974400.0
>>> pd.to_datetime('2022-04-26 12:00:00', utc = False).timestamp()
1650974400.0
>>> pd.to_datetime('2022-04-26 12:00:00', utc = True).timestamp()
1650974400.0

# the same string parsed in R
> as.numeric(as.POSIXct('2022-04-26 12:00:00'))
[1] 1650988800      # different from what python returns

# taking the number from pd.to_datetime().timestamp()
> .POSIXct(1650974400.0)
[1] "2022-04-26 08:00:00 EDT"

> .POSIXct(1650974400.0, "UTC")
[1] "2022-04-26 12:00:00 UTC"

This is tricky because the oddity here is in the to_datetime() parsing convention (bug?). In most other cases throughout Python, a tz-naive object is assumed to be in local time, and that's the convention reticulate uses. Injecting a UTC tz attribute during conversion to paper over pandas idiosyncrasies might be possible, but it would probably break other code that depends on the common convention of no-tz == local-tz.

(e.g., just yesterday: #1265).
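To make the two conventions concrete, here is a minimal standard-library sketch (not pandas or reticulate code) that interprets the same wall-clock string once as UTC and once as local time. The fixed UTC-4 offset standing in for EDT and the variable names are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# A tz-naive wall-clock time, as in the example from this issue.
naive = datetime.strptime("2022-04-26 12:00:00", "%Y-%m-%d %H:%M:%S")

# Convention A: treat the naive value as UTC.
as_utc = naive.replace(tzinfo=timezone.utc).timestamp()

# Convention B: treat the naive value as local time.
# Assumption: local time is EDT, modeled here as a fixed UTC-4 offset.
edt = timezone(timedelta(hours=-4), "EDT")
as_edt = naive.replace(tzinfo=edt).timestamp()

print(as_utc)  # 1650974400.0
print(as_edt)  # 1650988800.0

# Reading the "UTC" epoch value back as local time moves the clock by 4 hours.
print(datetime.fromtimestamp(as_utc, tz=edt))  # 2022-04-26 08:00:00-04:00
```

The two epoch values differ by exactly the UTC offset (14400 seconds), which is the same gap as between the Python and R results shown earlier in this comment.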

My advice is to use a more robust parsing approach when creating the datetime objects via pandas methods, e.g., by injecting the tz offset directly into the strings.

x = ['2022-04-26 12:00:00', '2022-06-14 08:30']
pd.to_datetime([t + "-0400" for t in x])
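The same idea can be sketched with only the standard library, which makes it easy to confirm that the resulting values are tz-aware. This is an illustrative analogue, not the pandas call above; the "-04:00" offset is an assumption that only holds while the local zone is on EDT.

```python
from datetime import datetime

x = ["2022-04-26 12:00:00", "2022-06-14 08:30"]

# Appending an explicit offset makes every value tz-aware, so no consumer
# has to guess whether the wall-clock time means UTC or local time.
# Assumption: both timestamps fall in EDT (UTC-4).
aware = [datetime.fromisoformat(t + "-04:00") for t in x]

for dt in aware:
    print(dt.isoformat(), dt.utcoffset(), dt.timestamp())
```

Because the offset travels with each value, the epoch seconds no longer depend on the time zone of the machine that does the conversion.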

@ddelzell
Author

This is very helpful, thank you.

@ddelzell
Author

By the way, changing my timezone environment variable in RStudio (which defaults to my local time zone) fixed the issue.
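A plausible reason this works: once the session time zone is UTC, "tz-naive means local" and "tz-naive means UTC" describe the same instant. The sketch below shows that effect from the Python side only. It assumes a Unix-like system (where time.tzset() is available) and is not what RStudio itself runs.

```python
import os
import time
from datetime import datetime

naive = datetime(2022, 4, 26, 12, 0, 0)

# Assumption: Unix-like OS; time.tzset() re-reads the TZ environment variable.
os.environ["TZ"] = "UTC"
time.tzset()

# With the process time zone set to UTC, interpreting the naive value as
# local time yields the same epoch seconds as interpreting it as UTC.
print(naive.timestamp())  # 1650974400.0
```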
