Feature request: Support Polars DataFrames #1319

vnijs opened this issue Jan 12, 2023 · 11 comments

@vnijs

vnijs commented Jan 12, 2023

I have been using Polars in Python, and it is a wonderful, fast DataFrame library for Python and Rust. There even seems to be work on creating R bindings for Polars as well (https://github.com/pola-rs/r-polars).

I use reticulate a lot in shiny apps and it would be great if reticulate could also support the Polars DataFrame format, at least in terms of being able to convert a Polars DataFrame to an R data.frame. Since polars is based on Arrow, I hope this may be possible.

Below is an example of what currently happens when using reticulate with a Polars DataFrame.

library(reticulate)

test_str <- '
import polars as pl

df = pl.DataFrame({
  "a": [1, 2, 3],
  "b": ["x", "y", "z"]
})
'

answer <- py_run_string(test_str)

# works
py$df

# reports false
is.data.frame(py$df)

# DataFrame looks good
py$df

# row indexing works
py$df[0]
py$df[2]
py$df[3]

# column indexing works
py$df["b"]
py$df["a"]

# not all indexing works: a bare `:` is not valid R syntax, so this line fails to parse
py$df[2, :]

# reports an error
as.data.frame(py$df)

# Error in as.data.frame.default(py$df) :
#  cannot coerce class 'c("polars.internals.dataframe.frame.DataFrame", "python.builtin.object"' to a data.frame
@OmarAshkar

+1 for this one. Maybe add an option() setting to choose between pandas and polars.

@dfalbel
Member

dfalbel commented Jul 12, 2023

Just to make sure I understand the request correctly.

We could implement the py_to_r method for polars data frames. This means that whenever a python function called by reticulate returned a polars data frame, it would be converted into an R data frame. This is the same behavior as we have for pandas. Users can opt out by passing convert = FALSE when importing the module.

For example, if we implemented py_to_r for polars data frames, calling something like the code below would return an R data frame, whereas it currently returns a pointer to a Polars data frame.

polars <- reticulate::import("polars")
df <- polars$dataframe$DataFrame(data = list(
  hello = 1:5
))
df

To be fair, you can get an R data.frame pretty easily by doing:

df$to_pandas()

which will trigger the py_to_r method for pandas data frames.

We could also add an option to the r_to_py dataframe method, so R dataframes get converted into polars data frames when cast to Python objects.

Is that what you are suggesting? I don't have strong feelings about either option. However, if we add py_to_r for polars data frames, it would be a potentially breaking change, as users might already be relying on the fact that polars data frames aren't automatically cast into R objects.

@OmarAshkar

Yes, I am in favor of an automatic py_to_r(). And a parameter should definitely be available for users.

@dfalbel
Member

dfalbel commented Jul 14, 2023

@OmarAshkar, do you have an example of a use case where automatic conversion is much nicer than calling .to_pandas()?
I'm leaning towards not implementing this in reticulate, as casting is a simple one-liner and it's probably going to be a breaking change for some users.

@vnijs
Author

vnijs commented Jul 15, 2023

@dfalbel Thanks for taking a look at this. What exactly would break? The fact that folks focusing on polars could remove steps in their work? If there are any breaks, I assume they would be quite happy about things being made simpler. It would definitely make writing tests for python/polars to be executed through reticulate much easier.

@t-kalinowski
Member

I think what @dfalbel is suggesting is that users likely have existing workflows where they expect polars dataframes not to eagerly convert to R dataframes (similar to how TensorFlow tensors don't automatically convert to R arrays, even when convert = TRUE).

The most minimal change I can think of that won't break existing workflows would be to add an as.data.frame.<polars-df> method, which could simply be as_r_value(x$to_pandas()). This would make as_tibble() work as well.

@t-kalinowski
Member

We can also add a [.<polars-df> method, to make missing axes more ergonomic.
E.g., make py$df[2, ] equivalent to df[2, :] in python.

Today, if you want to pass a python : to [, that can be done (admittedly, not very ergonomically) like this:

bt <- import_builtins()
bt$slice(NULL)

for example

py$df[2, bt$slice(NULL)]
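
For reference, on the Python side a bare `:` inside square brackets is just syntax for `slice(None)`, which is why `bt$slice(NULL)` can stand in for it. A minimal pure-Python sketch that echoes the key Python hands to `__getitem__` (the `KeyProbe` class is a hypothetical helper for illustration, not part of polars or reticulate):

```python
class KeyProbe:
    """Hypothetical helper: returns whatever key Python passes to __getitem__."""

    def __getitem__(self, key):
        return key


probe = KeyProbe()

# What the object receives for `obj[2, :]`
key_with_colon = probe[2, :]

# What it receives when the slice is built explicitly
key_with_slice = probe[2, slice(None)]

print(key_with_colon)                    # (2, slice(None, None, None))
print(key_with_colon == key_with_slice)  # True
```

Both spellings produce the same tuple key, so an object indexed with the explicit slice cannot tell it apart from the `:` form.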

@t-kalinowski
Member

The current version of reticulate brings slice support to [ and [<-. (Added in #1432).

This now works:

## slice a NumPy array
x <- np_array(array(1:64, c(4, 4, 4)))

# R expression | Python expression
# ------------ | -----------------
  x[0]         # x[0]
  x[, 0]       # x[:, 0]
  x[, , 0]     # x[:, :, 0]

  x[NA:2]      # x[:2]
  x[`:2`]      # x[:2]

  x[2:NA]      # x[2:]
  x[`2:`]      # x[2:]

  x[NA:NA:2]   # x[::2]
  x[`::2`]     # x[::2]

  x[1:3:2]     # x[1:3:2]
  x[`1:3:2`]   # x[1:3:2]

See ?py_get_item for examples.

The same syntax should work for Polars DataFrames.
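
For readers less familiar with Python slicing, the Python expressions in the right-hand column can be tried on a plain Python list. This is only an illustration of Python's slice semantics (0-based, end-exclusive). It assumes nothing about how reticulate or Polars handle the key internally, and the `data` list is made up for the example:

```python
# Illustrative data: eight integers, 0 through 7
data = list(range(8))

first_two = data[:2]       # like x[NA:2] / x[`:2`] on the R side
from_two = data[2:]        # like x[2:NA] / x[`2:`]
every_other = data[::2]    # like x[NA:NA:2] / x[`::2`]
stepped = data[1:3:2]      # like x[1:3:2] / x[`1:3:2`]

print(first_two)    # [0, 1]
print(from_two)     # [2, 3, 4, 5, 6, 7]
print(every_other)  # [0, 2, 4, 6]
print(stepped)      # [1]
```

Note that the indices are passed through to Python unchanged, so `x[0]` refers to the first element, not R's `x[1]`.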

@junghoon-son

Would love to see this as well!

@tontief

tontief commented Sep 17, 2024

What's the status on this? Currently, when using polars in quarto with revealjs, rendering is terrible. Is it possible to pre-process everything and use to_pandas without showing that?

@t-kalinowski
Member

CC @cderv, do you have any thoughts about ☝🏻 ?
