py_to_r refactor (c++) #1552
Conversation
> In R code there is now generally no need to call `if (py_is_module_proxy(x)) py_resolve_module_proxy(x)`. This is handled upstream in `PyObjectRef()`.
Overall LGTM, but it's a large PR so it's challenging to review in depth. Are there more tests we could add to validate that we haven't changed any existing behavior unexpectedly?
src/python.cpp (Outdated)

```diff
@@ -1084,7 +1319,12 @@ SEXP py_to_r(PyObject* x, bool convert) {
   }

   // dict
-  else if (PyDict_CheckExact(x)) {
+  if (PyDict_CheckExact(x)) {
+    // if you tempted to change this to PyDict_Check() to allow subclasses, don't.
```
😆
Can you elaborate on the motivation? What if a user wants to create a Python function from R that is later passed back into Python? A dummy example:
The example works just fine - and should be unchanged from before:

```r
library(reticulate)
py_run_string("def apply(f, x): return f(x)")
fn <- py_eval("lambda x: x + 1")
main <- import_main()
main$apply(fn, 2)
#> [1] 3
```

What is different is what happens when `fn` is created with `convert = FALSE`:

```r
fn <- py_eval("lambda x: x + 1", convert = FALSE)
fn(2) # error in current version, required using py_call();
      # now returns a py ref to '3.0' with this PR
```
The specific bug that I was initially trying to fix was in tfdatasets:

```r
library(tfdatasets)

make_csv_dataset("hearts.csv") |>
  dataset_map(\(x) { ... })
```

Where I was expecting […]
Closes #1024
Do you have any specific edge-cases in mind? I already expanded some tests related to conversion, and will still add a few more related to logical numpy array conversion. Anything else?
A quick benchmark:

```r
library(reticulate)

bench::mark(np_array_work = {
  x <- np_array(array(runif(1000)))
  y <- np_array(array(runif(1000)))
  z <- x + y
  w <- py_to_r(x > .5 & y < .5)
  z <- z[w]
  py_to_r(z)
})
#> # A tibble: 2 × 12
#>   expression      min   median `itr/sec` mem_alloc `gc/sec` n_itr  n_gc
#>   <bch:expr> <bch:tm> <bch:tm>     <dbl> <bch:byt>    <dbl> <int> <dbl>
#> 1 main_now      1.4ms   1.49ms      652.     435KB     8.25  9875   125
#> 2 refactored  457.8µs 485.67µs     1961.     308KB    13.2   9933    67
#> # ℹ 4 more variables: total_time <bch:tm>, memory <list>, time <list>,
#> #   gc <list>
```
Nothing specifically, just general trepidation resulting from seeing a large PR :-) Do we already have test coverage for confirming that the removal of these S3 methods doesn't affect anything here? https://github.com/rstudio/reticulate/pull/1552/files#diff-b6b6854a02ae177f8860f49654ff250c1554d021ae434b6efdbe99d00bdc6055L7
This PR diff is unfortunately larger than I'd like. It started with trying to fix one bug, but with the way the `py_to_r()` conversion routines were factored, I realized it made sense to implement the fix "upstream", in `PyObjectRef`, rather than at each cpp call site of `py_to_r()` and `py_ref()`. That led me to need to touch code in lots of places (and load it all in my head). As I refactored, I discovered more bugs and, with the code fresh in my mind, I also implemented some straightforward fixes and optimizations. The size of the changes snowballed, and at this point, dividing them into smaller PRs doesn't seem worth the effort.

User facing changes:
- Callable python objects created with `convert = FALSE` now always get wrapped in a function (i.e., `typeof(py_eval("lambda x: x", convert = FALSE)) == "function"`). This now works; see the first sketch after this list.
- `py_to_r()` S3 methods are now called on python objects supplied as args to R functions being called from Python, if the R function was converted with `convert = TRUE`. This used to error, and now works; also shown in the first sketch after this list.
- `py_to_r(x)` no longer signals an error if `x` is not a python object; in that case, `x` is returned unmodified. (This is to avoid introducing new errors in user code, in case users wrote workarounds for S3 generics that should have been getting called, but weren't.)
- `attr(x, "tzone")` attributes are (better) preserved when converting POSIXt to Python. POSIXt types with a non-empty `tzone` attr will always convert to a `datetime.datetime`; otherwise they convert to numpy `datetime64[ns]` arrays.
- Fixed an issue where calling `py_set_item()` on a subclassed dict would not invoke a custom `__setitem__` method; see the second sketch after this list.
- `py_del_attr(x, name)` now returns `x` invisibly.
- `source_python()` no longer exports/assigns the `r` symbol (the "R Interface object" that python code uses to get a reference to the R globalenv).
Developer facing changes:

- In c++ functions, `r_to_py(RObject, convert)` is now all you need to go from SEXP to PyObject*.
- In c++ functions, `py_to_r(PyObject*, convert)` is now all you need to go from PyObject* to SEXP.
- `py_to_r(PyObject*, convert = false)` will now always return a PyObjectRef. `py_to_r(PyObject*, convert = true)` will now invoke `py_to_r()` S3 methods if the object is not a "simple" python object.
- `r_to_py(RObject, convert)` now returns the underlying PyObject* if it is already a `PyObjectRef`.
- The SEXP underlying a PyObjectRef can now be either an R environment or an R closure (if the PyObject* is callable). All the common logic for building up a PyObjectRef for presentation in R is now done within the PyObjectRef constructor, including building up the closure if the PyObject is callable, and building up the S3 `class` attribute.
- The `py_to_r_wrapper()` S3 generic will be called on all callable PyObjectRefs (both with `convert = TRUE` and `convert = FALSE`); see the sketch after this list.
- In R code there is now generally no need to call `if (py_is_module_proxy(x)) py_resolve_module_proxy(x)` or `ensure_python_initialized()`; this is handled upstream in `PyObjectRef` or the individual c++ methods.
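A sketch of how a package might use the `py_to_r_wrapper()` generic; the python class name `mymodule.Adder` is hypothetical:

```r
library(reticulate)

# Customize the R function wrapping callable python objects of class
# 'mymodule.Adder'. With this PR the generic is dispatched for callables
# created with either convert = TRUE or convert = FALSE.
py_to_r_wrapper.mymodule.Adder <- function(x) {
  force(x)
  function(...) {
    message("calling Adder")
    x(...)
  }
}
```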
Optimizations:

In some ad-hoc benchmarks simulating a series of operations with python objects, I see a ~2x speedup. E.g., this is ~2x faster:
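(The snippet did not survive extraction here; presumably it was a sequence of operations like the `bench::mark()` posted earlier in the thread, e.g.:)

```r
library(reticulate)
x <- np_array(array(runif(1000)))
y <- np_array(array(runif(1000)))
z <- x + y
w <- py_to_r(x > .5 & y < .5)
py_to_r(z[w])
```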
Though, in some minimal benchmarks of a callable created with `convert = FALSE`, there is now a small slowdown due to the extra work of building up an R closure (including S3 dispatch of `py_to_r_wrapper()`), i.e. for `py_eval("lambda x: x", convert = FALSE)`; a sketch of such a benchmark follows.
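A minimal sketch of such a benchmark (my construction, assuming the `bench` package; the exact snippet was not preserved):

```r
library(reticulate)

f_convert   <- py_eval("lambda x: x")
f_noconvert <- py_eval("lambda x: x", convert = FALSE)

bench::mark(
  f_convert(1),
  f_noconvert(1),
  check = FALSE  # the two calls return different objects by design
)
```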