Sparse data frame conversion fails #2

Open
krassowski opened this issue Aug 3, 2022 · 3 comments
Labels
enhancement New feature or request

@krassowski
Member

Is your feature request related to a problem? Please describe.

# earlier cells: load the R magic and pandas
%load_ext rpy2.ipython
import pandas as pd

from scipy import sparse
mat = sparse.eye(3)
df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
%R -i df
pandas2ri.py: Error while trying to convert the column "A". Fall back to string conversion. The error is: 'SparseDtype' object has no attribute 'isnative'
AttributeError                            Traceback (most recent call last)
/lib/python3.9/site-packages/rpy2/robjects/pandas2ri.py:57, in py2rpy_pandasdataframe(obj)
     56 try:
---> 57     od[name] = conversion.py2rpy(values)
     58 except Exception as e:

File ~/.pyenv/versions/3.9.5/lib/python3.9/functools.py:877, in singledispatch.<locals>.wrapper(*args, **kw)
    874     raise TypeError(f'{funcname} requires at least '
    875                     '1 positional argument')
--> 877 return dispatch(args[0].__class__)(*args, **kw)

/lib/python3.9/site-packages/rpy2/robjects/pandas2ri.py:191, in py2rpy_pandasseries(obj)
    189 # current conversion as performed by numpy
--> 191 res = func(obj.values)
    192 if len(obj.shape) == 1:

/lib/python3.9/site-packages/rpy2/robjects/numpy2ri.py:84, in numpy2rpy(o)
     82 """ Augmented conversion function, converting numpy arrays into
     83 rpy2.rinterface-level R structures. """
---> 84 if not o.dtype.isnative:
     85     raise ValueError('Cannot pass numpy arrays with non-native '
     86                      'byte orders at the moment.')

AttributeError: 'SparseDtype' object has no attribute 'isnative'

During handling of the above exception, another exception occurred:

TypeError                                 Traceback (most recent call last)
/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:610, in SexpVector.from_object(cls, obj)
    609 try:
--> 610     mv = memoryview(obj)
    611     res = cls.from_memoryview(mv)

TypeError: memoryview: a bytes-like object is required, not 'Series'

During handling of the above exception, another exception occurred:

AttributeError                            Traceback (most recent call last)
Input In [280], in <cell line: 4>()
      2 mat = sparse.eye(3)
      3 df = pd.DataFrame.sparse.from_spmatrix(mat, columns=['A', 'B', 'C'])
----> 4 get_ipython().run_line_magic('R', '-i df')

/lib/python3.9/site-packages/IPython/core/interactiveshell.py:2305, in InteractiveShell.run_line_magic(self, magic_name, line, _stack_depth)
   2303     kwargs['local_ns'] = self.get_local_scope(stack_depth)
   2304 with self.builtin_trap:
-> 2305     result = fn(*args, **kwargs)
   2306 return result

/lib/python3.9/site-packages/rpy2/ipython/rmagic.py:737, in RMagics.R(self, line, cell, local_ns)
    735                 raise NameError("name '%s' is not defined" % input)
    736         with localconverter(converter) as cv:
--> 737             ro.r.assign(input, val)
    739 if args.display:
    740     try:

/lib/python3.9/site-packages/rpy2/robjects/functions.py:198, in SignatureTranslatedFunction.__call__(self, *args, **kwargs)
    196         v = kwargs.pop(k)
    197         kwargs[r_k] = v
--> 198 return (super(SignatureTranslatedFunction, self)
    199         .__call__(*args, **kwargs))

/lib/python3.9/site-packages/rpy2/robjects/functions.py:117, in Function.__call__(self, *args, **kwargs)
    116 def __call__(self, *args, **kwargs):
--> 117     new_args = [conversion.py2rpy(a) for a in args]
    118     new_kwargs = {}
    119     for k, v in kwargs.items():
    120         # TODO: shouldn't this be handled by the conversion itself ?

/lib/python3.9/site-packages/rpy2/robjects/functions.py:117, in <listcomp>(.0)
    116 def __call__(self, *args, **kwargs):
--> 117     new_args = [conversion.py2rpy(a) for a in args]
    118     new_kwargs = {}
    119     for k, v in kwargs.items():
    120         # TODO: shouldn't this be handled by the conversion itself ?

File ~/.pyenv/versions/3.9.5/lib/python3.9/functools.py:877, in singledispatch.<locals>.wrapper(*args, **kw)
    873 if not args:
    874     raise TypeError(f'{funcname} requires at least '
    875                     '1 positional argument')
--> 877 return dispatch(args[0].__class__)(*args, **kw)

/lib/python3.9/site-packages/rpy2/robjects/pandas2ri.py:63, in py2rpy_pandasdataframe(obj)
     58     except Exception as e:
     59         warnings.warn('Error while trying to convert '
     60                       'the column "%s". Fall back to string conversion. '
     61                       'The error is: %s'
     62                       % (name, str(e)))
---> 63         od[name] = StrVector(values)
     65 return DataFrame(od)

/lib/python3.9/site-packages/rpy2/robjects/vectors.py:385, in StrVector.__init__(self, obj)
    384 def __init__(self, obj):
--> 385     super().__init__(obj)
    386     self._add_rops()

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:523, in SexpVector.__init__(self, obj)
    521     super().__init__(obj)
    522 elif isinstance(obj, collections.abc.Sized):
--> 523     super().__init__(self.from_object(obj).__sexp__)
    524 else:
    525     raise TypeError('The constructor must be called '
    526                     'with an instance of '
    527                     'rpy2.rinterface.Sexp '
    528                     'or an instance of '
    529                     'rpy2.rinterface._rinterface.SexpCapsule')

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:614, in SexpVector.from_object(cls, obj)
    612 except (TypeError, ValueError):
    613     try:
--> 614         res = cls.from_iterable(obj)
    615     except ValueError:
    616         msg = ('The class methods from_memoryview() and '
    617                'from_iterable() both failed to make a {} '
    618                'from an object of class {}'
    619                .format(cls, type(obj)))

/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:45, in _cdata_res_to_rinterface.<locals>._(*args, **kwargs)
     44 def _(*args, **kwargs):
---> 45     cdata = function(*args, **kwargs)
     46     # TODO: test cdata is of the expected CType
     47     return _cdata_to_rinterface(cdata)

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:552, in SexpVector.from_iterable(cls, iterable, populate_func, set_elt, cast_value)
    547 with memorymanagement.rmemory() as rmemory:
    548     r_vector = rmemory.protect(
    549         openrlib.rlib.Rf_allocVector(
    550             cls._R_TYPE, n)
    551     )
--> 552     populate_func(iterable, r_vector, set_elt, cast_value)
    553 return r_vector

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:474, in _populate_r_vector(iterable, r_vector, set_elt, cast_value)
    472 def _populate_r_vector(iterable, r_vector, set_elt, cast_value):
    473     for i, v in enumerate(iterable):
--> 474         set_elt(r_vector, i, cast_value(v))

/lib/python3.9/site-packages/rpy2/rinterface_lib/sexp.py:677, in _as_charsxp_cdata(x)
    675     return x.__sexp__._cdata
    676 else:
--> 677     return conversion._str_to_charsxp(x)

/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:142, in _str_to_charsxp(val)
    140     s = rlib.R_NaString
    141 else:
--> 142     cchar = _str_to_cchar(val, encoding='utf-8')
    143     s = rlib.Rf_mkCharCE(cchar, openrlib.rlib.CE_UTF8)
    144 return s

/lib/python3.9/site-packages/rpy2/rinterface_lib/conversion.py:121, in _str_to_cchar(s, encoding)
    119 def _str_to_cchar(s: str, encoding: str = 'utf-8'):
    120     # TODO: use isString and installTrChar
--> 121     b = s.encode(encoding)
    122     return ffi.new('char[]', b)

AttributeError: 'numpy.float64' object has no attribute 'encode'
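The traceback shows two separate failures. `py2rpy_pandasseries` passes `obj.values` to `numpy2rpy`, which reads `o.dtype.isnative`; for a sparse column, `.values` is a pandas `SparseArray` whose `SparseDtype` has no such attribute. The string fallback then iterates over the column and hands `numpy.float64` items to `_str_to_charsxp`, which calls `.encode()` on them. A minimal sketch, assuming only pandas and NumPy (the frame is built with `astype(pd.SparseDtype(...))` instead of `scipy.sparse`), that reproduces the dtype mismatch and shows the obvious stopgap of densifying before `%R -i`:

```python
import numpy as np
import pandas as pd

# A 3x3 identity with sparse float64 columns, similar to what
# pd.DataFrame.sparse.from_spmatrix(sparse.eye(3), ...) produces.
df = pd.DataFrame(np.eye(3), columns=["A", "B", "C"]).astype(
    pd.SparseDtype("float64", 0.0)
)

# This is the object py2rpy_pandasseries hands to numpy2rpy.
values = df["A"].values
print(type(values).__name__)              # SparseArray, not numpy.ndarray
print(hasattr(values.dtype, "isnative"))  # False: the AttributeError above

# Stopgap: densify before passing the frame to R (e.g. `%R -i dense`).
# DataFrame.sparse requires every column to be sparse, and densifying
# gives up the memory savings of the sparse representation.
dense = df.sparse.to_dense()
print(dense.dtypes.tolist())
print(all(dense[c].values.dtype.isnative for c in dense.columns))
```

Densifying is only reasonable for frames that fit in memory as dense arrays; it sidesteps the conversion error rather than adding sparse support.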

Describe the solution you'd like

Support for converting sparse data frames in rpy2. I know there is https://github.com/rpy2/rpy2-Matrix, but it does not cover data frames (and is not published on PyPI).
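Until such support exists, a conversion path could densify only the sparse columns, which also covers mixed frames where the DataFrame-level `.sparse` accessor is unavailable. A minimal sketch; `densify_sparse_columns` is a hypothetical helper (not part of rpy2 or pandas) and it assumes unique column names:

```python
import pandas as pd


def densify_sparse_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: return a copy of `df` with every SparseDtype
    column converted to its dense equivalent.

    Non-sparse columns are copied unchanged, so this also works on mixed
    frames. Assumes column names are unique.
    """
    out = df.copy()
    for name, dtype in df.dtypes.items():
        if isinstance(dtype, pd.SparseDtype):
            # Series.sparse.to_dense() keeps the index and the subtype
            out[name] = df[name].sparse.to_dense()
    return out


# A mixed frame: one sparse float column, one ordinary string column.
mixed = pd.DataFrame(
    {
        "A": pd.arrays.SparseArray([1.0, 0.0, 0.0], fill_value=0.0),
        "label": ["x", "y", "z"],
    }
)
dense = densify_sparse_columns(mixed)
print(dense.dtypes)
```

In the notebook from the report, `%R -i dense` should then go through the regular pandas2ri path, since no column carries a `SparseDtype` any more.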

Describe alternatives you've considered
Having to rely on a separate package for this would be inconvenient for interactive usage.

Additional context
None

@lgautier
Member

lgautier commented Aug 6, 2022

Sparse arrays are not handled by R's standard library, but the Matrix package does support them.

The most natural way to address this ticket seems to be:

  • add a converter for pandas data frames to rpy2-Matrix
  • release a snapshot of that package and upload it to PyPI.
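On the R side, `Matrix::sparseMatrix()` can build a sparse matrix from coordinate triplets (`i`, `j`, `x`) plus `dims`, with 1-based indices by default. A converter in rpy2-Matrix could therefore start by extracting those triplets from a pandas sparse frame without densifying it. A minimal sketch, assuming every column is sparse with a fill value of 0 (as `from_spmatrix()` produces); `sparse_frame_to_triplets` is a hypothetical name, and the step that hands the arrays to R is left out:

```python
import numpy as np
import pandas as pd


def sparse_frame_to_triplets(df: pd.DataFrame):
    """Hypothetical helper: return (i, j, x, dims) for Matrix::sparseMatrix().

    i and j are 1-based (sparseMatrix's default index1 = TRUE). Only the
    explicitly stored entries of each SparseArray are emitted.
    """
    rows, cols, vals = [], [], []
    for j in range(df.shape[1]):
        arr = df.iloc[:, j].array
        if not isinstance(arr.dtype, pd.SparseDtype) or arr.fill_value != 0:
            raise ValueError(f"column {j}: expected a sparse column with fill value 0")
        # 0-based row positions of the stored (non-fill) values
        idx = arr.sp_index.to_int_index().indices
        rows.append(idx.astype(np.int64) + 1)
        cols.append(np.full(len(idx), j + 1, dtype=np.int64))
        vals.append(np.asarray(arr.sp_values, dtype=np.float64))
    if not rows:
        empty = np.empty(0)
        return empty.astype(np.int64), empty.astype(np.int64), empty, df.shape
    return np.concatenate(rows), np.concatenate(cols), np.concatenate(vals), df.shape


df = pd.DataFrame(
    {"A": [0.0, 2.5, 0.0], "B": [0.0, 0.0, 0.0], "C": [4.0, 0.0, 7.0]}
).astype(pd.SparseDtype("float64", 0.0))
i, j, x, dims = sparse_frame_to_triplets(df)
# i = [2, 1, 3], j = [1, 3, 3], x = [2.5, 4.0, 7.0], dims = (3, 3)
print(i, j, x, dims)
```

For this example the corresponding R call would be `Matrix::sparseMatrix(i = c(2, 1, 3), j = c(1, 3, 3), x = c(2.5, 4, 7), dims = c(3, 3))`. Column names would still need to go into `dimnames`.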

What do you think? If so, I'd transfer this issue to the rpy2-Matrix repository.

@lgautier lgautier added the enhancement New feature or request label Aug 6, 2022
@krassowski
Member Author

Sounds good.

@lgautier lgautier transferred this issue from rpy2/rpy2 Aug 6, 2022
@lgautier
Member

lgautier commented Aug 7, 2022

I have updated the code to match the latest release of the R package Matrix. I'll let you assess whether a converter is viable.
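For the opposite direction, the Matrix package's general compressed sparse column class `dgCMatrix` stores its entries in the slots `i` (0-based row indices), `p` (column pointers, one more than the number of columns), `x` (values) and `Dim`. Assuming those slots have already been fetched as NumPy-compatible arrays (with rpy2, for example through `do_slot()`), a minimal sketch of rebuilding a pandas sparse frame could look like this; `csc_slots_to_sparse_frame` is hypothetical, and it briefly densifies one column at a time for simplicity:

```python
import numpy as np
import pandas as pd


def csc_slots_to_sparse_frame(i, p, x, dim, columns=None):
    """Hypothetical helper: build a DataFrame of Sparse[float64, 0.0]
    columns from the slots of a dgCMatrix.

    i:   0-based row index of each stored value (slot @i)
    p:   column pointers of length ncol + 1 (slot @p)
    x:   stored values (slot @x)
    dim: (nrow, ncol) (slot @Dim)
    """
    nrow, ncol = int(dim[0]), int(dim[1])
    i = np.asarray(i, dtype=np.int64)
    p = np.asarray(p, dtype=np.int64)
    x = np.asarray(x, dtype=np.float64)
    if len(p) != ncol + 1:
        raise ValueError("p must have ncol + 1 entries")
    names = list(columns) if columns is not None else list(range(ncol))
    data = {}
    for k in range(ncol):
        start, stop = p[k], p[k + 1]
        # Entries of column k live in i[start:stop] and x[start:stop].
        # Densified per column for simplicity; explicit zeros are dropped.
        col = np.zeros(nrow)
        col[i[start:stop]] = x[start:stop]
        data[names[k]] = pd.arrays.SparseArray(col, fill_value=0.0)
    return pd.DataFrame(data, index=pd.RangeIndex(nrow))


frame = csc_slots_to_sparse_frame(
    i=[1, 0, 2], p=[0, 1, 1, 3], x=[2.5, 4.0, 7.0], dim=(3, 3),
    columns=["A", "B", "C"],
)
print(frame.sparse.to_dense())
```

A real converter would avoid the per-column dense buffer (for example by going through `scipy.sparse.csc_matrix` and `from_spmatrix()` when SciPy is available) and would carry over `Dimnames` as row and column labels.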

Two notes:
