Builtins for numpy recarrays #142
Looks like the issue is that Is this a problem? Pandas dataframes are better than recarrays in pretty much every way...?
Apparently, it's raising a ValueError to maintain backwards compatibility as explained here. Maybe it's worthwhile to check the dtype before catching the exception. Also, the example is a structured array, not a record array. I think it's worthwhile to add this behaviour to make sure it works on numpy structured/record arrays because a lot of functionality fails otherwise. I raised this issue because the documentation states:
Personally, I was using this in an environment where a (design) choice was made to use record arrays instead of pandas dataframes.
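To illustrate the "check the dtype first" idea from the comment above, here is a minimal sketch using only NumPy. The wrapper class `StructuredArrayNamespace` is hypothetical and not part of patsy or NumPy. It assumes that a lookup layer which reports a missing field with `KeyError`, as a dict does, would let the caller fall through to builtins such as `I()` instead of failing on the first miss. It also shows the structured array versus record array distinction mentioned above.

```python
import numpy as np


class StructuredArrayNamespace:
    """Hypothetical wrapper (not patsy API): dict-like field lookup."""

    def __init__(self, array):
        # dtype.names is None for plain arrays, a tuple of field names otherwise.
        if array.dtype.names is None:
            raise TypeError("expected a structured or record array")
        self._array = array

    def __getitem__(self, key):
        # Check the dtype up front rather than relying on the exception
        # type NumPy raises for an unknown field.
        if key not in self._array.dtype.names:
            raise KeyError(key)
        return self._array[key]


# A structured array: fields are accessed by name with item syntax.
data = np.zeros(3, dtype=[("y", np.float64), ("x", np.float64)])
is_recarray = isinstance(data, np.recarray)

# A record array is a view that additionally allows attribute access.
rec = data.view(np.recarray)
rec_is_recarray = isinstance(rec, np.recarray)

namespace = StructuredArrayNamespace(data)
x_column = namespace["x"]

try:
    namespace["I"]
    missing_field_error = None
except KeyError as exc:
    missing_field_error = exc
```

With this kind of check, a missing name such as `I` surfaces as a `KeyError`, the same signal a dict or DataFrame lookup would give.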
I'm facing the same issue with this code:
>>> import statsmodels.formula.api as smf
>>> import numpy as np
>>> x = np.linspace(0.001, 5, 200)
>>> y = (0.3 * x**3 + 1.2 * x**2 + 70/x**4) * 1.1 * np.exp(0.1)
>>> data = np.array([y, x], dtype=[('y', np.float64), ('x', np.float64)])
>>> model = smf.ols(formula='y ~ I(x**3) + I(x**2) + I(x**4)', data=data)
Traceback (most recent call last):
File "DataScienceVenv/lib/python3.7/site-packages/patsy/compat.py", line 36, in call_and_wrap_exc
return f(*args, **kwargs)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/eval.py", line 166, in eval
+ self._namespaces))
File "<string>", line 1, in <module>
File "DataScienceVenv/lib/python3.7/site-packages/patsy/eval.py", line 48, in __getitem__
return d[key]
File "DataScienceVenv/lib/python3.7/site-packages/patsy/eval.py", line 48, in __getitem__
return d[key]
ValueError: no field of name I
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "DataScienceVenv/lib/python3.7/site-packages/statsmodels/base/model.py", line 170, in from_formula
missing=missing)
File "DataScienceVenv/lib/python3.7/site-packages/statsmodels/formula/formulatools.py", line 67, in handle_formula_data
NA_action=na_action)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/highlevel.py", line 310, in dmatrices
NA_action, return_type)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/highlevel.py", line 165, in _do_highlevel_design
NA_action)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/highlevel.py", line 70, in _try_incr_builders
NA_action)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/build.py", line 696, in design_matrix_builders
NA_action)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/build.py", line 443, in _examine_factor_types
value = factor.eval(factor_states[factor], data)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/eval.py", line 566, in eval
data)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/eval.py", line 551, in _eval
inner_namespace=inner_namespace)
File "DataScienceVenv/lib/python3.7/site-packages/patsy/compat.py", line 43, in call_and_wrap_exc
exec("raise new_exc from e")
File "<string>", line 1, in <module>
patsy.PatsyError: Error evaluating factor: ValueError: no field of name I
y ~ I(x**3) + I(x**2) + I(x**4)
^^^^^^^
>>>
The documentation for
Yet in fact, structured arrays don't work, or not all features of the
Just as an FYI, statsmodels no longer officially supports recarrays. Any remaining references are vestigial and should be removed.
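For anyone who still has data in a structured array, one possible workaround is to convert it to a plain dict of columns before handing it to the formula machinery. This is a sketch under assumptions, not a fix endorsed in this thread. The variable name `columns` is illustrative, and the `x` and `y` values are taken from the snippet earlier in the thread. The array is filled field by field so that it holds one record per observation.

```python
import numpy as np

x = np.linspace(0.001, 5, 200)
y = (0.3 * x**3 + 1.2 * x**2 + 70 / x**4) * 1.1 * np.exp(0.1)

# Build the structured array with one record per observation.
data = np.empty(x.shape[0], dtype=[("y", np.float64), ("x", np.float64)])
data["y"] = y
data["x"] = x

# Convert to a dict mapping each field name to its column.
# A dict raises KeyError for unknown names such as "I".
columns = {name: data[name] for name in data.dtype.names}
```

Assuming the formula interface accepts dict-like data, `columns` could then be passed as the `data` argument in place of the structured array.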
When trying to use the patsy builtin identity function I() to add two features, the numpy recarray throws an error, while the pandas equivalent executes without a problem. Code to reproduce the error:
python 3.6.5
patsy 0.5.0
pandas 0.23.4
numpy 1.14.1
traceback: