Deferring Evaluation of Terms #149
Comments
patsy has an extensible parser. Both parts are extensible in principle; being able to support use cases like yours was included in the original design. The parser currently uses the shunting-yard algorithm, which is specialized for easily parsing infix arithmetic with precedence and parentheses, so it's easiest to extend by adding new operators, but it could potentially be extended to handle other things like your special

But... none of this has ever actually been used, so it probably doesn't quite work. And unfortunately patsy has been essentially unmaintained for years now, so I'm not sure how you'd get there from here. If you have cash to burn I guess you could hire me...
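To make the shunting-yard remark concrete, here is a minimal sketch of the algorithm applied to formula-like strings. It is not patsy's parser: the `PRECEDENCE` table, `tokenize`, and `to_rpn` are hypothetical names invented for illustration, all operators are assumed to be binary and left-associative, and function calls such as `A(...)` are not handled.

```python
import re

# Hypothetical precedence table (higher binds tighter); not patsy's actual table.
PRECEDENCE = {"~": 1, "+": 2, "*": 3, ":": 4}

# A token is a name, an integer, a parenthesis, or one of the operators above.
TOKEN = re.compile(r"\s*([A-Za-z_]\w*|\d+|[~+*:()])")


def tokenize(formula):
    """Split a formula string into a flat list of tokens."""
    formula = formula.rstrip()
    tokens, pos = [], 0
    while pos < len(formula):
        match = TOKEN.match(formula, pos)
        if match is None:
            raise ValueError(f"unexpected character at position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


def to_rpn(formula):
    """Convert an infix formula to reverse Polish notation."""
    output, stack = [], []
    for tok in tokenize(formula):
        if tok in PRECEDENCE:
            # Pop operators that bind at least as tightly (left-associative).
            while (stack and stack[-1] in PRECEDENCE
                   and PRECEDENCE[stack[-1]] >= PRECEDENCE[tok]):
                output.append(stack.pop())
            stack.append(tok)
        elif tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unbalanced parentheses")
            stack.pop()  # discard the matching "("
        else:
            output.append(tok)  # a name or a number
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unbalanced parentheses")
        output.append(op)
    return output


print(to_rpn("y ~ 1 + x + a*b"))
```

In a design like this, a new binary operator only needs an entry in the precedence table and in the token pattern, which is why operators are the easy extension point; a marker such as `A(...)` or `{...}` would need extra handling in the main loop.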
Thanks. I'll see if I can figure anything out. Assuming I can't (which is reasonable), I can always use a halfway mixed interface, with a formula for the dense part and a list for the sparse part.
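One way the mixed interface could look is sketched below. `MixedSpec` and its fields are hypothetical names, not part of patsy or of `AbsorbingLS`; the only assumption is that each absorbed term is written as a tuple of column names, so nothing has to be parsed or materialized for the sparse part.

```python
from dataclasses import dataclass, field


@dataclass
class MixedSpec:
    """Hypothetical container: a formula for the dense part, a list for the rest."""

    formula: str                                 # dense part, handed to patsy as usual
    absorb: list = field(default_factory=list)   # absorbed terms as tuples of column names

    def absorbed_variables(self):
        """Column names needed for the absorbed terms, in first-seen order."""
        seen = []
        for term in self.absorb:
            for name in term:
                if name not in seen:
                    seen.append(name)
        return seen


# ("cat",) is a plain categorical; ("cat", "x") is the cat-by-x interaction.
spec = MixedSpec("y ~ 1 + x", absorb=[("cat",), ("cat", "x")])
print(spec.absorbed_variables())
```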
I have written a function called `AbsorbingLS` that can absorb a large number (millions) of categorical variables or categorical interactions. It is implemented using a Frisch-Waugh-Lovell step where the categoricals are handled using scipy sparse matrices. I would like to add a formula interface. Suppose I have a function `A()` that indicates that a variable should be absorbed. Is there any place to intervene in the formula parsing for a formula that looks like `y ~ 1 + x + A(cat) + A(cat*x)`?

I can use another syntax. In an instrumental variable regression I use the syntax `y ~ 1 + x1 + x2 + [x3 ~ z1 + z2]`, which is used to determine the configuration of the 2 required regressions. This works fine since it is easy to parse the `[]` and then it is a couple of standard calls. This approach doesn't obviously work here since I must avoid creating any arrays. I could use a similar structure here, so something like `y ~ 1 + x + {cat + cat*x}` (`{}` for simplicity in parsing), after which I would need to find a good way to turn `cat + cat*x` into usable terms (w/o populating dense arrays).

Any suggestions on how I could write a formula where I could intercept it using something like the pseudocode
Any suggestions are appreciated.
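One possible workaround, sketched here under stated assumptions, is to intercept the formula as a string before patsy ever sees it: pull out the top-level `A(...)` terms, pass the remaining dense formula to patsy unchanged, and expand the absorbed expressions into tuples of column names without building any arrays. The helpers `top_level_split`, `split_absorbed`, and `expand_terms` are hypothetical and not part of patsy. The sketch only understands `+`, `*`, and `:` between plain names, and it follows the patsy convention that `a*b` means `a + b + a:b`, which may or may not be what an absorbed `cat*x` should mean.

```python
from itertools import combinations


def top_level_split(expr, sep="+"):
    """Split expr on sep, ignoring separators nested inside parentheses."""
    parts, depth, start = [], 0, 0
    for i, ch in enumerate(expr):
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
        elif ch == sep and depth == 0:
            parts.append(expr[start:i].strip())
            start = i + 1
    parts.append(expr[start:].strip())
    return parts


def split_absorbed(formula, marker="A"):
    """Return (dense_formula, absorbed_expressions).

    Only whole top-level terms of the form marker(...) are recognized.
    """
    lhs, rhs = formula.split("~", 1)
    prefix = marker + "("
    dense, absorbed = [], []
    for term in top_level_split(rhs):
        if term.startswith(prefix) and term.endswith(")"):
            absorbed.append(term[len(prefix):-1].strip())
        else:
            dense.append(term)
    return f"{lhs.strip()} ~ {' + '.join(dense)}", absorbed


def expand_terms(expr):
    """Expand names joined by +, * and : into unique tuples of column names."""
    terms = []
    for part in top_level_split(expr, "+"):
        # Each factor of '*' may itself be an interaction written with ':'.
        factors = [tuple(n.strip() for n in f.split(":")) for f in part.split("*")]
        for size in range(1, len(factors) + 1):
            for combo in combinations(factors, size):
                term = tuple(name for factor in combo for name in factor)
                if term not in terms:
                    terms.append(term)
    return terms


dense, absorbed = split_absorbed("y ~ 1 + x + A(cat) + A(cat*x)")
print(dense)                        # formula for the dense part
print(absorbed)                     # raw absorbed expressions
print(expand_terms("cat + cat*x"))  # tuples of column names, no arrays built
```

The same `expand_terms` helper could be applied to the contents of a `{...}` block once the braces have been located, since `cat + cat*x` is just another expression string at that point.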