Clustered SE considerations in feols #6
Thanks for the pointer. Comparability across software is indeed important. I'll include more options for SE clustering. Then I'll write a vignette explaining exactly how the clustered SEs are computed in the package, detailing the different options.

The problem is that the theory behind SE clustering is not clear-cut (as mentioned by Sergio). Even nesting is not so clear. Let me use the example you introduced in the thread you referenced. Let's say you cluster the SEs on a dimension that nests only half the data (e.g. you estimate employee FEs and cluster at the company level, leading to partial nesting):

```r
# var. 'cl' (ranging from 1 to 50) did nest the var. 'fe' (from 1 to 500)
# now 'cl_half' nests only half the sample (random for the rest)
cl_half = cl
cl_half[cl < 25] = 100 + sample(1:30, sum(cl < 25), TRUE)
```

There, as nesting is not complete, it will pass undetected, while if it were detected it would lead to a reduction in the number of parameters K from 501 to about 251 [and a subsequent decrease in the value of the SE]. Or should K even be reduced to 1 when there is partial nesting (that is: equal to the clustered within estimator)?

Another issue is inverse nesting, i.e. when the fixed-effects nest the cluster variable rather than the other way around. My point is that there are still open questions. But I fully agree that different software should be able to provide identical results. I'll work on that!
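For concreteness, data matching that description can be generated like this (my own sketch; the construction was not part of the original comment):

```r
# Hypothetical data consistent with the setup above: 500 'fe' levels
# perfectly nested within 50 'cl' levels, 10 observations per 'fe' level.
set.seed(1)
fe = rep(1:500, each = 10)    # employee fixed-effects (5,000 obs.)
cl = (fe - 1) %/% 10 + 1      # company clusters: fe 1-10 -> cl 1, ..., fe 491-500 -> cl 50
```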
@lrberge I need to think a bit more about your first example (partial nesting). But I'm not sure I see a problem with the second (inverse nesting). We care about FEs being nested within the cluster variables because the former are swept out of the regression prior to estimation. (Unless you're doing something like LSDV estimation, which doesn't sweep out the FEs, but that's obviously very inefficient.) So they're not automatically "present" when we make the necessary DoF cluster adjustment. I don't really see how this should have implications going the other way around. Aren't we identifying off of a system where all possible variation in (and within) this hierarchical structure has already been removed? I mean, that loss of information is the standard price we pay for quick estimation of panel models using Frisch-Waugh, etc. (Practical example: do we care about the strong within-household correlations between individuals (e.g. siblings) when estimating a model at the household level? I don't think we do. Of course, if you have data on individuals then you might as well use this instead of the (collinear) household FEs, but then you're estimating a different model.)

Regardless of what you decide, I strongly agree with you that the clustering of standard errors is much less of a precise science than people like to pretend! A vignette detailing some options for reproducing the results from other software would be a welcome addition. Let me know if you'd like any help.

PS — Since we're on the subject... If I may, the current documentation (in the introduction, for example) is a little confusing because you sometimes refer to groups of fixed effects as "clusters". I understand why you do this from a conceptual standpoint, but it might throw new users off.
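To illustrate the "swept out" point above, here is a small sketch (my own, with made-up data) showing that the FE estimator coincides with plain OLS on within-demeaned data, which is why the FE dummies are never physically present at estimation time:

```r
# Frisch-Waugh-Lovell in action: feols with a fixed-effect gives the same
# point estimate as OLS after sweeping out the group means.
library(fixest)
set.seed(42)
df = data.frame(id = rep(1:50, each = 10))
df$x = rnorm(500)
df$y = df$id / 10 + df$x + rnorm(500)

est_fe = feols(y ~ x | id, df)

# sweep out the id means of y and x, then run OLS without intercept
df$y_dm = df$y - ave(df$y, df$id)
df$x_dm = df$x - ave(df$x, df$id)
est_dm = lm(y_dm ~ x_dm - 1, df)

coef(est_fe)["x"]  # same point estimate...
coef(est_dm)       # ...only the DoF used for the SEs differ
```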
Thanks for weighing in! Very healthy discussion! :-) I don't really understand your point. But let me try to make mine clearer. The main difference between SEs is driven by how we compute K, the number of parameters. So far in `fixest`:

K = number of variables + number of fixed-effects coefficients

That is, in @karldw's example, we use K = 501 (1 variable and 500 FE coefs). This is true for the three functions. Now we differ on how to handle nestedness when clustering the standard errors. While I'm perfectly fine with this way of doing things, what makes me uncomfortable is the sharp, discontinuous decline in K. Let me take an example to make it clear:

```r
# cl is the variable nesting the FEs
cl_bis = cl
# I change just one obs.
cl_bis[1] = 51
```

What I have done is create a variable that nests all but one of the 500 fixed-effects. In all logic, we would expect to get similar results when clustering by `cl` or by `cl_bis`. Yet, due to the behavior regarding nestedness, there is a big discrepancy (-11%!) between the two standard errors.

This discontinuity is a call for theory. I would be much more comfortable with something removing one FE coefficient from K for each nested coefficient; that way we would have no such discrepancy. But again: theory needed. This is what I had in mind when talking about partial nesting.
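To make the comparison concrete, here is a sketch continuing the simulated data from above (my own construction; the `summary()` interface shown is the one from that period, and argument names may differ in later versions of the package):

```r
# Compare the clustered SEs under the two cluster variables. 'fe', 'cl'
# come from the earlier sketch and 'cl_bis' from the snippet just above.
library(fixest)
set.seed(2)
base = data.frame(fe, cl, cl_bis)
base$x = rnorm(nrow(base))
base$y = base$fe / 100 + base$x + rnorm(nrow(base))

est = feols(y ~ x | fe, base)
summary(est, se = "cluster", cluster = base$cl)      # full nesting of 'fe' detected
summary(est, se = "cluster", cluster = base$cl_bis)  # nesting broken by one obs.
```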
In any case, I'll also implement something to take the nestedness into account; my only point was to highlight that this is not so clear-cut after all. I welcome any suggestions. By the way, regarding your post scriptum, you're right: in my mind I use the words cluster and fixed-effects interchangeably, and this trickles down to the description I wrote.
Some quick and dirty Monte Carlo simulation:
I estimate the following model: y_it = fe_i + x_it + eps_it. Define k = 1 (the number of variables) and G = the number of fixed-effects. Here are the results when using different values for K (either K = k or K = k + G). It reports the number of times the null hypothesis is rejected. Clearly it shows that when not clustering the SEs, we should use K = k + G.
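A rough sketch of how such an experiment can be set up (my own reconstruction, not the original code):

```r
# Simulate under the null of no effect of x, sweep out the FEs by
# demeaning, and count 5% rejections using K = k or K = k + G.
set.seed(123)
n_id = 50; t_per = 5; n = n_id * t_per
k = 1; G = n_id
n_sim = 2000
reject = matrix(FALSE, n_sim, 2, dimnames = list(NULL, c("K = k", "K = k + G")))

for (s in 1:n_sim) {
  id = rep(1:n_id, each = t_per)
  x = rnorm(n)
  y = id / 10 + rnorm(n)             # true coefficient on x is 0

  y_dm = y - ave(y, id)              # within transformation:
  x_dm = x - ave(x, id)              # sweeps out the fixed-effects

  b = sum(x_dm * y_dm) / sum(x_dm^2) # within estimator of the x coefficient
  res = y_dm - b * x_dm

  for (j in 1:2) {
    K = c(k, k + G)[j]
    se = sqrt(sum(res^2) / (n - K) / sum(x_dm^2))
    reject[s, j] = abs(b / se) > qt(0.975, n - K)
  }
}

colMeans(reject)  # K = k over-rejects; K = k + G stays close to 5%
```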
This is an interesting discussion! Do you have any insights on which degrees of freedom to use for the robust F test in case of multi-way clustering? (See also the question on Cross Validated: https://stats.stackexchange.com/questions/229678/double-clustered-standard-errors-and-degrees-of-freedom-in-wald-style-f-test-for)
Sorry @Helix123, I have no clue! Maybe you could just check by running some Monte Carlo experiments?
I have updated the package to account for different types of standard-errors. Now the nested way is the default. It will avoid confusion when comparing with Stata output. I have also implemented a new function.
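Roughly what the new options look like (a sketch based on the fixest API of that era; values such as "hetero" vs "white" changed across versions):

```r
library(fixest)
est = feols(y ~ x | fe, base)             # 'base' from the sketch above
summary(est)                              # default: cluster on the first FE
summary(est, se = "standard")             # iid standard-errors
summary(est, se = "hetero")               # heteroskedasticity-robust
summary(est, se = "twoway",
        cluster = list(base$cl, base$fe)) # two-way clustering
```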
Hi, just to say that I have finally settled on the way to compute the SEs in the package. I detail how they are computed in this vignette: On standard-errors. Feel free to send me any comments; I'll modify the document accordingly. Thanks again for the discussion!
This is really useful! I appreciate "You're already fed up about these details? I'm sorry but there's more". I'm not sure how deep you want to go on how to choose among the different options. As you say, you "don't discuss the why, but only the how." That said, it could be useful to have a couple of pointers for folks who want to learn more about making the right choice for their application. The references at the end are great, but it's unclear which paper addresses which choice in the vignette.
Super work, Laurent. I sympathize strongly with your "complicated journey", having easily lost a week+ of my life to these issues before! The one thing I'll say about lfe is that adding the `cmethod = "reghdfe"` argument to `felm` was meant to reproduce reghdfe's small-sample adjustment.
It's also worth noting that I've seen xtreg and reghdfe differ on occasion... Having said all that:
Again at the risk of overkill, but perhaps appropriate for an acknowledgements or suggested further reading section at the end: I'm not sure if you've seen the new(ish) sandwich vignette on clustered covariances? It's very thorough.
Regarding @karldw: as you could tell, I... was the one who was fed up with the details! @grantmcdermott: "I've also seen xtreg and reghdfe differ on occasion": God, please don't tell me things like that! ;-) I think you're right, providing some pointers is important, and the sandwich vignette (forthcoming in JSS, btw) is a fantastic reference (that I didn't know of, thanks for the ref!). Btw, the package is finally compatible with sandwich, so even more vcov possibilities! Youhou!
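A sketch of what that compatibility enables (assuming the fitted model `est` and data `base` from above; exact behavior depends on the fixest and sandwich versions):

```r
library(sandwich)
library(lmtest)
vc = vcovCL(est, cluster = base$cl)  # clustered vcov computed by sandwich
coeftest(est, vcov = vc)             # inference using that matrix
```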
@lrberge I've just been reading through the vignette again and realised that newcomers to fixest might be confused by the fact that the arguments you're referring to at the top — "se" and "dof" — apply to the methods (e.g. `summary` and `vcov`) rather than to the estimation call itself. I know that you provide examples further below, but I really think something right out of the gate would help orientate readers. It could be something as simple as adding a one-liner under the appropriate headings. For example, under "The argument `se`", a single example call.
And similarly for "The argument `dof`". Just a thought!
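The kind of one-liners being suggested might look like this (hypothetical; `gravity` is the model estimated in the vignette, and `dof()` was the helper of that era, since superseded by `ssc()`):

```r
summary(gravity, se = "twoway")           # under "The argument se"
summary(gravity, dof = dof(adj = FALSE))  # under "The argument dof"
```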
Oh, another thing. Do you think there's any value in showing readers how to extract, say, the degrees of freedom from an existing model? Might be useful to let them know where the numbers are coming from, e.g. `attr(vcov(gravity), "dof.K")`.
You're right indeed, I write as if users were already acquainted with the package, but it's a very wrong assumption! Yep, looks like a good idea to add the `dof.K` extraction example. Thanks for the suggestions!
The details of clustering and degrees-of-freedom corrections are perennial issues, and challenging for all the reasons Sergio pointed out in this thread.
When I run @grantmcdermott's example from that same discussion, `feols` gives the same results as `lfe::felm` or Stata's `cgmreg`, but different than Stata's `reghdfe` or Grant's proposed `felm(..., cmethod = "reghdfe")`. I'm not sure you want to do anything differently, but it's helpful to document the differences across packages.
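A sketch of this kind of cross-package comparison (not the original example; it assumes a data frame `df` with outcome `y`, regressor `x`, fixed-effect `fe`, and cluster variable `cl`):

```r
library(fixest)
library(lfe)

m_feols = feols(y ~ x | fe, df)
m_felm  = felm(y ~ x | fe | 0 | cl, df)                       # CGM clustering
m_felm2 = felm(y ~ x | fe | 0 | cl, df, cmethod = "reghdfe")  # reghdfe-style DoF

summary(m_feols, se = "cluster", cluster = df$cl)
summary(m_felm)
summary(m_felm2)
```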
Ref: DeclareDesign/estimatr#321, sgaure/lfe#19, FixedEffects/FixedEffectModels.jl#49