-
Notifications
You must be signed in to change notification settings - Fork 0
/
cluster-robust-variances.Rmd
84 lines (57 loc) · 3.74 KB
/
cluster-robust-variances.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
# Cluster Robust Variances
As mentioned, if we ignore the clustering, the so-called fixed effects estimates (i.e. coefficients) are correct, but their variances are not. We can get proper estimates of the standard errors via <span class="emph">cluster robust standard errors</span>, which are very popular in econometrics and fields trained in that fashion, but not widely used elsewhere in my experience. Essentially, these allow one to fire-and-forget, and treat the clustering as more of a nuisance.
In programs like Stata, obtaining these are basically an option for most modeling procedures. In R, it's not quite as straightforward, but not difficult. There are packages such as <span class="pack">sandwich</span> that can provide heteroscedastic robust standard errors, but won't necessarily take into account clustering. I provide a custom function that will work in this example so that the curtain can be pulled back a little, but the <span class="pack">plm</span> package would be the way to go for cluster robust standard errors. The <span class="pack">plm</span> package can take into account the serial autocorrelation via the 'arellano' input to the `type` argument.
```{r plmMod}
library(plm)
# create numeric time so plm won't treat as a factor, time is a reserved name in plm
t2 = d$time
clusterrob_mod <- plm(y ~ t2 + treatment, data=d, model='pooling', index='id')
summary(clusterrob_mod)
```
The following compares the different robust SEs we could produce at this point.
<div class="fold s">
```{r clusterRobSe}
resids = resid(lm_mod) # residuals
xvar = solve(crossprod(lm_mod$x)) # (X'X)^-1
gd0 = data.frame(id=d$id, r=resids, lm_mod$x)
# custom function to be applied within cluster
cluscom = function(mat){
X = as.matrix(mat)[, 3:5] # magic numbers represent int, time and treatment columns
crossprod(X, tcrossprod(mat$r)) %*% X
}
# calculate within cluster variance
gd = gd0 %>%
group_by(id) %>%
do(xvar = cluscom(.))
# plm
plm_se = vcovHC(clusterrob_mod, method='arellano', cluster='group') %>% diag %>% sqrt %>% round(3)
# custom output
custom_se = (xvar %*% Reduce(`+`, gd$xvar) %*% xvar) %>% diag %>% sqrt %>% round(3)
# original lm
lm_se = vcov(lm_mod) %>% diag %>% sqrt %>% round(3)
# non-clustered 'robust' se
lm_robse = vcovHC(lm_mod, type='HC0') %>% diag %>% sqrt %>% round(3)
```
</div>
```{r seComparison}
# plm
plm_se
# custom output
custom_se
# original lm
lm_se
# non-clustered 'robust' se
lm_robse
```
## Pros
- Don't have to think too much about the issues at play
- Fewer assumptions about the model
- A general heteroscedastic 'robust' approach
## Cons
- Ignores cluster-specific effects
- By default, assumes no intracluster correlation
- Few tools go beyond simple clustering, requiring manual approach
- Not as general as GEE approach
- Suggests that there are problems in model specification that the method itself does not correct
Gist: if your goal is to simply do a standard GLM approach and essentially ignore the clustering, treating it more or less as a nuisance to overcome, and your grouping structure is simple, then this may be for you. However, note that the [GEE][Generalized Estimating Equations] approach as a possibly more flexible alternative that can still incorporate cluster robust standard errors[^clusrob_specialcase]. Furthermore, differences between the robust and regular SE may suggest a model misspecification issue. 'Fixing' standard errors won't solve that problem.
[^clusrob_specialcase]: The cluster robust standard errors as depicted above assume independent error structure in the GEE model, and thus are a special case of GEE. When you peruse the GEE model, rerun it with the argument `corst='ind'` to duplicate the plm and custom results above.