-
Notifications
You must be signed in to change notification settings - Fork 0
/
corr-struct.qmd
171 lines (137 loc) · 5.9 KB
/
corr-struct.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
```{r}
#| label: setup-corstr
#| include: false
library(tidyverse)
knitr::opts_chunk$set(echo = TRUE, fig.width=6.5, fig.height=2.5)
```
# Correlation Structures
This section is an appendix to the GEE chapter, to illustrate and
discuss the different correlation structures that we can specify when
fitting a GEE.
### Variance/Covariance or Correlation?
Statisticians often talk about a model's **variance-covariance** matrix
(which gives residual variance on the diagonal and covariance between
residuals in its off-diagonal elements). Here we will consider
**correlation** matrices instead. This will give us a simplified way of
looking at similar ideas. In a correlation matrix, the diagonal entries
will be all 1s (the correlation of something with itself is 1), and
off-diagonal elements will show correlation between residuals.
### Example Case
Let's consider an example for a dataset with 9 observations; three
observations for each of three individuals. *(Side note: The groupings
causing non-independence in the residuals don't have to be "individuals"
-- it could be some other relationship, like being from the same school
or town or country or ethnic group -- any single categorical variable
that defines the groups of interest and induces non-independence withing
the groups, could work. Here we are just calling them "individuals" for
convenience, and because when time-series data are collected on multiple
individuals, this kind of model often works well.)*
Each of the correlation matrices below has one row and one column for
each observation.
The *diagonal* entries will all, always, be 1.
The *off-diagonal* entries indicate how the different residuals depend
on each other -- in other words, they describe mathematically exactly
*how* each residual is correlated with the others.
We will use the letter $\rho$ for correlations.
### Independence
If the residuals are all independent of each other, then all the
correlations between them will be zero:
$$
\begin{vmatrix}
1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1\\
\end{vmatrix}
$$
This corresponds to all of the models we have considered previously in
class (which all have a condition that residuals should be independent).
### ACF Example
The residual ACF from a model fitted to data that perfectly embody this
structure with 5 observations per individual might look like:
```{r, echo=FALSE}
d <- data.frame(r = as.numeric(arima.sim(model = list(), n=5)))
while(nrow(d) < 100){
a <- data.frame(r = as.numeric(arima.sim(model = list(), n=5)))
d <- bind_rows(d,a)
}
acf(d$r, main='Residual ACF')
```
### Exchangeable = Block Diagonal
In an exchangeable or block diagonal structure, all residuals for one
individual are equally correlated (correlation = $\rho$). All residuals
from different individuals are independent (correlation = 1).
$$
\begin{vmatrix}
1 & \rho & \rho & 0 & 0 & 0 & 0 & 0 & 0\\
\rho & 1 & \rho & 0 & 0 & 0 & 0 & 0 & 0\\
\rho & \rho & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & \rho &\rho & 0 & 0 & 0\\
0 & 0 & 0 & \rho & 1 & \rho & 0 & 0 & 0\\
0 & 0 & 0 & \rho & \rho & 1 & 0& 0& 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \rho & \rho\\
0 & 0 & 0 & 0 & 0 & 0 & \rho & 1 & \rho\\
0 & 0 & 0 & 0 & 0 & 0 & \rho & \rho & 1\\
\end{vmatrix}
$$
#### ACF Example
The residual ACF from a model fitted to data that perfectly embody this
structure would show non-independence in the residual ACF, out to about
the number of lags that there are observations per individual, but the
autocorrelation coefficients would not decline at exactly the rate
predicted by an AR(1) process (see below).
### AR1 (first-order autoregressive process)
In this structure, we again assume that residuals are uncorrelated
(independent) between individuals.
Within an individual, we assume measurements that are closer to one
another are more correlated. More precisely, we say that observations 1
lag apart have correlation $\rho$; observations 2 lags apart have
correlation $\rho^2$; three lags apart $\rho^3$, and so on.
To illustrate this, I show below a correlation structure for a model
with 10 observations: 5 observations from each of 2 individuals.
$$
\begin{vmatrix}
1 & \rho & \rho^2 & \rho^3 & \rho^4 & 0 & 0 & 0 & 0 & 0\\
\rho & 1 & \rho &\rho^2 & \rho^3 & 0 & 0 & 0 & 0& 0\\
\rho^2 & \rho & 1 & \rho & \rho^2 & 0 & 0 & 0 & 0& 0\\
\rho^3 & \rho^2 & \rho & 1 & \rho & 0 & 0 & 0 & 0& 0\\
\rho^4 & \rho^3 & \rho^2 & \rho & 1 & 0 & 0 & 0 & 0& 0\\
0 & 0 & 0 & 0 & 0 & 1 & \rho & \rho^2 & \rho^3 & \rho^4\\
0 & 0 & 0 & 0 & 0 & \rho & 1 & \rho &\rho^2 & \rho^3\\
0 & 0 & 0 & 0 & 0 & \rho^2 & \rho & 1 & \rho & \rho^2 \\
0 & 0 & 0 & 0 & 0 & \rho^3 & \rho^2 & \rho & 1 & \rho\\
0 & 0 & 0 & 0 & 0 & \rho^4 & \rho^3 & \rho^2 & \rho & 1\\
\end{vmatrix}
$$
#### ACF Example
The residual ACF from a model fitted to data that perfectly embody this
structure with $\rho=0.99$ and 5 observations per individual, might look
like:
```{r, echo=FALSE}
d <- data.frame(r = as.numeric(arima.sim(model = list(ar = 0.99), n=5)))
while(nrow(d) < 100){
a <- data.frame(r = as.numeric(arima.sim(model = list(ar = 0.99), n=5)))
d <- bind_rows(d,a)
}
acf(d$r, main='Residual ACF')
```
### Unstructured
This one is not of practical use because it is so hard to estimate.
$$
\begin{vmatrix}
1 & \rho_{1,2} & \rho_{1,3} & 0 & 0 & 0 & 0 & 0 & 0\\
\rho_{1,2} & 1 & \rho_{2,3} & 0 & 0 & 0 & 0 & 0 & 0\\
\rho_{1,3} & \rho_{2,3} & 1 & 0 & 0 & 0 & 0 & 0 & 0\\
0 & 0 & 0 & 1 & \rho_{1,2} & \rho_{1,3} & 0 & 0 & 0\\
0 & 0 & 0 & \rho_{1,2} & 1 & \rho_{2,3}& 0 & 0 & 0\\
0 & 0 & 0 & \rho_{1,3} & \rho_{2,3} & 1&0 & 0 & 0\\
0 & 0 & 0 & 0 & 0 & 0 & 1 & \rho_{1,2} & \rho_{1,3}\\
0 & 0 & 0 & 0 & 0 & 0 & \rho_{1,2} & 1 & \rho_{2,3}\\
0 & 0 & 0 & 0 & 0 & 0 & \rho_{1,3} & \rho_{2,3} & 1\\
\end{vmatrix}
$$