-
Notifications
You must be signed in to change notification settings - Fork 14
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add variances of predictions? #17
Comments
Relevant example (from email), in Stata: * Low Birth Weight
webuse lbw
* The logistic regression
logit low i.smoke i.race age lwt i.smoke ptl ht ui
* The margins proportions among smoke levels
* Note. There is no dy/dx because I want the margina mean (proportion)
margins smoke, post
* Now we use the proportions to compute the ratios
nlcom _b[1.smoke]/_b[0.smoke]
* Store the log-ratio
nlcom (logpr: ln(_b[1.smoke]) - ln(_b[0.smoke])), post
* Use lincom to get back to ratio world
lincom logpr, eform R equivalent: library("prediction")
library("webuse")
webuse("lbw")
m <- glm(low ~ smoke + race + age + lwt + smoke + ptl + ht + ui, data = lbw, family = binomial)
summary(m)
# marginal proportions
prediction(m, at = list(smoke = 0:1)) but need the variances to do an R equivalent of |
@leeper looking forward to this! would be amazing!! |
Here's a simple Stata example: . regress y i.sex age distance
. margins sex
Predictive margins Number of obs = 3,000
Model VCE : OLS
Expression : Linear prediction, predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. t P>|t| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
male | 62.73048 .5354111 117.16 0.000 61.68067 63.78029
female | 76.71801 .5346591 143.49 0.000 75.66967 77.76634
------------------------------------------------------------------------------ and the equivalent R code we'll need to generalize: webuse::webuse("margex")
# setup data
Y <- cbind(1, margex[, c("sex", "age", "distance", "y")])
Y$sex <- as.numeric(Y$sex)
Y$age <- as.numeric(Y$age)
Y$distance <- as.numeric(Y$distance)
Y$y <- as.numeric(Y$y)
# estimate model
m <- lm(y ~ factor(sex) + age + distance, data = Y)
# convert data to matrix
Y$y <- NULL
Ymat <- t(matrix(colMeans(Y)))
## standard error of predictive marginal mean (non-sense given sex is factor)
sqrt(diag(Ymat %*% vcov(m) %*% t(Ymat)))
# predictive margin for men
Ymale <- Ymat
Ymale[,2] <- 0
as.vector(Ymale %*% coef(m))
## [1] 62.73048
sqrt(diag(Ymale %*% vcov(m) %*% t(Ymale)))
## [1] 0.5354111
# predictive margin for women
Yfemale <- Ymat
Yfemale[,2] <- 1
as.vector(Yfemale %*% coef(m))
## [1] 76.71801
sqrt(diag(Yfemale %*% vcov(m) %*% t(Yfemale)))
## [1] 0.5346591 |
We can actually obtain this easier than I expected because it's the same as: > predict(m, setNames(data.frame(Ymale), c("intercept", "sex", "age", "distance")), se.fit = TRUE)
$fit
1
62.73048
$se.fit
[1] 0.5354111
$df
[1] 2996
$residual.scale
[1] 20.15342
> predict(m, setNames(data.frame(Yfemale), c("intercept", "sex", "age", "distance")), se.fit = TRUE)
$fit
1
76.71801
$se.fit
[1] 0.5346591
$df
[1] 2996
$residual.scale
[1] 20.15342 Thus: > Ymale_df <- Y
> Ymale_df$sex <- 0
> mean(predict(m, Ymale_df))
[1] 62.73048
> Yfemale_df <- Y
> Yfemale_df$sex <- 1
> mean(predict(m, Yfemale_df))
[1] 76.71801 |
Need to match output of margins: > summary(margins::margins(mod))
factor AME SE z p lower upper
group2 8.9454 0.9710 9.2129 0.0000 7.0423 10.8485
group3 18.5723 1.5032 12.3552 0.0000 15.6261 21.5185
sex1 18.4317 0.9599 19.2011 0.0000 16.5503 20.3132 So add columns:
|
This is now working for > summary(prediction(lm(mpg ~ cyl*am, mtcars), at = list(cyl = 4:8, am = 0:1)))
at(cyl) at(am) Prediction SE z p lower upper
4 0 22.97 1.4840 15.478 4.854e-54 20.06 25.88
5 0 20.99 1.1035 19.026 1.037e-80 18.83 23.16
6 0 19.02 0.7971 23.862 7.665e-126 17.46 20.58
7 0 17.04 0.6748 25.258 9.138e-141 15.72 18.37
8 0 15.07 0.8232 18.304 7.718e-75 13.45 16.68
4 1 27.93 1.0055 27.772 9.410e-170 25.95 29.90
5 1 24.64 0.8163 30.190 3.234e-200 23.04 26.24
6 1 21.36 0.9587 22.284 5.340e-110 19.48 23.24
7 1 18.08 1.3302 13.594 4.324e-42 15.48 20.69
8 1 14.80 1.7936 8.253 1.549e-16 11.29 18.32 But support for other model types will require some further work. |
Stata Benchmarks: . webuse margex
(Artificial data for margins)
. logit outcome i.sex i.group
Iteration 0: log likelihood = -1366.0718
Iteration 1: log likelihood = -1207.4432
Iteration 2: log likelihood = -1186.5543
Iteration 3: log likelihood = -1185.6072
Iteration 4: log likelihood = -1185.6063
Iteration 5: log likelihood = -1185.6063
Logistic regression Number of obs = 3,000
LR chi2(3) = 360.93
Prob > chi2 = 0.0000
Log likelihood = -1185.6063 Pseudo R2 = 0.1321
------------------------------------------------------------------------------
outcome | Coef. Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
sex |
female | .5074091 .1289294 3.94 0.000 .2547122 .760106
|
group |
2 | -1.196351 .1251234 -9.56 0.000 -1.441589 -.9511138
3 | -2.591768 .2759998 -9.39 0.000 -3.132718 -2.050819
|
_cons | -1.199567 .1262273 -9.50 0.000 -1.446968 -.9521664
------------------------------------------------------------------------------
. margins
Predictive margins Number of obs = 3,000
Model VCE : OIM
Expression : Pr(outcome), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .1696667 .0064564 26.28 0.000 .1570124 .1823209
------------------------------------------------------------------------------
.
. margins, predict(xb)
Predictive margins Number of obs = 3,000
Model VCE : OIM
Expression : Linear prediction (log odds), predict(xb)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | -1.981424 .0731871 -27.07 0.000 -2.124868 -1.83798
------------------------------------------------------------------------------
. |
Reference for all GLMs: http://indiana.edu/~jslsoc/stata/ci_computations/spost_deltaci.pdf |
Further Stata benchmarks: . quietly logit outcome distance
. margins
Predictive margins Number of obs = 3,000
Model VCE : OIM
Expression : Pr(outcome), predict()
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .1696667 .0068061 24.93 0.000 .156327 .1830063
------------------------------------------------------------------------------
. margins, predict(xb)
Predictive margins Number of obs = 3,000
Model VCE : OIM
Expression : Linear prediction (log odds), predict(xb)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | -1.706939 .0625907 -27.27 0.000 -1.829614 -1.584263
------------------------------------------------------------------------------
. help margins
. margins, atmeans
Adjusted predictions Number of obs = 3,000
Model VCE : OIM
Expression : Pr(outcome), predict()
at : distance = 58.58566 (mean)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | .1535612 .0081355 18.88 0.000 .1376158 .1695066
------------------------------------------------------------------------------
. margins, predict(xb) atmeans
Adjusted predictions Number of obs = 3,000
Model VCE : OIM
Expression : Linear prediction (log odds), predict(xb)
at : distance = 58.58566 (mean)
------------------------------------------------------------------------------
| Delta-method
| Margin Std. Err. z P>|z| [95% Conf. Interval]
-------------+----------------------------------------------------------------
_cons | -1.706939 .0625907 -27.27 0.000 -1.829614 -1.584263
------------------------------------------------------------------------------ |
Now that
prediction()
has gained theat
argument, it seems logical to report variances (ala Statamargins::margins()
).The text was updated successfully, but these errors were encountered: