(To be released as 0.7.0)
broom 0.7.0 is a major release with a large number of breaking changes. Most of these breaking changes improve maintainability and internal consistency, which have posed long-standing difficulties.
This release features a number of unannounced hard-deprecations. I am sorry that I did not have the time to ease these transitions, and am actively looking for assistance maintaining broom.
- We have changed how we report degrees of freedom for lm objects (#212, #273). This is especially important for instructors in statistics courses. Previously the df column in glance.lm() reported the rank of the design matrix. Now it reports the numerator degrees of freedom for the overall F-statistic. This is equal to the rank of the model matrix minus one (unless you omit an intercept column), so the new df should be the old df minus one.
- tidy() no longer checks for a log or logit link when exponentiate = TRUE, and we have refactored to remove extraneous exponentiate arguments. If you set exponentiate = TRUE, we assume you know what you are doing and that you want exponentiated coefficients (and confidence intervals if conf.int = TRUE) regardless of link function.
- We have simplified glance.aov(), which now contains only the following columns: logLik, AIC, BIC, deviance, df.residual, nobs. This is in response to #212. Note that tidy.aov() gives more complete information about degrees of freedom in an aov object.
- We are moving away from supporting summary.*() objects. In particular, we have removed tidy.summary.lm() as part of a major overhaul of internals. Instead of calling tidy() on summary-like objects, please call tidy() directly on model objects moving forward.
- We have removed all support for the quick argument in tidy() methods. This simplifies internals and improves maintainability. We anticipate this will not affect many users, as few people seemed to use it. If this majorly cramps your style, let us know, as we are considering a new verb to return only model parameters. In the meantime, stats::coef() together with tibble::enframe() provides most of the functionality of tidy(..., quick = TRUE).
- All conf.int arguments now default to FALSE, and all conf.level arguments now default to 0.95. This should primarily affect tidy.survreg(), which previously always returned confidence intervals, although there are some others.
- Tidiers for emmeans-objects use the arguments conf.int and conf.level instead of relying on the argument names native to the emmeans::summary()-methods (i.e., infer and level). Similarly, multcomp-tidiers now include a call to summary(), as previous behavior was akin to setting the now removed argument quick = TRUE. Both families of tidiers now use the adj.p.value column name when appropriate. Finally, emmeans-, multcomp-, and TukeyHSD-tidiers now consistently use the column names contrast and null.value instead of comparison, level1 and level2, or lhs and rhs (see #692).
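Two of the changes above can be checked by hand. The following is a minimal sketch, not broom's own implementation: it uses base R (plus tibble for the last step) on the built-in mtcars data, so the numbers do not depend on which broom version is installed. The object names old_df, new_df and quick_tbl are made up for illustration.

```r
# Fit a model with an intercept and two predictors.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# What the df column used to report: the rank of the design matrix.
old_df <- fit$rank

# What it reports now: the numerator degrees of freedom of the overall F-statistic.
new_df <- unname(summary(fit)$fstatistic["numdf"])

# A rough stand-in for the removed tidy(fit, quick = TRUE):
# a two-column tibble of term names and point estimates.
quick_tbl <- tibble::enframe(stats::coef(fit), name = "term", value = "estimate")
```

Here old_df is 3 and new_df is 2, matching the "old df minus one" rule for a model with an intercept. quick_tbl has one row per coefficient and no standard errors or p-values.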
This release of broom hard-deprecates the following functions and tidiers:
- Data frame, rowwise data frame, vector and matrix tidiers have been removed from broom
- bootstrap()
- confint_tidy()
- glance.summary.lm()
- augment.glmRob()
- tidy.table() and tidy.ftable() have been deprecated in favor of tibble::as_tibble()
- tidy.summaryDefault() and glance.summaryDefault() have been deprecated in favor of skimr::skim()
We regret that we were unable to provide warnings for some of these changes.
Mixed models: we have also gone forward with our planned mixed model deprecations, and have removed the following methods, which now live in broom.mixed:
- tidy.brmsfit()
- tidy.merMod(), glance.merMod(), augment.merMod()
- tidyMCMC(), tidy.rjags(), tidy.stanfit()
- tidy.lme(), glance.lme(), augment.lme()
- tidy.stanreg(), glance.stanreg()
- augment.factanal() now returns a tibble with column names .fs1, .fs2, ..., instead of factor1, factor2, ... (#650).
- We have renamed the output of augment.htest(). In particular, we have renamed the .residuals column to .resid and the .stdres column to .std.resid for consistency. These changes will only affect chi-squared tests.
- tidy.ridgelm() now always returns a GCV column and never returns an xm column (#532)
- tidy.dist() no longer supports the upper argument
- Added data argument to augment() generic (did this happen?)
We have overhauled augment() for general consistency improvements (hopefully, pending getting safepredict() going urgh)
- If you pass a dataset to augment() via the data or newdata arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values. Previously augment() would drop rows containing NA. This should no longer be the case.
- augment() no longer accepts an na.action argument
- We no longer cram everything through augment.lm(), and it has subsequently lost a lot of arguments that were needed when it was a Frankenstein do-everything function
- augment() tries to give an informative error when data isn't the original training data
- Added new vignette detailing use of modelgenerics and modeltests packages
- Moved core tests to the modeltests package
- Many glance() methods now return a nobs column, which contains the number of data points used to fit the model! (#597 by @vincentarelbundock)
- We now use rlang::arg_match() when possible instead of match.arg() to give more informative errors on argument mismatches.
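To illustrate the kind of argument check mentioned above, here is a minimal sketch of rlang::arg_match() inside a function. The helper pick_type() and its choices are hypothetical and not part of broom.

```r
# Hypothetical helper: validates `type` against the choices in its default.
pick_type <- function(type = c("link", "response")) {
  rlang::arg_match(type)
}

# With no argument supplied, the first choice is used.
default_type <- pick_type()

# A valid value is returned unchanged.
chosen_type <- pick_type("response")

# A misspelled value raises an error whose message lists the valid choices;
# we capture the message instead of stopping.
msg <- tryCatch(pick_type("respnse"), error = function(e) conditionMessage(e))
```

Unlike match.arg(), arg_match() does not accept partial matches, so a typo fails loudly rather than being silently completed.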
- Add option to lfe::felm for robust and cluster standard errors (#772)
- Added tidier for car::Anova (#754)
- Tidy methods for car::Anova (#754)
- Added tidy() and glance() methods for speedglm objects from the speedglm package
- Added tidier for summary.manova (#729) - TODO // remove as summary.* is forbidden
- Added tidier for epiR::epi.2by2 (#711)
- Added tidiers for rma objects from the metafor package (#674, @malcolmbarrett, @softloud)
- Added tidiers for pam objects from the cluster package. (#637)
- Added tidy.svyglm() and glance.svyglm() (#611)
- tidy.garch() now supports confidence intervals (#568 by @petr102030)
- Added tidy.regsubsets() for best subsets linear regression from the leaps package
- Added method tidy.lm.beta() to tidy lm.beta class models (#545 by @mattle24)
- Added tidiers for lmrob and glmrob objects from the robustbase package (#205, #505).
- Added method tidy.systemfit() to tidy systemfit class models (by @jaspercooper)
- Added tidiers for drc::drm models (#574 by @edild)
- Added tidy.summary_emm() (#691 by @crsh)
- tidy.felm() now has a robust = TRUE/FALSE option that supports robust and cluster standard errors (#772)
- Make .fitted values respect type.predict argument of augment.clm(). (#617)
- Return factor rather than numeric class predictions in .fitted of augment.polr(). (#619)
- tidy.kmeans() now uses the names of the input variables in the output by default. Set col.names = NULL to recover the old behavior.
- Previously, F-statistics for weak instruments were returned through glance.ivreg(). F-statistics are now returned through tidy.ivreg(instruments = TRUE). Default is tidy.ivreg(instruments = FALSE). glance.ivreg() still returns Wu-Hausman and Sargan test statistics.
- glance.biglm() now returns a df.residual column
- tidy.prcomp() parameter matrix gained new options "scores", "loadings", and "eigenvalues" (#557 by @GegznaV)
- tidy_optim() now returns the standard error if the Hessian is present. (#529 by @billdenney) (TODO: think about this)
- tidy.htest() column names are now run through make.names() to ensure syntactic correctness (#549 by @karissawhiting) (TODO: use tidyverse name repair?)
- tidy.lmodel2() now returns a p.value column (#570)
- tidy.lsmobj() gained a conf.int argument for consistency with other tidiers.
- tidy.zoo() no longer changes column names that have spaces or other special characters (previously they were converted to data.frame friendly column names by make.names)
- Bug fix to return correct confidence intervals in tidy.drc() (#798)
- Bug fix to better allow tidy.boot() to support confidence intervals (#581)
- Bug fix to allow augment.kmeans() to work with masked data (#609)
- Bug fix to allow augment.Mclust() to work on univariate data (#490)
- Bug fix to allow tidy.htest() to support equal variances (#608)
- Bug fix for tidy.polr() when passed conf.int = TRUE (#498)
- Bug fix in glance.lavaan() (#577)
- Fix failing CRAN checks due to tibble 3.0.0 release. Removed xergm dependency.
- Remove tidiers for robust package and drop robust dependency (temporarily)
- Fixes failing CRAN checks as the joineRML package has been removed from CRAN
- Fixes failing CRAN checks due to new matrix classing in R 4.0.0
- Fixes failing CRAN checks
- Changes to accommodate ergm 3.10 release. tidy.ergm() no longer has a quick argument. The old default of quick = FALSE is now the only option.
tidy(), glance() and augment() are now re-exported from the generics package.
Tidiers now return tibble::tibble()s. This release also includes several new tidiers, new vignettes and a large number of bugfixes. We've also begun to more rigorously define tidier specifications: we've laid part of the groundwork for stricter and more consistent tidying, but the new tidier specifications are not yet complete. These will appear in the next release.
Additionally, users should note that we are in the process of migrating tidying methods for mixed models and Bayesian models to broom.mixed. broom.mixed is not on CRAN yet, but all mixed model and Bayesian tidiers will be deprecated once broom.mixed is on CRAN. No further development of mixed model tidiers will take place in broom.
Almost all tidiers should now return tibbles rather than data.frames. Deprecated tidying methods, Bayesian and mixed model tidiers still return data.frames.
Users are most likely to experience issues when using augment in situations where tibbles are stricter than data frames. For example, specifying model covariates as a matrix object will now error:
library(broom)
library(quantreg)
fit <- rq(stack.loss ~ stack.x, tau = .5)
broom::augment(fit)
#> Error: Column `stack.x` must be a 1d atomic vector or a list
This is because the default data argument data = model.frame(fit) cannot be coerced to tibble.
Another consequence of this is that augment.survreg and augment.coxph from the survival package now require that the user explicitly passes data to either the data or newdata arguments.
These restrictions will be relaxed in an upcoming release of broom pending support for matrix-columns in tibbles.
Developers are likely to experience issues:
- subsetting tibbles with [, which returns a tibble rather than a vector.
- setting rownames on tibbles, which is deprecated.
- using matrix and vector tidiers, now deprecated.
- handling the additional tibble classes tbl_df and tbl beyond the data.frame class
- linking to defunct documentation files: broom recently moved all tidiers to a roxygen2 template based documentation system.
This version of broom includes several new vignettes:
- vignette("available-methods", package = "broom") contains a table detailing which tidying methods are available
- vignette("adding-tidiers", package = "broom") is an in-progress guide for contributors on how to add new tidiers to broom
- vignette("glossary", package = "broom") contains tables describing acceptable argument names and column names for the in-progress new specification.
Several old vignettes have also been updated:
- vignette("bootstrapping", package = "broom") now relies on the rsample package and a tidyr::nest-purrr::map-tidyr::unnest workflow. This is now the recommended workflow for working with multiple models, as opposed to the old dplyr::rowwise-dplyr::do based workflow.
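As a rough illustration of the nest-map-unnest workflow named above, the sketch below fits one regression per cylinder group of the built-in mtcars data. It is not taken from the vignette. It assumes dplyr, purrr, broom and tidyr (version 1.0.0 or later, for the nest(data = -cyl) syntax) are installed, and the column names fit and tidied are arbitrary.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(broom)

per_cyl <- mtcars %>%
  # One row per cyl value; the remaining columns go into a list column `data`.
  nest(data = -cyl) %>%
  mutate(
    # Fit the same model to each group's data.
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),
    # Turn each fitted model into a tibble of coefficients.
    tidied = map(fit, tidy)
  ) %>%
  select(cyl, tidied) %>%
  # Expand the per-group coefficient tables into ordinary rows.
  unnest(tidied)
```

The result has one row per group and term (three cylinder groups times two coefficients), with cyl carried alongside the usual tidy() columns.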
- Matrix and vector tidiers have been deprecated in favor of tibble::as_tibble and tibble::enframe
- Dataframe tidiers and rowwise dataframe tidiers have been deprecated
- bootstrap() has been deprecated in favor of the rsample package
- inflate has been removed from broom
- The alpha argument has been removed from quantreg tidy methods
- The separate.levels argument has been removed from tidy.TukeyHSD. To obtain the effect of separate.levels = TRUE, users may call tidyr::separate after tidying. This is consistent with the multcomp tidier behavior.
- The fe.error argument was removed from tidy.felm. When fixed effects are tidied, their standard errors are now always included.
- The diag argument in tidy.dist has been renamed diagonal
- Advice to help beginners make PRs (#397 by @karldw)
- glance support for arima objects fit with method = "CSS" (#396 by @josue-rodriguez)
- A bug fix to re-enable tidying glmnet objects with family = multinomial (#395 by @erleholgersen)
- A bug fix to allow tidying quantreg intercept only models (#378 by @erleholgersen)
- A bug fix for aovlist objects (#377 by @mvevans89)
- Support for glmnetUtils objects (#352 by @Hong-Revo)
- A bug fix to allow tidy_emmeans to handle column names with dashes (#351 by @bmannakee)
- augment.felm no longer returns .fe_ and .comp columns
- Support saved formulas in augment.felm (#347 by @ShreyasSingh)
- confint_tidy now drops rows of all NA (#345 by @atyre2)
- A new tidier for caret::confusionMatrix objects (#344 by @mkuehn10)
- Tidiers for Kendall::Kendall objects (#343 by @cimentadaj)
- A new tidying method for car::durbinWatsonTest objects (#341 by @mkuehn10)
- glance throws an informative error for quantreg::rq models fit with multiple tau values (#338 by @bfgray3)
- tidy.glmnet gains the ability to retain zero-valued coefficients with a return_zeros argument that defaults to FALSE (#337 by @bfgray3)
- tidy.manova now retains a Residuals row (#334 by @jarvisc1)
- Tidiers for ordinal::clm, ordinal::clmm, survey::svyolr and MASS::polr ordinal model objects (#332 by @larmarange)
- Support for anova objects from car::Anova (#325 by @mariusbarth)
- Tidiers for tseries::garch models (#323 by @wilsonfreitas)
- Removed dependency on psych package (#313 by @nutterb)
- Improved error messages (#303 by @michaelweylandt)
- Compatibility with new rstanarm and loo packages (#298 by @jgabry)
- Support for tidying lists returned by irlba::irlba
- A truly huge increase in unit tests (#267 by @dchiu911)
- Bug fix for tidy.prcomp when missing labels (#265 by @corybrunson)
- Added a pkgdown site at https://broom.tidyverse.org/ (#260 by @jayhesselberth)
- Added tidiers for AER::ivreg models (#247 by @hughjonesd)
- Added tidiers for the lavaan package (#233 by @puterleat)
- Added conf.int argument to tidy.coxph (#220 by @larmarange)
- Added augment method for chi-squared tests (#138 by @larmarange)
- Changed default se.type for tidy.rq to match that of quantreg::summary.rq() (#404 by @ethchr)
- Added argument quick for tidy.plm and tidy.felm (#502 and #509 by @MatthieuStigler)
- Many small improvements throughout
Many many thanks to all the following for their thoughtful comments on design, bug reports and PRs! The community of broom contributors has been kind, supportive and insightful and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dgrtwo, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @ethchr, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, @wilsonfreitas, and @MatthieuStigler
- Fixed gam tidiers to work with "Gam" objects, due to an update in gam 1.15. This fixes failing CRAN tests
- Improved test coverage (thanks to #267 from Derek Chiu)
- Changed the deprecated dplyr::failwith to purrr::possibly
- augment and glance on NULLs now return an empty data frame
- Deprecated the inflate() function in favor of tidyr::crossing
- Fixed confidence intervals in the gmm tidier (thanks to #242 from David Hugh-Jones)
- Fixed a bug in bootstrap tidiers (thanks to #167 from Jeremy Biesanz)
- Fixed tidy.lm with quick = TRUE to return terms as character rather than factor (thanks to #191 from Matteo Sostero)
- Added tidiers for ivreg objects from the AER package (thanks to #245 from David Hugh-Jones)
- Added tidiers for survdiff objects from the survival package (thanks to #147 from Michał Bojanowski)
- Added tidiers for emmeans from the emmeans package (thanks to #252 from Matthew Kay)
- Added tidiers for speedlm and speedglm from the speedglm package (thanks to #248 from David Hugh-Jones)
- Added tidiers for muhaz objects from the muhaz package (thanks to #251 from Andreas Bender)
- Added tidiers for decompose and stl objects from stats (thanks to #165 from Aaron Jacobs)
- Added tidiers for lsmobj and ref.grid objects from the lsmeans package
- Added tidiers for betareg objects from the betareg package
- Added tidiers for lmRob and glmRob objects from the robust package
- Added tidiers for brms objects from the brms package (thanks to #149 from Paul Buerkner)
- Fixed tidiers for orcutt 2.0
- Changed tidy.glmnet to filter out rows where estimate == 0.
- Updates to rstanarm tidiers (thanks to #177 from Jonah Gabry)
- Fixed issue with survival package 2.40-1 (thanks to #180 from Marcus Walz)
- Added AppVeyor, codecov.io, and code of conduct
- Changed name of "NA's" column in summaryDefault output to "na"
- Fixed tidy.TukeyHSD to include term column. Also added separate.levels argument, with option to separate comparison into level1 and level2
- Fixed tidy.manova to use correct column name for test (previously, always pillai)
- Added kde_tidiers to tidy kernel density estimates
- Added orcutt_tidiers to tidy the results of cochrane.orcutt from the orcutt package
- Added tidy.dist to tidy the distance matrix output of dist from the stats package
- Added tidy and glance for lmodel2 objects from the lmodel2 package
- Added tidiers for poLCA objects from the poLCA package
- Added tidiers for sparse matrices from the Matrix package
- Added tidiers for prcomp objects
- Added tidiers for Mclust objects from the mclust package
- Added tidiers for acf objects
- Fixed to be compatible with dplyr 0.5, which is being submitted to CRAN
- Added tidiers for geeglm, nlrq, roc, boot, bgterm, kappa, binWidth, binDesign, rcorr, stanfit, rjags, gamlss, and mle2 objects.
- Added tidy methods for lists, including u, d, v lists from svd, and x, y, z lists used by image and persp
- Added quick argument to tidy.lm, tidy.nls, and tidy.biglm, to create a smaller and faster version of the output.
- Changed rowwise_df_tidiers to allow the original data to be saved as a list column, then provided as a column name to augment. This required removing data from the augment S3 signature. Also added tests-rowwise.R
- Fixed various issues in ANOVA output
- Fixed various issues in lme4 output
- Fixed issues in tests caused by dev version of ggplot2
- Added tidiers for "plm" (panel linear model) objects from the plm package.
- Added tidy.coeftest for coeftest objects from the lmtest package.
- Set up tidy.lm to work with "mlm" (multiple linear model) objects (those with multiple response columns).
- Added tidy and glance for "biglm" and "bigglm" objects from the biglm package.
- Fixed bug in tidy.coxph when one-row matrices are returned
- Added tidy.power.htest
- Added tidy and glance for summaryDefault objects
- Added tidiers for "lme" (linear mixed effects models) from the nlme package
- Added tidy and glance for multinom objects from the nnet package.
- Fixed bug in tidy.pairwise.htest, which now can handle cases where the grouping variable is numeric.
- Added tidy.aovlist method. This added stringr package to IMPORTS to trim whitespace from the beginning and end of the term and stratum columns. This also required adjusting tidy.aov so that it could handle strata that are missing p-values.
- Set up glance.lm to work with aov objects along with lm objects.
- Added tidy and glance for matrix objects, with tidy.matrix converting a matrix to a data frame with rownames included, and glance.matrix returning the same result as glance.data.frame.
- Changed DESCRIPTION Authors@R to new format
- Fixed small bug in felm where the .fitted and .resid columns were matrices rather than vectors.
- Added tidiers for rlm (robust linear model) and gam (generalized additive model) objects, including adjustments to "lm" tidiers in order to handle them. See ?rlm_tidiers and ?gam_tidiers for more.
- Removed rownames from tidy.cv.glmnet output
- The behavior of augment, particularly with regard to missing data and the na.exclude argument, has through the use of the augment_columns function been made consistent across the following models:
  - lm
  - glm
  - nls
  - merMod (lme4)
  - survreg (survival)
  - coxph (survival)
- Unit tests in tests/testthat/test-augment.R were added to ensure consistency across these models.
- tidy, augment and glance methods were added for rowwise_df objects, and are set up to apply across their rows. This allows for simple patterns such as:
regressions <- mtcars %>% group_by(cyl) %>% do(mod = lm(mpg ~ wt, .))
regressions %>% tidy(mod)
regressions %>% augment(mod)
See ?rowwise_df_tidiers for more.
- Added tidy and glance methods for Arima objects, and tidy for pairwise.htest objects.
- Fixes for CRAN: change package description to title case, removed NOTES, mostly by adding globals.R to declare global variables.
- This is the original version published on CRAN.
- Tidiers have been added for S3 objects from the following packages:
  - lme4
  - glmnet
  - survival
  - zoo
  - felm
  - MASS (ridgelm objects)
- tidy and glance methods for data.frames have also been added, and augment.data.frame produces an error (rather than returning the same data.frame).
- stderror has been changed to std.error (affects many functions) to be consistent with broom's naming conventions for columns.
- A function bootstrap has been added based on this example, to perform the common use case of bootstrapping models.
- Added "augment" S3 generic and various implementations. "augment" does something different from tidy: it adds columns to the original dataset, including predictions, residuals, or cluster assignments. This was originally described as "fortify" in ggplot2.
- Added "glance" S3 generic and various implementations. "glance" produces a one-row data frame summary, which is necessary for tidy outputs with values like R^2 or F-statistics.
- Re-wrote intro broom vignette/README to introduce all three methods.
- Wrote a new kmeans vignette.
- Added tidying methods for multcomp, sp, and map objects (from fortify-multcomp, fortify-sp, and fortify-map from ggplot2).
- Because this integrates substantial amounts of ggplot2 code (with permission), added Hadley Wickham as an author in DESCRIPTION.