NEWS.md
(To be released as 0.7.0)
broom 0.7.0
is a major release with a large number of breaking changes. Most of these breaking changes are meant to improve maintainability and internal consistency, which have posed long-standing difficulties.
This release features a number of unannounced hard-deprecations. I am sorry that I did not have the time to ease these transitions, and am actively looking for assistance maintaining broom
.
We have changed how we report degrees of freedom for lm
objects (#212, #273). This is especially important for instructors in statistics courses. Previously the df
column in glance.lm()
reported the rank of the design matrix. Now it reports degrees of freedom of the numerator for the overall F-statistic. This is equal to the rank of the model matrix minus one (unless you omit an intercept column), so the new df
should be the old df
minus one.
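The relationship between the old and new df values can be checked with base R alone. The sketch below is illustrative only: the model formula and the mtcars data are assumptions chosen for the example, not taken from these release notes.

```r
# Illustrative model (an assumption for this sketch): intercept plus two predictors.
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Rank of the design matrix: what the df column of glance.lm() used to report.
old_df <- fit$rank

# Numerator degrees of freedom of the overall F-statistic: what df reports now.
new_df <- unname(summary(fit)$fstatistic["numdf"])

old_df  # 3
new_df  # 2
```

Because this model includes an intercept, the new value is the old value minus one.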
tidy()
no longer checks for a log or logit link when exponentiate = TRUE
, and we have refactored to remove extraneous exponentiate
arguments. If you set exponentiate = TRUE
, we assume you know what you are doing and that you want exponentiated coefficients (and confidence intervals if conf.int = TRUE
) regardless of link function.
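As a rough illustration of what exponentiate = TRUE returns, the sketch below compares the tidied estimates with coefficients exponentiated by hand. The logistic regression on mtcars is an assumption made for the example.

```r
library(broom)

# Illustrative logistic regression (an assumption for this sketch).
fit <- glm(am ~ wt, data = mtcars, family = binomial)

# Exponentiated coefficients (odds ratios here, because the link is logit).
exp_tidy <- tidy(fit, exponentiate = TRUE)

# The estimate column should agree with exponentiating the raw coefficients.
matches_manual <- isTRUE(all.equal(exp_tidy$estimate, unname(exp(coef(fit)))))
matches_manual
```

Since tidy() no longer inspects the link function, the same call on a model with a different link would also return exponentiated values, so checking that this is meaningful is left to the user.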
We have simplified glance.aov()
, which now contains only the following columns: logLik
, AIC
, BIC, deviance
, df.residual
, nobs
(see #212). Note that tidy.aov()
gives more complete information about degrees of freedom in an aov
object.
We are moving away from supporting summary.*()
objects. In particular, we have removed tidy.summary.lm()
as part of a major overhaul of internals. Instead of calling tidy()
on summary
-like objects, please call tidy()
directly on model objects moving forward.
We have removed all support for the quick
argument in tidy()
methods. This simplifies internals and improves maintainability. We anticipate this will affect few users, as few people seemed to use it. If this majorly cramps your style, let us know, as we are considering a new verb to return only model parameters. In the meantime, stats::coef()
together with tibble::enframe()
provides most of the functionality of tidy(..., quick = TRUE)
.
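A minimal sketch of the stats::coef() plus tibble::enframe() combination mentioned above. The model is an assumption for illustration, and the column names term and estimate are chosen here to mirror tidy() output rather than being produced automatically.

```r
library(tibble)

# Illustrative model (an assumption for this sketch).
fit <- lm(mpg ~ wt + hp, data = mtcars)

# Named coefficient vector -> two-column tibble.
# enframe() defaults to columns "name" and "value"; they are renamed here
# to resemble what tidy(..., quick = TRUE) used to return.
quick_tidy <- enframe(stats::coef(fit), name = "term", value = "estimate")
quick_tidy
```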
All conf.int
arguments now default to FALSE
, and all conf.level
arguments now default to 0.95
. This should primarily affect tidy.survreg()
, which previously always returned confidence intervals, although there are some others.
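The effect of the new defaults can be seen by comparing column names with and without conf.int = TRUE. This is a sketch using a linear model on mtcars, which is an assumption for the example. The 0.90 level is only there to show that conf.level can be overridden.

```r
library(broom)

# Illustrative model (an assumption for this sketch).
fit <- lm(mpg ~ wt, data = mtcars)

# Default call: no confidence interval columns.
default_cols <- names(tidy(fit))

# Opting in, with a non-default confidence level.
ci_cols <- names(tidy(fit, conf.int = TRUE, conf.level = 0.90))

# Columns that only appear when conf.int = TRUE.
extra_cols <- setdiff(ci_cols, default_cols)
extra_cols
```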
Tidiers for emmeans
-objects use the arguments conf.int
and conf.level
instead of relying on the argument names native to the emmeans::summary()
-methods (i.e., infer
and level
). Similarly, multcomp
-tidiers now include a call to summary()
as previous behavior was akin to setting the now removed argument quick = TRUE
. Both families of tidiers now use the adj.p.value
column name when appropriate. Finally, emmeans
-, multcomp
-, and TukeyHSD
-tidiers now consistently use the column names contrast
and null.value
instead of comparison
, level1
and level2
, or lhs
and rhs
(see #692).
This release of broom
hard-deprecates the following functions and tidiers:
broom
bootstrap()
confint_tidy()
glance.summary.lm()
augment.glmRob()
tidy.table()
and tidy.ftable()
have been deprecated in favor of tibble::as_tibble()
tidy.summaryDefault()
and glance.summaryDefault()
have been deprecated in favor of skimr::skim()
fix_data_frame()
We regret that we were unable to provide warnings for some of these changes.
Mixed models: we have also gone forward with our planned mixed model deprecations, and have removed the following methods, which now live in broom.mixed
:
tidy.brmsfit()
tidy.merMod()
, glance.merMod()
, augment.merMod()
tidyMCMC()
, tidy.rjags()
, tidy.stanfit()
tidy.lme()
, glance.lme()
, augment.lme()
tidy.stanreg()
, glance.stanreg()
augment.factanal()
now returns a tibble with column names .fs1
, .fs2
, …, instead of factor1
, factor2
, … (#650).
We have renamed the output of augment.htest()
. In particular, we have renamed the .residuals
column to .resid
and the .stdres
to .std.resid
for consistency. These changes will only affect chi-squared tests.
tidy.ridgelm()
now always returns a GCV
column and never returns an xm
column (#532)
tidy.dist()
no longer supports the upper
argument
augment()
data argument to augment() generic (did this happen?)
We have overhauled augment() for general consistency improvements (hopefully, pending getting safepredict() going urgh)
If you pass a dataset to augment()
via the data
or newdata
arguments, you are now guaranteed that the augmented dataset will have exactly the same number of rows as the original dataset. This differs from previous behavior primarily when there are missing values. Previously augment()
would drop rows containing NA
. This should no longer be the case.
augment()
no longer accepts an na.action
argument
We no longer cram everything through augment.lm()
and it has subsequently lost a lot of arguments that were needed when it was a frankenstein do-everything function
augment()
tries to give an informative error when data
isn’t the original training data
Added new vignette detailing use of modelgenerics
and modeltests
packages
Moved core tests to the modeltests
package
Many glance()
methods now return a nobs
column, which contains the number of data points used to fit the model! (#597 by @vincentarelbundock)
We now use rlang::arg_match()
when possible instead of match.arg()
to give more informative errors on argument mismatches.
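For readers unfamiliar with rlang::arg_match(), here is a small sketch of how it behaves. The pick_method() helper and its choices are hypothetical and exist only for this illustration. They are not part of broom.

```r
library(rlang)

# Hypothetical helper, used only to illustrate argument matching.
pick_method <- function(method = c("pearson", "spearman")) {
  # arg_match() returns the first choice when the argument is left at its
  # default, and signals an error for values that are not among the choices.
  arg_match(method)
}

default_choice <- pick_method()
explicit_choice <- pick_method("spearman")

# An unsupported value triggers an error, so capture whether the call failed.
failed <- tryCatch(
  {
    pick_method("kendall")
    FALSE
  },
  error = function(e) TRUE
)
```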
Add option to lfe::felm
for robust and cluster standard errors (#772)
Added tidier for car::Anova
(#754)
Added tidy()
and glance()
methods for speedglm
objects from the speedglm
package
Added tidier for summary.manova
(#729) - TODO // remove as summary.*
is forbidden
Added tidier for epiR::epi.2by2
(#711)
Added tidiers for rma
objects from the metafor
package (#674, @malcolmbarrett, @softloud)
Added tidiers for pam
objects from the cluster
package. (#637)
Added tidy.svyglm()
and glance.svyglm()
(#611)
Added tidy.regsubsets()
for best subsets linear regression from the leaps
package
Added method tidy.lm.beta()
to tidy lm.beta
class models (#545 by @mattle24)
Added tidiers for lmrob
and glmrob
objects from the robustbase
package (#205, #505).
Added method tidy.systemfit()
to tidy systemfit
class models (by @jaspercooper)
Added tidy.summary_emm()
(#691 by @crsh)
tidy.felm()
now has a robust = TRUE/FALSE
option that supports robust and cluster standard errors (#772)
Make .fitted
values respect type.predict
argument of augment.clm()
. (#617)
Return factor rather than numeric class predictions in .fitted
of augment.polr()
. (#619)
tidy.kmeans()
now uses the names of the input variables in the output by default. Set col.names = NULL
to recover the old behavior.
Previously, F-statistics for weak instruments were returned through glance.ivreg()
. F-statistics are now returned through tidy.ivreg(instruments = TRUE)
. Default is tidy.ivreg(instruments = FALSE)
. glance.ivreg()
still returns Wu-Hausman and Sargan test statistics.
glance.biglm()
now returns a df.residual
column
tidy.prcomp()
parameter matrix
gained new options "scores"
, "loadings"
, and "eigenvalues"
(#557 by @GegznaV)
tidy_optim()
now provides the standard error if the Hessian is present. (#529 by @billdenney) (TODO: think about this)
tidy.htest()
column names are now run through make.names()
to ensure syntactic correctness (#549 by @karissawhiting) (TODO: use tidyverse name repair?)
tidy.lmodel2()
now returns a p.value
column (#570)
tidy.lsmobj()
gained a conf.int
argument for consistency with other tidiers.
tidy.zoo()
no longer changes column names that have spaces or other special characters (previously they were converted to data.frame friendly column names by make.names
)
glance.lavaan()
now uses lavaan extractor functions instead of subsetting the fit object manually. (#835)
Bug fix to return confidence intervals correctly in tidy.drc() (#798)
Bug fix to better allow tidy.boot()
to support confidence intervals (#581)
Bug fix to allow augment.kmeans()
to work with masked data (#609)
Bug fix to allow augment.Mclust()
to work on univariate data (#490)
Bug fix to allow tidy.htest()
to support equal variances (#608)
Bug fix for tidy.polr()
when passed conf.int = TRUE
(#498)
Bug fixes in glance.lavaan()
: address confidence interval error (#577) and correct reported nobs
and norig
(#835)
Bug fix for tidy.survreg()
when robust
is set to TRUE
in model fitting (#842, #728)
Bug fix in muhaz tidiers to ensure output is always a tibble
(#824)
tibble 3.0.0
release. Removed xergm
dependency. Fixes failing CRAN checks
Changes to accommodate ergm 3.10 release. tidy.ergm()
no longer has a quick
argument. The old default of quick = FALSE
is now the only option.
Tidiers now return tibble::tibble()s. This release also includes several new tidiers, new vignettes and a large number of bug fixes. We’ve also begun to more rigorously define tidier specifications: we’ve laid part of the groundwork for stricter and more consistent tidying, but the new tidier specifications are not yet complete. These will appear in the next release.
Additionally, users should note that we are in the process of migrating tidying methods for mixed models and Bayesian models to broom.mixed
. broom.mixed
is not on CRAN yet, but all mixed model and Bayesian tidiers will be deprecated once broom.mixed
is on CRAN. No further development of mixed model tidiers will take place in broom
.
Almost all tidiers should now return tibbles rather than data.frames. Deprecated tidying methods, Bayesian and mixed model tidiers still return data.frames.
Users are most likely to experience issues when using augment in situations where tibbles are stricter than data frames. For example, specifying model covariates as a matrix object will now error:

    library(broom)
    library(quantreg)

    fit <- rq(stack.loss ~ stack.x, tau = .5)
    broom::augment(fit)
    #> Error: Column `stack.x` must be a 1d atomic vector or a list
This is because the default data
argument data = model.frame(fit)
cannot be coerced to tibble
.
Another consequence of this is that augment.survreg
and augment.coxph
from the survival
package now require that the user explicitly passes data to either the data
or newdata
arguments.
These restrictions will be relaxed in an upcoming release of broom
pending support for matrix-columns in tibbles.
Developers are likely to experience issues:
subsetting tibbles with [
, which returns a tibble rather than a vector.
setting rownames on tibbles, which is deprecated.
using matrix and vector tidiers, now deprecated.
handling the additional tibble classes tbl_df
and tbl
beyond the data.frame
class
linking to defunct documentation files – broom recently moved all tidiers to a roxygen2
template based documentation system.
This version of broom
includes several new vignettes:
vignette("available-methods", package = "broom")
contains a table detailing which tidying methods are available
vignette("adding-tidiers", package = "broom")
is an in-progress guide for contributors on how to add new tidiers to broom
vignette("glossary", package = "broom")
contains tables describing acceptable argument names and column names for the in-progress new specification.
Several old vignettes have also been updated:
vignette("bootstrapping", package = "broom")
now relies on the rsample
package and a tidyr::nest
-purrr::map
-tidyr::unnest
workflow. This is now the recommended workflow for working with multiple models, as opposed to the old dplyr::rowwise
-dplyr::do
based workflow.
Matrix and vector tidiers have been deprecated in favor of tibble::as_tibble and tibble::enframe
Dataframe tidiers and rowwise dataframe tidiers have been deprecated
bootstrap()
has been deprecated in favor of the rsample package
inflate
has been removed from broom
The alpha
argument has been removed from quantreg
tidy methods
The separate.levels
argument has been removed from tidy.TukeyHSD
. To obtain the effect of separate.levels = TRUE
, users may tidyr::separate
after tidying. This is consistent with the multcomp
tidier behavior.
The fe.error
argument was removed from tidy.felm
. When fixed effects are tidied, their standard errors are now always included.
The diag
argument in tidy.dist
has been renamed diagonal
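The replacements named in the deprecations above can be sketched as follows. The example data (a small matrix, a named vector, and per-cylinder regressions on mtcars) are assumptions chosen for illustration. The last part shows the tidyr::nest, purrr::map, tidyr::unnest workflow that takes the place of the rowwise data frame tidiers, and it assumes a tidyr version that supports the nest(data = ...) syntax.

```r
library(dplyr)
library(tidyr)
library(purrr)
library(tibble)
library(broom)

# Matrix -> tibble, keeping the row names as a regular column.
m <- matrix(1:4, nrow = 2, dimnames = list(c("a", "b"), c("x", "y")))
m_tbl <- as_tibble(m, rownames = "rowname")

# Named vector -> two-column tibble with columns "name" and "value".
v_tbl <- enframe(c(first = 1, second = 2))

# One model per group: nest the data, fit and tidy each model, then unnest.
per_group <- mtcars %>%
  nest(data = -cyl) %>%
  mutate(
    fit = map(data, ~ lm(mpg ~ wt, data = .x)),
    tidied = map(fit, tidy)
  ) %>%
  unnest(tidied)

per_group
```

The result has one row per model term per group, with the grouping variable alongside the usual tidy() columns.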
glance
support for arima
objects fit with method = "CSS"
(#396 by @josue-rodriguez)
A bug fix to re-enable tidying glmnet
objects with family = multinomial
(#395 by @erleholgersen)
A bug fix to allow tidying quantreg
intercept only models (#378 by @erleholgersen)
A bug fix for aovlist
objects (#377 by @mvevans89)
Support for glmnetUtils
objects (#352 by @Hong-Revo)
A bug fix to allow tidy_emmeans
to handle column names with dashes (#351 by @bmannakee)
augment.felm
no longer returns .fe_
and .comp
columns
Support saved formulas in augment.felm
(#347 by @ShreyasSingh)
A new tidier for caret::confusionMatrix
objects (#344 by @mkuehn10)
Tidiers for Kendall::Kendall
objects (#343 by @cimentadaj)
A new tidying method for car::durbinWatsonTest
objects (#341 by @mkuehn10)
glance
throws an informative error for quantreg::rq
models fit with multiple tau
values (#338 by @bfgray3)
tidy.glmnet
gains the ability to retain zero-valued coefficients with a return_zeros
argument that defaults to FALSE
(#337 by @bfgray3)
Tidiers for ordinal::clm
, ordinal::clmm
, survey::svyolr
and MASS::polr
ordinal model objects (#332 by @larmarange)
Support for anova
objects from car::Anova
(#325 by @mariusbarth)
Tidiers for tseries::garch
models (#323 by @wilsonfreitas)
Improved error messages (#303 by @michaelweylandt)
Compatibility with new rstanarm
and loo
packages (#298 by @jgabry)
Support for tidying lists returned by irlba::irlba
Bug fix for tidy.prcomp
when missing labels (#265 by @corybrunson)
Added a pkgdown
site at https://broom.tidyverse.org/ (#260 by @jayhesselberth)
Added tidiers for AER::ivreg
models (#247 by @hughjonesd)
Added tidiers for the lavaan
package (#233 by @puterleat)
Added conf.int
argument to tidy.coxph
(#220 by @larmarange)
Added augment
method for chi-squared tests (#138 by @larmarange)
Changed the default se.type for tidy.rq
to match that of quantreg::summary.rq()
(#404 by @ethchr)
Added argument quick
for tidy.plm
and tidy.felm
(#502 and #509 by @MatthieuStigler)
Many small improvements throughout
Many many thanks to all the following for their thoughtful comments on design, bug reports and PRs! The community of broom contributors has been kind, supportive and insightful and I look forward to working with you all again!
@atyre2, @batpigandme, @bfgray3, @bmannakee, @briatte, @cawoodjm, @cimentadaj, @dan87134, @dgrtwo, @dmenne, @ekatko1, @ellessenne, @erleholgersen, @ethchr, @Hong-Revo, @huftis, @IndrajeetPatil, @jacob-long, @jarvisc1, @jenzopr, @jgabry, @jimhester, @josue-rodriguez, @karldw, @kfeilich, @larmarange, @lboller, @mariusbarth, @michaelweylandt, @mine-cetinkaya-rundel, @mkuehn10, @mvevans89, @nutterb, @ShreyasSingh, @stephlocke, @strengejacke, @topepo, @willbowditch, @WillemSleegers, @wilsonfreitas, and @MatthieuStigler
Fixed gam tidiers to work with “Gam” objects, due to an update in gam 1.15. This fixes failing CRAN tests
Improved test coverage (thanks to #267 from Derek Chiu)
Changed the deprecated dplyr::failwith
to purrr::possibly
augment
and glance
on NULLs now return an empty data frame
Deprecated the inflate()
function in favor of tidyr::crossing
Fixed confidence intervals in the gmm tidier (thanks to #242 from David Hugh-Jones)
Fixed a bug in bootstrap tidiers (thanks to #167 from Jeremy Biesanz)
Fixed tidy.lm with quick = TRUE
to return terms as character rather than factor (thanks to #191 from Matteo Sostero)
Added tidiers for ivreg
objects from the AER package (thanks to #245 from David Hugh-Jones)
Added tidiers for survdiff
objects from the survival package (thanks to #147 from Michał Bojanowski)
Added tidiers for emmeans
from the emmeans package (thanks to #252 from Matthew Kay)
Added tidiers for speedlm
and speedglm
from the speedglm package (thanks to #248 from David Hugh-Jones)
Added tidiers for muhaz
objects from the muhaz package (thanks to #251 from Andreas Bender)
Added tidiers for decompose
and stl
objects from stats (thanks to #165 from Aaron Jacobs)
Added tidiers for lsmobj
and ref.grid
objects from the lsmeans package
Added tidiers for betareg
objects from the betareg package
Added tidiers for lmRob
and glmRob
objects from the robust package
Added tidiers for brms
objects from the brms package (thanks to #149 from Paul Buerkner)
Fixed tidiers for orcutt 2.0
Changed tidy.glmnet
to filter out rows where estimate == 0.
Updates to rstanarm
tidiers (thanks to #177 from Jonah Gabry)
Fixed issue with survival package 2.40-1 (thanks to #180 from Marcus Walz)
Added AppVeyor, codecov.io, and code of conduct
Changed name of “NA’s” column in summaryDefault output to “na”
Fixed tidy.TukeyHSD
to include term
column. Also added separate.levels
argument, with option to separate comparison
into level1
and level2
Fixed tidy.manova
to use correct column name for test (previously, always pillai
)
Added kde_tidiers
to tidy kernel density estimates
Added orcutt_tidiers
to tidy the results of cochrane.orcutt
from the orcutt package
Added tidy.dist
to tidy the distance matrix output of dist
from the stats package
Added tidy
and glance
for lmodel2
objects from the lmodel2 package
Added tidiers for poLCA
objects from the poLCA package
Added tidiers for sparse matrices from the Matrix package
Added tidiers for prcomp
objects
Added tidiers for Mclust
objects from the mclust package
Added tidiers for acf
objects
Fixed to be compatible with dplyr 0.5, which is being submitted to CRAN
Added tidiers for geeglm, nlrq, roc, boot, btergm, kappa, binWidth, binDesign, rcorr, stanfit, rjags, gamlss, and mle2 objects.
Added tidy
methods for lists, including u, d, v lists from svd
, and x, y, z lists used by image
and persp
Added quick
argument to tidy.lm
, tidy.nls
, and tidy.biglm
, to create a smaller and faster version of the output.
Changed rowwise_df_tidiers
to allow the original data to be saved as a list column, then provided as a column name to augment
. This required removing data
from the augment
S3 signature. Also added tests-rowwise.R
Fixed various issues in ANOVA output
Fixed various issues in lme4 output
Fixed issues in tests caused by dev version of ggplot2
Added tidiers for “plm” (panel linear model) objects from the plm package.
Added tidy.coeftest
for coeftest objects from the lmtest package.
Set up tidy.lm
to work with “mlm” (multiple linear model) objects (those with multiple response columns).
Added tidy
and glance
for “biglm” and “bigglm” objects from the biglm package.
Fixed bug in tidy.coxph
when one-row matrices are returned
Added tidy.power.htest
Added tidy
and glance
for summaryDefault
objects
Added tidiers for “lme” (linear mixed effects models) from the nlme package
Added tidy
and glance
for multinom
objects from the nnet package.
Fixed bug in tidy.pairwise.htest
, which now can handle cases where the grouping variable is numeric.
Added tidy.aovlist
method. This added stringr
package to IMPORTS to trim whitespace from the beginning and end of the term
and stratum
columns. This also required adjusting tidy.aov
so that it could handle strata that are missing p-values.
Set up glance.lm
to work with aov
objects along with lm
objects.
Added tidy
and glance
for matrix objects, with tidy.matrix
converting a matrix to a data frame with rownames included, and glance.matrix
returning the same result as glance.data.frame
.
Changed DESCRIPTION Authors@R to new format
Fixed small bug in felm
where the .fitted
and .resid
columns were matrices rather than vectors.
Added tidiers for rlm
(robust linear model) and gam
(generalized additive model) objects, including adjustments to “lm” tidiers in order to handle them. See ?rlm_tidiers
and ?gam_tidiers
for more.
Removed rownames from tidy.cv.glmnet
output
The behavior of augment, particularly with regard to missing data and the na.exclude argument, has been made consistent across the following models through the use of the augment_columns function:
lm
glm
nls
merMod
(lme4
)
survreg
(survival
)
coxph
(survival
)
Unit tests in tests/testthat/test-augment.R
were added to ensure consistency across these models.
tidy
, augment
and glance
methods were added for rowwise_df
objects, and are set up to apply across their rows. This allows for simple patterns such as:

    regressions <- mtcars %>% group_by(cyl) %>% do(mod = lm(mpg ~ wt, .))
    regressions %>% tidy(mod)
    regressions %>% augment(mod)
See ?rowwise_df_tidiers
for more.
Added tidy
and glance
methods for Arima
objects, and tidy
for pairwise.htest
objects.
Fixes for CRAN: change package description to title case, removed NOTES, mostly by adding globals.R
to declare global variables.
This is the original version published on CRAN.
Tidiers have been added for S3 objects from the following packages:
lme4
glmnet
survival
zoo
felm
MASS
(ridgelm
objects)
tidy
and glance
methods for data.frames have also been added, and augment.data.frame
produces an error (rather than returning the same data.frame).
stderror
has been changed to std.error
(affects many functions) to be consistent with broom’s naming conventions for columns.
A function bootstrap
has been added based on this example, to perform the common use case of bootstrapping models.
Added “augment” S3 generic and various implementations. “augment” does something different from tidy: it adds columns to the original dataset, including predictions, residuals, or cluster assignments. This was originally described as “fortify” in ggplot2.
Added “glance” S3 generic and various implementations. “glance” produces a one-row data frame summary, which is necessary for tidy outputs with values like R^2 or F-statistics.
Re-wrote intro broom vignette/README to introduce all three methods.
Wrote a new kmeans vignette.
Added tidying methods for multcomp, sp, and map objects (from fortify-multcomp, fortify-sp, and fortify-map from ggplot2).
Because this integrates substantial amounts of ggplot2 code (with permission), added Hadley Wickham as an author in DESCRIPTION.