The treestats R package contains rapid, C++ based, functions to calculate summary statistics on phylogenies. For some functions (but not all, see below), the phylogenies are required to be ultrametric and/or binary.
To get started, you can either install from CRAN or use the latest version from GitHub:
install.packages("treestats") # install from CRAN
# use the devtools package to install latest version from GitHub:
install.packages("devtools")
devtools::install_github("thijsjanzen/treestats")
Given a tree (for example a simulated tree, as in the code example), you can either access individual statistics, or calculate all currently implemented statistics:
focal_tree <- ape::rphylo(n = 10, birth = 1, death = 0)
colless_stat <- treestats::colless(focal_tree)
all_stats <- treestats::calc_all_stats(focal_tree)
The following summary statistics are included:
Statistic | Information | Normalization | Assumes Ultrametric tree | Requires binary tree | Reference |
area_per_pair | Topology | Yule | NO | YES | Lima et al., 2020 |
average_leaf_depth | Topology | Yule | NO | YES | Shao & Sokal, 1990 |
avg_ladder | Topology | None | NO | YES | Kendall et al., 2018 |
avg_vert_depth | Topology | None | NO | NO | Herrada, 2011 |
b1 | Topology | Tips | NO | NO | Shao & Sokal, 1990 |
b2 | Topology | Yule | NO | NO | Shao & Sokal, 1990 |
beta | Topology | None | NO | YES | Aldous, 1996 |
blum | Topology | None | NO | YES | Blum & François, 2006 |
cherries | Topology | Yule | NO | YES | McKenzie et al., 1999 |
colless | Topology | Yule | NO | YES | Colless, 1982 |
colless_corr | Topology | None | NO | YES | Heard, 1992 |
colless_quad | Topology | None | NO | YES | Bartoszek et al., 2021 |
crown_age | Branching times | None | NO | NO | |
diameter | Topology | None | NO | YES | Chindelevitch et al., 2021 |
double_cherries | Topology | None | NO | YES | Chindelevitch et al., 2021 |
eigen_centrality | Topology | None | NO | NO | Chindelevitch et al., 2021 |
eigen_centralityW | Topology + branch lengths | None | NO | NO | Chindelevitch et al., 2021 |
ew_colless | Topology | None | NO | YES | Mooers & S. B. Heard, 1997 |
four_prong | Topology | None | NO | YES | Chindelevitch et al., 2021 |
gamma | Branching times | None | YES | NO | Pybus & Harvey, 2000 |
i_stat | Topology | None | NO | YES | Fusco & Cronk, 1995 |
il_number | Topology | Tips | NO | NO | Kendall et al., 2018 |
imbalance_steps | Topology | Tips | NO | NO | Janzen & Etienne, 2024 |
j_one | Topology | None | NO | YES | Lemant et al., 2022 |
j_stat | Topology + branch lengths | None | NO | NO | Izsák & Papp, 2000 |
laplace_spectrum_a | Topology + branch lengths | None | YES | NO | Lewitus & Morlon, 2016 |
laplace_spectrum_e | Topology + branch lengths | None | YES | NO | Lewitus & Morlon, 2016 |
laplace_spectrum_g | Topology + branch lengths | None | YES | NO | Lewitus & Morlon, 2016 |
laplace_spectrum_p | Topology + branch lengths | None | YES | NO | Lewitus & Morlon, 2016 |
max_adj | Topology + branch lengths | None | NO | YES | Chindelevitch et al., 2021 |
max_betweenness | Topology | Tips | NO | YES | Chindelevitch et al., 2021 |
max_closeness | Topology | Tips | NO | YES | Chindelevitch et al., 2021 |
max_closenessW | Topology + branch lengths | None | NO | YES | Chindelevitch et al., 2021 |
max_del_width | Topology | Tips | NO | NO | Colijn & Gardy, 2014 |
max_depth | Topology | Tips | NO | NO | Colijn & Gardy, 2014 |
max_ladder | Topology | None | NO | YES | Kendall et al., 2018 |
max_laplace | Topology + branch lengths | None | NO | YES | Chindelevitch et al., 2021 |
max_width | Topology | Tips | NO | NO | Colijn & Gardy, 2014 |
mean_branch_length | Topology + branch lengths | None | NO | NO | Janzen & Etienne, 2017 |
mean_branch_length_ext | Topology + branch lengths | None | NO | NO | Saulnier et al., 2017 |
mean_branch_length_int | Topology + branch lengths | None | NO | NO | Saulnier et al., 2017 |
min_adj | Topology + branch lengths | None | NO | YES | Chindelevitch et al., 2021 |
min_laplace | Topology + branch lengths | None | NO | YES | Chindelevitch et al., 2021 |
mntd | Topology + branch lengths | None | NO | NO | Webb et al., 2002 |
mpd | Topology + branch lengths | Tips | NO | NO | Webb et al., 2002 |
mw_over_md | Topology | None | NO | NO | Colijn & Gardy, 2014 |
nltt_base | Branching times | None | YES | NO | Janzen et al., 2015 |
number_of_lineages | Topology + branch lengths | None | NO | NO | |
phylogenetic_div | Topology + branch lengths | None | NO | NO | Faith, 1992 |
pigot_rho | Branching times | None | YES | NO | Pigot et al., 2010 |
pitchforks | Topology | Tips | NO | NO | Kendall et al., 2018 |
psv | Topology + branch lengths | Tips | NO | NO | Helmus et al., 2007 |
rogers | Topology | Tips | NO | YES | Rogers, 1996 |
root_imbalance | Topology | None | NO | YES | Guyer et al., 1993 |
rquartet | Topology | Yule | NO | NO | Coronado et al., 2019 |
sackin | Topology | Yule | NO | YES | Sackin, 1972 |
stairs | Topology | None | NO | YES | Norström et al., 2012 |
stairs2 | Topology | None | NO | YES | Norström et al., 2012 |
symmetry_nodes | Topology | Tips | NO | YES | Kersting & Fischer, 2021 |
tot_coph | Topology | Yule | NO | YES | Mir et al., 2013 |
tot_internal_path | Topology | None | NO | NO | Knuth, 1997 |
tot_path | Topology | None | NO | YES | Colijn & Gardy, 2014 |
tree_height | Branching times | None | NO | NO | |
treeness | Topology + branch lengths | None | NO | NO | Astolfi & Zonta-Sgaramella, 1984 |
var_branch_length | Topology + branch lengths | None | NO | NO | Saulnier et al., 2017 |
var_branch_length_ext | Topology + branch lengths | None | NO | NO | Saulnier et al., 2017 |
var_branch_length_int | Topology + branch lengths | None | NO | NO | Saulnier et al., 2017 |
var_depth | Topology | Yule | NO | NO | Coronado et al., 2020 |
vpd | Topology + branch lengths | None | NO | NO | Webb et al., 2002 |
wiener | Topology + branch lengths | None | NO | YES | Chindelevitch et al., 2021 |
For all of these statistics, the package provides Rcpp versions that are much, much faster than their R sister functions. Furthermore, some additional functions have been improved as well:
- ape::branching.times
- DDD::phylo2L
- DDD::L2phylo
For the Rcpp improved summary statistics (excluding statistics that rely on the calculation of eigen values, as these rely on the Rcpp independent Eigen code), R independent C++ code is provided in the inst/include folder. These can be independently linked by adding the treestats package in the DESCRIPTION in both the LinkingTo and Depends fields. Then, in your package, you can also calculate these functions.
Please note that for all functions, there are two versions available: 1) based on input of a phylo object, which is typically one 2-column matrix containing all edges, and a vector containing the edge lengths (depending on which information is required to calculate the statistic). 2) based on input of an Ltable (Lineage table), which is a 4-column matrix containing information on each species, being 1) birth time, 2) parent species, 3) species label and 4) death time (or -1 if extant).
Ltable input can be useful when summary statistics are required for more complicated simulation models.