diff --git a/.Rbuildignore b/.Rbuildignore index 0cb62b7..0b4c830 100644 --- a/.Rbuildignore +++ b/.Rbuildignore @@ -8,3 +8,4 @@ ^\.Rproj\.user$ ^\.github ^\.github$ +^enmSdmX_workspace\.code-workspace$ diff --git a/DESCRIPTION b/DESCRIPTION index b89142c..5f58952 100644 --- a/DESCRIPTION +++ b/DESCRIPTION @@ -1,8 +1,8 @@ Package: enmSdmX Type: Package Title: Species Distribution Modeling and Ecological Niche Modeling -Version: 1.1.6 -Date: 2024-06-13 +Version: 1.1.8 +Date: 2024-10-02 Authors@R: c( person( @@ -57,7 +57,7 @@ Imports: LazyData: true LazyLoad: yes URL: https://github.com/adamlilith/enmSdmX -BugReports: https://github.com/adamlilith/enmSdmX +BugReports: https://github.com/adamlilith/enmSdmX/issues Encoding: UTF-8 License: MIT + file LICENSE -RoxygenNote: 7.3.1 +RoxygenNote: 7.3.2 diff --git a/NEWS.md b/NEWS.md index 1e9134b..ea869c4 100644 --- a/NEWS.md +++ b/NEWS.md @@ -1,13 +1,19 @@ +# enmSdmX 1.1.8 2024-10-02 +- `modelSize()` now supports `ranger` random forests. + +# enmSdmX 1.1.7 2024-08-02 +- Clarified misleading help in several functions, including all the `trainXYZ()` functions (thank you, PT!). + # enmSdmX 1.1.6 2024-06-06 - Replaced dependency on **MuMIn** with one one **AICcmodavg** for calculation of AICc. Received warning that **MuMIn** was going to be archived on CRAN. # enmSdmX 1.1.5 2024-05-16 - Added function `trainESM()` for ensembles of small models. - Added several UTM coordinate reference systems accessible through `getCRS()`. -- Fixed bug in `precidtEnmSdm()` for predicting kernel density models from the **ks** package. +- Fixed bug in `predictEnmSdm()` for predicting kernel density models from the **ks** package. # enmSdmX 1.1.3 2023-03-06 -- `trainGLM()`, `trainNS()`, and `predictEnmSdm()` now have options to automatically center and scale predictors. +- `trainGLM()`, `trainNS()`, and `predictEnmSdm()` now have options to automatically center and scale predictors.
If a GLM or NS model created using `trainGLM()` or `trainSN()` is provided to `predictEnmSdm()`, it will automatically scale the predictors properly. # enmSdmX 1.1.3 2023-02-02 - Removed dependency on `dismo`, replaced where possible by `predicts`; copied `gbm.step()` and `predict()` method for MaxEnt to `enmSdmX` as a momentary fix; would love a professional solution! diff --git a/R/bioticVelocity.r b/R/bioticVelocity.r index 50ade8b..09246dd 100644 --- a/R/bioticVelocity.r +++ b/R/bioticVelocity.r @@ -939,7 +939,7 @@ bioticVelocity <- function( #' @param x2weightedLats Matrix of latitudes weighted (i.e., by population size, given by \code{x2}). #' @param x1weightedElev Matrix of elevations weighted by x1 or \code{NULL}. #' @param x2weightedElev Matrix of elevations weighted by x2 or \code{NULL}. -#' @return a list object with distance moved and abundance of all cells north/south/east/west of reference point. +#' @returns A list object with distance moved and abundance of all cells north/south/east/west of reference point. #' @keywords internal .cardinalDistance <- function( direction, diff --git a/R/compareResponse.r b/R/compareResponse.r index 60d6b9a..498ddbd 100644 --- a/R/compareResponse.r +++ b/R/compareResponse.r @@ -2,15 +2,15 @@ #' #' This function calculates a suite of metrics reflecting of niche overlap for two response curves. Response curves are predicted responses of a uni- or multivariate model along a single variable. Depending on the user-specified settings the function calculates these values either at each pair of values of \code{pred1} and \code{pred2} \emph{or} along a smoothed version of \code{pred1} and \code{pred2}. #' -#' @param pred1 Numeric list. Predictions from first model along \code{data} (one value per row in \code{data}). -#' @param pred2 Numeric list. Predictions from second model along \code{data} (one value per row in \code{data}). +#' @param pred1 Numeric vector. 
Predictions from first model along \code{data} (one value per row in \code{data}). +#' @param pred2 Numeric vector. Predictions from second model along \code{data} (one value per row in \code{data}). #' @param data Data frame or matrix corresponding to \code{pred1} and \code{pred2}. -#' @param predictor Character list. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in \code{data}. +#' @param predictor Character vector. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in \code{data}. #' @param adjust Logical. If \code{TRUE} then subtract the mean of \code{pred1} from \code{pred1} and the mean of \code{pred2} from \code{pred2} before analysis. Useful for comparing the shapes of curves while controlling for different elevations (intercepts). #' @param gap Numeric >0. Proportion of range of predictor variable across which to assume a gap exists. Calculation of \code{areaAbsDiff} will ignore gaps wide than this. To ensure the entire range of the data is included set this equal to \code{Inf} (default). #' @param smooth Logical. If \code{TRUE} then the responses are first smoothed using loess() then compared at \code{smoothN} values along each predictor. If \code{FALSE}, then comparisons are conducted at the raw values \code{pred1} and \code{pred2}. #' @param smoothN \code{NULL} or positive integer. Number of values along "pred" at which to calculate comparisons. Only used if \code{smooth} is \code{TRUE}. If \code{NULL}, then comparisons are calculated at each value in data. If a number, then comparisons are calculated at \code{smoothN} values of \code{data[ , pred]} that cover the range of \code{data[ , pred]}. -#' @param smoothRange 2-element numeric list or \code{NULL}. If \code{smooth} is \code{TRUE}, then force loess predictions < \code{smoothRange[1]} to equal \code{smoothRange[1]} and predictions > \code{smoothRange[2]} to equal \code{smoothRange[2]}. Ignored if \code{NULL}. 
+#' @param smoothRange 2-element numeric vector or \code{NULL}. If \code{smooth} is \code{TRUE}, then force loess predictions < \code{smoothRange[1]} to equal \code{smoothRange[1]} and predictions > \code{smoothRange[2]} to equal \code{smoothRange[2]}. Ignored if \code{NULL}. #' @param graph Logical. If \code{TRUE} then plot predictions. #' @param ... Arguments to pass to functions like \code{sum()} (for example, \code{na.rm=TRUE}) and to \code{overlap()} (for example, \code{w} for weights). Note that if \code{smooth = TRUE}, then passing an argument called \code{w} will likely cause a warning and make results circumspect \emph{unless} weights are pre-calculated for each of the \code{smoothN} points along a particular predictor. #' @return Either a data frame (if \code{smooth = FALSE} or a list object with the smooth model plus a data frame (if \code{smooth = TRUE}) . The data frame represents metrics comparing response curves of \code{pred1} and \code{pred2}: diff --git a/R/elimCellDuplicates.r b/R/elimCellDuplicates.r index dc9939f..edff8f7 100644 --- a/R/elimCellDuplicates.r +++ b/R/elimCellDuplicates.r @@ -4,8 +4,8 @@ #' #' @param x Points. This can be either a \code{data.frame}, \code{matrix}, \code{SpatVector}, or \code{sf} object. #' @param rast \code{SpatRaster} object. -#' @param longLat Two-element character list \emph{or} two-element integer list. If \code{x} is a \code{data.frame}, then this should be a character list specifying the names of the fields in \code{x} \emph{or} a two-element list of integers that correspond to longitude and latitude (in that order). For example, \code{c('long', 'lat')} or \code{c(1, 2)}. If \code{x} is a \code{matrix}, then this is a two-element list indicating the column numbers in \code{x} that represent longitude and latitude. For example, \code{c(1, 2)}. If \code{x} is an \code{sf} object then this is ignored. 
-#' @param priority Either \code{NULL}, in which case for every cell with more than one point the first point in \code{x} is chosen, or a numeric or character list indicating preference for some points over others when points occur in the same cell. There should be the same number of elements in \code{priority} as there are points in \code{x}. Priority is assigned by the natural sort order of \code{priority}. For example, for 3 points in a cell for which \code{priority} is \code{c(2, 1, 3)}, the script will retain the second point and discard the rest. Similarly, if \code{priority} is \code{c('z', 'y', 'x')} then the third point will be chosen. Priorities assigned to points in other cells are ignored when thinning points in a particular cell. +#' @param longLat Two-element character vector \emph{or} two-element integer vector. If \code{x} is a \code{data.frame}, then this should be a character vector specifying the names of the fields in \code{x} \emph{or} a two-element vector of integers that correspond to longitude and latitude (in that order). For example, \code{c('long', 'lat')} or \code{c(1, 2)}. If \code{x} is a \code{matrix}, then this is a two-element vector indicating the column numbers in \code{x} that represent longitude and latitude. For example, \code{c(1, 2)}. If \code{x} is an \code{sf} object then this is ignored. +#' @param priority Either \code{NULL}, in which case for every cell with more than one point the first point in \code{x} is chosen, or a numeric or character vector indicating preference for some points over others when points occur in the same cell. There should be the same number of elements in \code{priority} as there are points in \code{x}. Priority is assigned by the natural sort order of \code{priority}. For example, for 3 points in a cell for which \code{priority} is \code{c(2, 1, 3)}, the script will retain the second point and discard the rest. 
Similarly, if \code{priority} is \code{c('z', 'y', 'x')} then the third point will be chosen. Priorities assigned to points in other cells are ignored when thinning points in a particular cell. #' @return Object of class \code{x}. #' @examples #' diff --git a/R/modelSize.r b/R/modelSize.r index 1fe4220..8e90b64 100644 --- a/R/modelSize.r +++ b/R/modelSize.r @@ -58,6 +58,11 @@ modelSize <- function( NA warning('Cannot determine number of presences and background sites for a MaxNet model.') + # random forest in ranger package + } else if (inherits(x, 'ranger')) { + + as.numeric(x$predictions) + # random forest in party package } else if (inherits(x, 'randomForest')) { @@ -83,7 +88,11 @@ modelSize <- function( stop('Cannot extract sample size from model object.') } } else if (binary) { - out <- c(sum(samples == 1), sum(samples == 0)) + out <- if (inherits(x, 'ranger')) { + c(sum(samples == 2), sum(samples == 1)) + } else { + c(sum(samples == 1), sum(samples == 0)) + } names(out) <- c('num1s', 'num0s') if (out[1L] == 0 & out[2L] == 0) warning('Model does not seem to be using a binary response.', .immediate=TRUE) } else { diff --git a/R/nicheOverlapMetrics.r b/R/nicheOverlapMetrics.r index d5e930e..8775d48 100644 --- a/R/nicheOverlapMetrics.r +++ b/R/nicheOverlapMetrics.r @@ -15,7 +15,7 @@ #' \item \code{cor}: Pearson correlation between \code{x1} and \code{x2} (will apply \code{logitAdj()} first unless logit=FALSE). #' \item \code{rankCor}: Spearman rank correlation. #' } -#' @param w Numeric list. Weights of predictions in \code{x1} and \code{x2}. +#' @param w Numeric vector. Weights of predictions in \code{x1} and \code{x2}. #' @param na.rm Logical. If T\code{TRUE} then remove elements in \code{x1} and \code{2} that are \code{NA} in \emph{either} \code{x1} or \code{x2}. #' @param ... Other arguments (not used).
#' diff --git a/R/predictEnmSdm.r b/R/predictEnmSdm.r index 2d1ea17..e2c69f3 100644 --- a/R/predictEnmSdm.r +++ b/R/predictEnmSdm.r @@ -8,7 +8,7 @@ #' #' @param maxentFun This argument is only used if the \code{model} object is a MaxEnt model; otherwise, it is ignored. It takes a value of either \code{'terra'}, in which case a MaxEnt model is predicted using the default \code{predict} function from the \pkg{terra} package, or \code{'enmSdmX'} in which case the function \code{\link[enmSdmX]{predictMaxEnt}} function from the \pkg{enmSdmX} package (this package) is used. #' -#' @param scale Logical. If the model is a GLM trained with \code{\link{trainGLM}}, you can use the \code{scale} argument in that function to center and scale the predictors. In the \code{predictEnmSdm} function, you can set \code{scale} to \code{TRUE} to scale the rasters or data frame to which you are training using the centers (means) and scales (standard deviations) used in the mode. Otherwise, it is up to you to ensure variables are properly centered and scaled. This argument only has effect if the model is a GLM trained using \code{\link{trainGLM}}. +#' @param scale Logical. If the model is a GLM trained with \code{\link{trainGLM}} or \code{\link{trainNS}}, you can use the \code{scale} argument in that function to center and scale the predictors. In the \code{predictEnmSdm} function, you can set \code{scale} to \code{TRUE} to scale the rasters or data frame to which you are predicting using the centers (means) and scales (standard deviations) used in the model. Otherwise, it is up to you to ensure variables are properly centered and scaled. This argument only has an effect if the model is a GLM trained using \code{\link{trainGLM}} or \code{\link{trainNS}}. #' #' @param cores Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. This is forced to 1 if \code{newdata} is a \code{SpatRaster} (i.e., as of now, there is no parallelization when predicting to a raster...
sorry!). If you have issues when \code{cores} > 1, please see the \code{\link{troubleshooting_parallel_operations}} guide. #' diff --git a/R/predictMaxEnt.r b/R/predictMaxEnt.r index 57c7e73..04f13d3 100644 --- a/R/predictMaxEnt.r +++ b/R/predictMaxEnt.r @@ -10,10 +10,10 @@ #' \item \code{'cloglog'} Complementary log-log output (as per version 3.4.0+ of maxent--called "\code{maxnet()}" in the package of the same name) #' } #' @param perm Character vector. Name(s) of variable to permute before calculating predictions. This permutes the variables for \emph{all} features in which they occur. If a variable is named here, it overrides permutation settings for each feature featType. Note that for product features the variable is permuted before the product is taken. This permutation is performed before any subsequent permutations (i.e., so if both variables in a product feature are included in \code{perms}, then this is equivalent to using the \code{'before'} rule for \code{permProdRule}). Ignored if \code{NULL}. -#' @param permLinear Character list. Names(s) of variables to permute in linear features before calculating predictions. Ignored if \code{NULL}. +#' @param permLinear Character vector. Name(s) of variables to permute in linear features before calculating predictions. Ignored if \code{NULL}. #' @param permQuad Names(s) of variables to permute in quadratic features before calculating predictions. Ignored if \code{NULL}. #' @param permHinge Character vector. Names(s) of variables to permute in forward/reverse hinge features before calculating predictions. Ignored if \code{NULL}. -#' @param permThresh Character list. Names(s) of variables to permute in threshold features before calculating predictions. Ignored if \code{NULL}. +#' @param permThresh Character vector. Name(s) of variables to permute in threshold features before calculating predictions. Ignored if \code{NULL}. #' @param permProd Character list.
A list object of \code{n} elements, each of which has two character elements naming the variables to permute if they occur in a product feature. Depending on the value of \code{permProdRule}, the function will either permute the individual variables then calculate their product or calculate their product, then permute the product across observations. Any other features containing the variables will produce values as normal. Example: \code{permProd=list(c('precipWinter', 'tempWinter'), c('tempSummer', 'precipFall'))}. The order of the variables in each element of \code{permProd} doesn't matter, so \code{permProd=list(c('temp', 'precip'))} is the same as \code{permProd=list(c('precip', 'temp'))}. Ignored if \code{NULL}. #' @param permProdRule Character. Rule for how permutation of product features is applied: \code{'before'} ==> Permute individual variable values then calculate product; \code{'after'} ==> calculate product then permute across these values. Ignored if \code{permProd} is \code{NULL}. #' @param ... Extra arguments (not used). diff --git a/R/private_calcWeights.r b/R/private_calcWeights.r index 3d00718..73163f7 100644 --- a/R/private_calcWeights.r +++ b/R/private_calcWeights.r @@ -2,7 +2,7 @@ #' #' Calculates weighting for a model. Each record receives a numeric weight. #' -#' @param w Either logical in which case \code{TRUE} (default) causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) \emph{or} a numeric list of weights, one per row in \code{data} \emph{or} the name of the column in \code{data} that contains site weights. If \code{FALSE}, then each datum gets a weight of 1. +#' @param w Either logical in which case \code{TRUE} (default) causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) \emph{or} a numeric vector of weights, one per row in \code{data} \emph{or} the name of the column in \code{data} that contains site weights. 
If \code{FALSE}, then each datum gets a weight of 1. #' @param data Data frame #' @param resp Name of response column #' @param family Name of family diff --git a/R/trainBrt.r b/R/trainBrt.r index 8e049e5..3a6329b 100644 --- a/R/trainBrt.r +++ b/R/trainBrt.r @@ -4,7 +4,7 @@ #' #' @param data Data frame. #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. #' @param family Character. Name of error family. #' @param learningRate Numeric. Learning rate at which model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1). #' @param treeComplexity Positive integer. Tree complexity: depth of branches in a single tree (1 to 16). @@ -12,7 +12,7 @@ #' @param minTrees Positive integer. Minimum number of trees to be scored as a "usable" model (Elith et al. 2008 recommend at least 1000). Default is 1000. #' @param maxTrees Positive integer. Maximum number of trees in model set. #' @param tries Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function will stop training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step. So if non-convergence occurs then the function automatically increases the number of trees in the step size until \code{tries} is reached. -#' @param tryBy Character list. 
A list that contains one or more of \code{'learningRate'}, \code{'treeComplexity'}, \code{numTrees}, and/or \code{'stepSize'}. If a given combination of \code{learningRate}, \code{treeComplexity}, \code{numTrees}, \code{stepSize}, and \code{bagFraction} do not allow model convergence then then the function tries again but with alterations to any of the arguments named in \code{tryBy}: +#' @param tryBy Character vector. One or more of \code{'learningRate'}, \code{'treeComplexity'}, \code{'maxTrees'}, and/or \code{'stepSize'}. If a given combination of \code{learningRate}, \code{treeComplexity}, \code{maxTrees}, \code{stepSize}, and \code{bagFraction} does not allow the model to converge, then the function tries again but with alterations to any of the arguments named in \code{tryBy}: #' * \code{learningRate}: Decrease the learning rate by a factor of 10. #' * \code{treeComplexity}: Randomly increase/decrease tree complexity by 1 (minimum of 1). #' * \code{maxTrees}: Increase number of trees by 20%. diff --git a/R/trainByCrossValid.r b/R/trainByCrossValid.r index 37dac0f..09d3a36 100644 --- a/R/trainByCrossValid.r +++ b/R/trainByCrossValid.r @@ -22,7 +22,7 @@ #' \itemize{ #' \item \code{meta}: Meta-data on the model call. #' \item \code{folds}: The \code{folds} object. -#' \item \code{models} (if \code{outputModels} is \code{TRUE}): A list of model objects, one per data fold. +#' \item \code{models} (if \code{outputModels} is \code{TRUE}): A list of model objects, one per data fold. #' \item \code{tuning}: One data frame per k-fold, each containing evaluation statistics for all candidate models in the fold. In addition to algorithm-specific fields, these consist of: #' \itemize{ #' \item \code{'logLoss'}: Log loss. Higher (less negative) values imply better fit. diff --git a/R/trainGam.r b/R/trainGam.r index d183f99..fdba423 100644 --- a/R/trainGam.r +++ b/R/trainGam.r @@ -4,7 +4,7 @@ #' #' @param data Data frame. #' @param resp Response variable.
This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. #' @param family Name of family for data error structure (see \code{?family}). #' @param gamma Initial penalty to degrees of freedom to use (larger ==> smoother fits). #' @param scale A numeric value indicating the "scale" parameter (see argument \code{scale} in \code{\link[mgcv]{gam}}). The default is 0 (which allows a single smoother for Poisson and binomial error families and unknown scale for all others.) diff --git a/R/trainGlm.r b/R/trainGlm.r index 98a5f6b..2875601 100644 --- a/R/trainGlm.r +++ b/R/trainGlm.r @@ -6,7 +6,7 @@ #' #' @param data Data frame. #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. #' @param scale Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by dividing by subtracting their means then dividing by their standard deviations. 
The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do its operations. #' @param family Name of family for data error structure (see \code{\link[stats]{family}}). Default is to use the 'binomial' family. #' @param construct Logical. If \code{TRUE} (default) then construct model from individual terms entered in order from lowest to highest AICc up to limits set by \code{presPerTermInitial} or \code{maxTerms} is met. If \code{FALSE} then the "full" model consists of all terms allowed by \code{quadratic} and \code{interaction}. diff --git a/R/trainMaxEnt.r b/R/trainMaxEnt.r index c8aef4d..fcd6770 100644 --- a/R/trainMaxEnt.r +++ b/R/trainMaxEnt.r @@ -1,12 +1,16 @@ -#' Calibrate a MaxEnt (ver 3.3.3+ or "maxent") model using AICc +#' Calibrate a MaxEnt model using AICc #' -#' This function calculates the "best" Maxent model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the \code{maxent} jar file (see \emph{Details}). 
Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. +#' This function calculates the "best" MaxEnt model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). +#' +#' The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the \code{maxent} jar file (see \emph{Details}). Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. +#' +#' Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data. #' #' @param data Data frame. #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. #' @param regMult Numeric vector. Values of the master regularization parameters (called \code{beta} in some publications) to test. 
-#' @param classes Character list. Names of feature classes to use (either \code{default} to use \code{lpqh}) or any combination of \code{lpqht}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features. +#' @param classes Character vector. Names of feature classes to use (either \code{default} to use \code{lpqh}) or any combination of \code{lpqht}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features. Example: \code{c('l', 'p', 'q')}. #' @param testClasses Logical. If \code{TRUE} (default) then test all possible combinations of classes (note that all tested models will at least have linear features). If \code{FALSE} then use the classes provided (these will not vary between models). #' @param dropOverparam Logical, if \code{TRUE} (default), drop models if they have more coefficients than training occurrences. It is possible for no models to fulfill this criterion, in which case no models will be returned. #' @param anyway Logical. Same as \code{dropOverparam} (included for backwards compatibility. If \code{NULL} (default), then the value of \code{dropOverparam} will take precedence. If \code{TRUE} or \code{FALSE} then \code{anyway} will override the value of \code{dropOverparam}. @@ -18,7 +22,7 @@ #' } #' @param forceLinear Logical. If \code{TRUE} (default) then require any tested models to include at least linear features. #' @param jackknife Logical. If \code{TRUE} (default) the the returned model will be also include jackknife testing of variable importance. -#' @param arguments \code{NULL} (default) or a character list. Options to pass to \code{maxent()}'s \code{args} argument. (Do not include \code{l}, \code{p}, \code{q}, \code{h}, \code{t}, \code{betamultiplier}, or \code{jackknife}!) 
+#' @param arguments \code{NULL} (default) or a character vector. Options to pass to \code{maxent()}'s \code{args} argument. (Do not include \code{l}, \code{p}, \code{q}, \code{h}, \code{t}, \code{betamultiplier}, or \code{jackknife}!) #' @param scratchDir Character. Directory to which to write temporary files. Leave as NULL to create a temporary folder in the current working directory. #' @param cores Number of cores to use. Default is 1. If you have issues when \code{cores} > 1, please see the \code{\link{troubleshooting_parallel_operations}} guide. #' @param verbose Logical. If \code{TRUE} report progress and AICc table. diff --git a/R/trainMaxNet.r b/R/trainMaxNet.r index 13a54f5..6ca2ea9 100644 --- a/R/trainMaxNet.r +++ b/R/trainMaxNet.r @@ -1,12 +1,16 @@ -#' Calibrate a MaxNet (MaxEnt) model using AICc +#' Calibrate a MaxNet model using AICc #' -#' This function calculates the "best" MaxNet model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The "best" model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. +#' This function calculates the "best" MaxNet model using AICc across all possible combinations of a set of master regularization parameters and feature classes. 
The "best" model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). +#' +#' The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. +#' +#' Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data. #' #' @param data Data frame or matrix. Contains a column indicating whether each row is a presence (1) or background (0) site, plus columns for environmental predictors. #' @param resp Character or integer. Name or column index of response variable. Default is to use the first column in \code{data}. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in \code{data}. #' @param regMult Numeric vector. Values of the master regularization parameters (called \code{beta} in some publications) to test. -#' @param classes Character list. Names of feature classes to use (either \code{default} to use \code{lpqh}) or any combination of \code{lpqht}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features. +#' @param classes Character vector. 
Names of feature classes to use (either \code{default} to use \code{'lpqh'}) or any combination of \code{'lpqht'}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features. Example: \code{c('l', 'p', 'q')}. #' @param testClasses Logical. If \code{TRUE} (default) then test all possible combinations of classes (note that all tested models will at least have linear features). If \code{FALSE} then use the classes provided (these will not vary between models). #' @param dropOverparam Logical, if \code{TRUE} (default), drop models if they have more coefficients than training occurrences. It is possible for no models to fulfill this criterion, in which case no models will be returned. #' @param out Character vector. One or more values: diff --git a/R/trainNs.r b/R/trainNs.r index 50df2b9..f2d503c 100644 --- a/R/trainNs.r +++ b/R/trainNs.r @@ -4,7 +4,7 @@ #' #' @param data Data frame. #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. #' @param scale Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by dividing by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". 
For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$means} and \code{model$scales$sds}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. If not, then a warning will be printed, but the function will continue to do it's operations. #' @param method Character, name of function used to solve. This can be \code{'glm.fit'} (default), \code{'brglmFit'} (from the \pkg{brglm2} package), or another function. #' @param df A vector of integers > 0 or \code{NULL}. Sets flexibility of model fit. See documentation for \code{\link[splines]{ns}}. diff --git a/R/trainRf.r b/R/trainRf.r index c695ff0..9f2788e 100644 --- a/R/trainRf.r +++ b/R/trainRf.r @@ -6,7 +6,7 @@ #' #' @param data Data frame. #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response. -#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. +#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}. #' @param binary Logical. If \code{TRUE} (default) then the response is converted to a binary factor with levels 0 and 1. Otherwise, this argument has no effect and the response will be assumed to be a real number. #' @param numTrees Vector of number of trees to grow. All possible combinations of \code{mtry} and \code{numTrees} will be assessed. #' @param mtryIncrement Positive integer (default is 2). Number of predictors to add to \code{mtry} until all predictors are in each tree. 
diff --git a/README.md b/README.md index de453a3..f7679b4 100644 --- a/README.md +++ b/README.md @@ -1,8 +1,8 @@ # enmSdmX -[![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) -[![cran version](https://www.r-pkg.org/badges/version/enmSdmX)](https://cran.r-project.org/package=enmSdmX) +![R](https://img.shields.io/badge/r-%23276DC3.svg?style=for-the-badge&logo=r&logoColor=white) [![Project Status: Active – The project has reached a stable, usable state and is being actively developed.](https://www.repostatus.org/badges/latest/active.svg)](https://www.repostatus.org/#active) [![cran version](https://www.r-pkg.org/badges/version/enmSdmX)](https://cran.r-project.org/package=enmSdmX) [![R-CMD-check](https://github.com/adamlilith/enmSdmX/actions/workflows/R-CMD-check.yaml/badge.svg)](https://github.com/adamlilith/enmSdmX/actions/workflows/R-CMD-check.yaml) + Tools for modeling niches and distributions of species diff --git a/enmSdmX_workspace.code-workspace b/enmSdmX_workspace.code-workspace new file mode 100644 index 0000000..f151917 --- /dev/null +++ b/enmSdmX_workspace.code-workspace @@ -0,0 +1,29 @@ +{ + "folders": [ + { + "path": "."
+ } + ], + "settings": { + "workbench.colorCustomizations": { + "activityBar.activeBackground": "#1f6fd0", + "activityBar.background": "#1f6fd0", + "activityBar.foreground": "#e7e7e7", + "activityBar.inactiveForeground": "#e7e7e799", + "activityBarBadge.background": "#ee90bb", + "activityBarBadge.foreground": "#15202b", + "commandCenter.border": "#e7e7e799", + "sash.hoverBorder": "#1f6fd0", + "statusBar.background": "#1857a4", + "statusBar.foreground": "#e7e7e7", + "statusBarItem.hoverBackground": "#1f6fd0", + "statusBarItem.remoteBackground": "#1857a4", + "statusBarItem.remoteForeground": "#e7e7e7", + "titleBar.activeBackground": "#1857a4", + "titleBar.activeForeground": "#e7e7e7", + "titleBar.inactiveBackground": "#1857a499", + "titleBar.inactiveForeground": "#e7e7e799" + }, + "peacock.color": "#1857a4" + } +} \ No newline at end of file diff --git a/man/compareResponse.Rd b/man/compareResponse.Rd index 2e92f41..1bb52d7 100644 --- a/man/compareResponse.Rd +++ b/man/compareResponse.Rd @@ -19,13 +19,13 @@ compareResponse( ) } \arguments{ -\item{pred1}{Numeric list. Predictions from first model along \code{data} (one value per row in \code{data}).} +\item{pred1}{Numeric vector. Predictions from first model along \code{data} (one value per row in \code{data}).} -\item{pred2}{Numeric list. Predictions from second model along \code{data} (one value per row in \code{data}).} +\item{pred2}{Numeric vector. Predictions from second model along \code{data} (one value per row in \code{data}).} \item{data}{Data frame or matrix corresponding to \code{pred1} and \code{pred2}.} -\item{predictor}{Character list. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in \code{data}.} +\item{predictor}{Character vector. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in \code{data}.} \item{adjust}{Logical. 
If \code{TRUE} then subtract the mean of \code{pred1} from \code{pred1} and the mean of \code{pred2} from \code{pred2} before analysis. Useful for comparing the shapes of curves while controlling for different elevations (intercepts).} @@ -35,7 +35,7 @@ compareResponse( \item{smoothN}{\code{NULL} or positive integer. Number of values along "pred" at which to calculate comparisons. Only used if \code{smooth} is \code{TRUE}. If \code{NULL}, then comparisons are calculated at each value in data. If a number, then comparisons are calculated at \code{smoothN} values of \code{data[ , pred]} that cover the range of \code{data[ , pred]}.} -\item{smoothRange}{2-element numeric list or \code{NULL}. If \code{smooth} is \code{TRUE}, then force loess predictions < \code{smoothRange[1]} to equal \code{smoothRange[1]} and predictions > \code{smoothRange[2]} to equal \code{smoothRange[2]}. Ignored if \code{NULL}.} +\item{smoothRange}{2-element numeric vector or \code{NULL}. If \code{smooth} is \code{TRUE}, then force loess predictions < \code{smoothRange[1]} to equal \code{smoothRange[1]} and predictions > \code{smoothRange[2]} to equal \code{smoothRange[2]}. Ignored if \code{NULL}.} \item{graph}{Logical. If \code{TRUE} then plot predictions.} diff --git a/man/dot-calcWeights.Rd b/man/dot-calcWeights.Rd index d6f487e..a191a33 100644 --- a/man/dot-calcWeights.Rd +++ b/man/dot-calcWeights.Rd @@ -7,7 +7,7 @@ .calcWeights(w, data, resp, family) } \arguments{ -\item{w}{Either logical in which case \code{TRUE} (default) causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) \emph{or} a numeric list of weights, one per row in \code{data} \emph{or} the name of the column in \code{data} that contains site weights. 
If \code{FALSE}, then each datum gets a weight of 1.} +\item{w}{Either logical in which case \code{TRUE} (default) causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) \emph{or} a numeric vector of weights, one per row in \code{data} \emph{or} the name of the column in \code{data} that contains site weights. If \code{FALSE}, then each datum gets a weight of 1.} \item{data}{Data frame} diff --git a/man/dot-cardinalDistance.Rd b/man/dot-cardinalDistance.Rd index fd4d6e7..d14ddd6 100644 --- a/man/dot-cardinalDistance.Rd +++ b/man/dot-cardinalDistance.Rd @@ -45,7 +45,7 @@ \item{x2weightedElev}{Matrix of elevations weighted by x2 or \code{NULL}.} } \value{ -a list object with distance moved and abundance of all cells north/south/east/west of reference point. +A list object with distance moved and abundance of all cells north/south/east/west of reference point. } \description{ This function calculates the weighted distance moved by a mass represented by set of cells which fall north, south, east, or west of a given location (i.e., typically the centroid of the starting population). Values >0 confer movement to the north, south, east, or west of this location. diff --git a/man/elimCellDuplicates.Rd b/man/elimCellDuplicates.Rd index f09251f..70a9f4a 100644 --- a/man/elimCellDuplicates.Rd +++ b/man/elimCellDuplicates.Rd @@ -11,9 +11,9 @@ elimCellDuplicates(x, rast, longLat = NULL, priority = NULL) \item{rast}{\code{SpatRaster} object.} -\item{longLat}{Two-element character list \emph{or} two-element integer list. If \code{x} is a \code{data.frame}, then this should be a character list specifying the names of the fields in \code{x} \emph{or} a two-element list of integers that correspond to longitude and latitude (in that order). For example, \code{c('long', 'lat')} or \code{c(1, 2)}. 
If \code{x} is a \code{matrix}, then this is a two-element list indicating the column numbers in \code{x} that represent longitude and latitude. For example, \code{c(1, 2)}. If \code{x} is an \code{sf} object then this is ignored.} +\item{longLat}{Two-element character vector \emph{or} two-element integer vector. If \code{x} is a \code{data.frame}, then this should be a character vector specifying the names of the fields in \code{x} \emph{or} a two-element vector of integers that correspond to longitude and latitude (in that order). For example, \code{c('long', 'lat')} or \code{c(1, 2)}. If \code{x} is a \code{matrix}, then this is a two-element vector indicating the column numbers in \code{x} that represent longitude and latitude. For example, \code{c(1, 2)}. If \code{x} is an \code{sf} object then this is ignored.} -\item{priority}{Either \code{NULL}, in which case for every cell with more than one point the first point in \code{x} is chosen, or a numeric or character list indicating preference for some points over others when points occur in the same cell. There should be the same number of elements in \code{priority} as there are points in \code{x}. Priority is assigned by the natural sort order of \code{priority}. For example, for 3 points in a cell for which \code{priority} is \code{c(2, 1, 3)}, the script will retain the second point and discard the rest. Similarly, if \code{priority} is \code{c('z', 'y', 'x')} then the third point will be chosen. Priorities assigned to points in other cells are ignored when thinning points in a particular cell.} +\item{priority}{Either \code{NULL}, in which case for every cell with more than one point the first point in \code{x} is chosen, or a numeric or character vector indicating preference for some points over others when points occur in the same cell. There should be the same number of elements in \code{priority} as there are points in \code{x}. Priority is assigned by the natural sort order of \code{priority}. 
For example, for 3 points in a cell for which \code{priority} is \code{c(2, 1, 3)}, the script will retain the second point and discard the rest. Similarly, if \code{priority} is \code{c('z', 'y', 'x')} then the third point will be chosen. Priorities assigned to points in other cells are ignored when thinning points in a particular cell.} } \value{ Object of class \code{x}. diff --git a/man/enmSdmX.Rd b/man/enmSdmX.Rd index a8a3175..8b66cae 100644 --- a/man/enmSdmX.Rd +++ b/man/enmSdmX.Rd @@ -117,7 +117,7 @@ Create an issue on \href{https://github.com/adamlilith/enmSdmX/issues}{GitHub}. Useful links: \itemize{ \item \url{https://github.com/adamlilith/enmSdmX} - \item Report bugs at \url{https://github.com/adamlilith/enmSdmX} + \item Report bugs at \url{https://github.com/adamlilith/enmSdmX/issues} } } diff --git a/man/nicheOverlapMetrics.Rd b/man/nicheOverlapMetrics.Rd index cfa0586..26cc7a7 100644 --- a/man/nicheOverlapMetrics.Rd +++ b/man/nicheOverlapMetrics.Rd @@ -31,7 +31,7 @@ nicheOverlapMetrics( \item \code{rankCor}: Spearman rank correlation. }} -\item{w}{Numeric list. Weights of predictions in \code{x1} and \code{x2}.} +\item{w}{Numeric vector. Weights of predictions in \code{x1} and \code{x2}.} \item{na.rm}{Logical. If T\code{TRUE} then remove elements in \code{x1} and \code{2} that are \code{NA} in \emph{either} \code{x1} or \code{x2}.} diff --git a/man/predictEnmSdm.Rd b/man/predictEnmSdm.Rd index 96a495a..eca78a9 100644 --- a/man/predictEnmSdm.Rd +++ b/man/predictEnmSdm.Rd @@ -22,7 +22,7 @@ predictEnmSdm( \item{maxentFun}{This argument is only used if the \code{model} object is a MaxEnt model; otherwise, it is ignored. It takes a value of either \code{'terra'}, in which case a MaxEnt model is predicted using the default \code{predict} function from the \pkg{terra} package, or \code{'enmSdmX'} in which case the function \code{\link[enmSdmX]{predictMaxEnt}} function from the \pkg{enmSdmX} package (this package) is used.} -\item{scale}{Logical. 
If the model is a GLM trained with \code{\link{trainGLM}}, you can use the \code{scale} argument in that function to center and scale the predictors. In the \code{predictEnmSdm} function, you can set \code{scale} to \code{TRUE} to scale the rasters or data frame to which you are training using the centers (means) and scales (standard deviations) used in the mode. Otherwise, it is up to you to ensure variables are properly centered and scaled. This argument only has effect if the model is a GLM trained using \code{\link{trainGLM}}.} +\item{scale}{Logical. If the model is a GLM trained with \code{\link{trainGLM}} or \code{\link{trainNS}}, you can use the \code{scale} argument in that function to center and scale the predictors. In the \code{predictEnmSdm} function, you can set \code{scale} to \code{TRUE} to scale the rasters or data frame to which you are predicting using the centers (means) and scales (standard deviations) used in the model. Otherwise, it is up to you to ensure variables are properly centered and scaled. This argument only has effect if the model is a GLM trained using \code{\link{trainGLM}} or \code{\link{trainNS}}.} \item{cores}{Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. This is forced to 1 if \code{newdata} is a \code{SpatRaster} (i.e., as of now, there is no parallelization when predicting to a raster... sorry!). If you have issues when \code{cores} > 1, please see the \code{\link{troubleshooting_parallel_operations}} guide.} diff --git a/man/predictMaxEnt.Rd b/man/predictMaxEnt.Rd index 54e2f7d..9a329b1 100644 --- a/man/predictMaxEnt.Rd +++ b/man/predictMaxEnt.Rd @@ -32,13 +32,13 @@ predictMaxEnt( \item{perm}{Character vector. Name(s) of variable to permute before calculating predictions. This permutes the variables for \emph{all} features in which they occur. If a variable is named here, it overrides permutation settings for each feature featType.
Note that for product features the variable is permuted before the product is taken. This permutation is performed before any subsequent permutations (i.e., so if both variables in a product feature are included in \code{perms}, then this is equivalent to using the \code{'before'} rule for \code{permProdRule}). Ignored if \code{NULL}.} -\item{permLinear}{Character list. Names(s) of variables to permute in linear features before calculating predictions. Ignored if \code{NULL}.} +\item{permLinear}{Character vector. Name(s) of variables to permute in linear features before calculating predictions. Ignored if \code{NULL}.} \item{permQuad}{Names(s) of variables to permute in quadratic features before calculating predictions. Ignored if \code{NULL}.} \item{permHinge}{Character vector. Names(s) of variables to permute in forward/reverse hinge features before calculating predictions. Ignored if \code{NULL}.} -\item{permThresh}{Character list. Names(s) of variables to permute in threshold features before calculating predictions. Ignored if \code{NULL}.} +\item{permThresh}{Character vector. Name(s) of variables to permute in threshold features before calculating predictions. Ignored if \code{NULL}.} \item{permProd}{Character list. A list object of \code{n} elements, each of which has two character elements naming the variables to permute if they occur in a product feature. Depending on the value of \code{permProdRule}, the function will either permute the individual variables then calculate their product or calculate their product, then permute the product across observations. Any other features containing the variables will produce values as normal. Example: \code{permProd=list(c('precipWinter', 'tempWinter'), c('tempSummer', 'precipFall'))}. The order of the variables in each element of \code{permProd} doesn't matter, so \code{permProd=list(c('temp', 'precip'))} is the same as \code{permProd=list(c('precip', 'temp'))}.
Ignored if \code{NULL}.} diff --git a/man/trainBrt.Rd b/man/trainBrt.Rd index 95cb884..aa8ca96 100644 --- a/man/trainBrt.Rd +++ b/man/trainBrt.Rd @@ -29,7 +29,7 @@ trainBRT( \item{resp}{Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} \item{learningRate}{Numeric. Learning rate at which model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1).} @@ -43,7 +43,7 @@ trainBRT( \item{tries}{Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function will stop training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step. So if non-convergence occurs then the function automatically increases the number of trees in the step size until \code{tries} is reached.} -\item{tryBy}{Character list. A list that contains one or more of \code{'learningRate'}, \code{'treeComplexity'}, \code{numTrees}, and/or \code{'stepSize'}. If a given combination of \code{learningRate}, \code{treeComplexity}, \code{numTrees}, \code{stepSize}, and \code{bagFraction} do not allow model convergence then then the function tries again but with alterations to any of the arguments named in \code{tryBy}: +\item{tryBy}{Character vector. A vector containing one or more of \code{'learningRate'}, \code{'treeComplexity'}, \code{'numTrees'}, and/or \code{'stepSize'}.
If a given combination of \code{learningRate}, \code{treeComplexity}, \code{numTrees}, \code{stepSize}, and \code{bagFraction} does not allow model convergence, then the function tries again but with alterations to any of the arguments named in \code{tryBy}: * \code{learningRate}: Decrease the learning rate by a factor of 10. * \code{treeComplexity}: Randomly increase/decrease tree complexity by 1 (minimum of 1). * \code{maxTrees}: Increase number of trees by 20%. diff --git a/man/trainByCrossValid.Rd b/man/trainByCrossValid.Rd index 9d7b4eb..256244a 100644 --- a/man/trainByCrossValid.Rd +++ b/man/trainByCrossValid.Rd @@ -50,7 +50,7 @@ A list object with several named elements: \itemize{ \item \code{meta}: Meta-data on the model call. \item \code{folds}: The \code{folds} object. - \item \code{models} (if \code{outputModels} is \code{TRUE}): A list of model objects, one per data fold. + \item \code{models} (if \code{outputModels} is \code{TRUE}): A list of model objects, one per data fold. \item \code{tuning}: One data frame per k-fold, each containing evaluation statistics for all candidate models in the fold. In addition to algorithm-specific fields, these consist of: \itemize{ \item \code{'logLoss'}: Log loss. Higher (less negative) values imply better fit. diff --git a/man/trainGam.Rd b/man/trainGam.Rd index 39bfc7d..bfcd0fe 100644 --- a/man/trainGam.Rd +++ b/man/trainGam.Rd @@ -31,7 +31,7 @@ trainGAM( \item{resp}{Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors.
The default is to use the second and subsequent columns in \code{data}.} \item{gamma}{Initial penalty to degrees of freedom to use (larger ==> smoother fits).} diff --git a/man/trainGlm.Rd b/man/trainGlm.Rd index 31cd176..8a8136e 100644 --- a/man/trainGlm.Rd +++ b/man/trainGlm.Rd @@ -31,7 +31,7 @@ trainGLM( \item{resp}{Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} \item{scale}{Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by dividing by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$mean} and \code{model$scales$sd}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1. 
If not, then a warning will be printed, but the function will continue to do its operations.} diff --git a/man/trainMaxEnt.Rd b/man/trainMaxEnt.Rd index 7ef7000..60589c2 100644 --- a/man/trainMaxEnt.Rd +++ b/man/trainMaxEnt.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/trainMaxEnt.r \name{trainMaxEnt} \alias{trainMaxEnt} -\title{Calibrate a MaxEnt (ver 3.3.3+ or "maxent") model using AICc} +\title{Calibrate a MaxEnt model using AICc} \usage{ trainMaxEnt( data, @@ -28,11 +28,11 @@ trainMaxEnt( \item{resp}{Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} \item{regMult}{Numeric vector. Values of the master regularization parameters (called \code{beta} in some publications) to test.} -\item{classes}{Character list. Names of feature classes to use (either \code{default} to use \code{lpqh}) or any combination of \code{lpqht}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features.} +\item{classes}{Character vector. Names of feature classes to use (either \code{default} to use \code{lpqh}) or any combination of \code{lpqht}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features. Example: \code{c('l', 'p', 'q')}.} \item{testClasses}{Logical. 
If \code{TRUE} (default) then test all possible combinations of classes (note that all tested models will at least have linear features). If \code{FALSE} then use the classes provided (these will not vary between models).} @@ -44,7 +44,7 @@ trainMaxEnt( \item{jackknife}{Logical. If \code{TRUE} (default) the the returned model will be also include jackknife testing of variable importance.} -\item{arguments}{\code{NULL} (default) or a character list. Options to pass to \code{maxent()}'s \code{args} argument. (Do not include \code{l}, \code{p}, \code{q}, \code{h}, \code{t}, \code{betamultiplier}, or \code{jackknife}!)} +\item{arguments}{\code{NULL} (default) or a character vector. Options to pass to \code{maxent()}'s \code{args} argument. (Do not include \code{l}, \code{p}, \code{q}, \code{h}, \code{t}, \code{betamultiplier}, or \code{jackknife}!)} \item{scratchDir}{Character. Directory to which to write temporary files. Leave as NULL to create a temporary folder in the current working directory.} @@ -65,9 +65,13 @@ trainMaxEnt( The object that is returned depends on the value of the \code{out} argument. It can be a model object, a data frame, a list of models, or a list of all two or more of these. } \description{ -This function calculates the "best" Maxent model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the \code{maxent} jar file (see \emph{Details}). 
Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. +This function calculates the "best" MaxEnt model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The best model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). } \details{ +The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. The function requires the \code{maxent} jar file. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. + +Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data. + This function is a wrapper for \code{MaxEnt()}. The \code{MaxEnt} function creates a series of files on disk for each model. This function assumes you do not want those files, so deletes most of them. However, there is one that cannot be deleted and the normal ways of changing its permissions in \pkg{R} do not work. So the function simply writes over that file (which is allowed) to make it smaller. Regardless, if you run many models your temporary directory (argument \code{scratchDir}) can fill up and require manual deletion.
} \examples{ diff --git a/man/trainMaxNet.Rd b/man/trainMaxNet.Rd index fe2d820..c5d397a 100644 --- a/man/trainMaxNet.Rd +++ b/man/trainMaxNet.Rd @@ -2,7 +2,7 @@ % Please edit documentation in R/trainMaxNet.r \name{trainMaxNet} \alias{trainMaxNet} -\title{Calibrate a MaxNet (MaxEnt) model using AICc} +\title{Calibrate a MaxNet model using AICc} \usage{ trainMaxNet( data, @@ -24,11 +24,11 @@ trainMaxNet( \item{resp}{Character or integer. Name or column index of response variable. Default is to use the first column in \code{data}.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in \code{data}.} +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. Default is to use the second and subsequent columns in \code{data}.} \item{regMult}{Numeric vector. Values of the master regularization parameters (called \code{beta} in some publications) to test.} -\item{classes}{Character list. Names of feature classes to use (either \code{default} to use \code{lpqh}) or any combination of \code{lpqht}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features.} +\item{classes}{Character vector. Names of feature classes to use (either \code{default} to use \code{'lpqh'}) or any combination of \code{'lpqht'}, where \code{l} ==> linear features, \code{p} ==> product features, \code{q} ==> quadratic features, \code{h} ==> hinge features, and \code{t} ==> threshold features. Example: \code{c('l', 'p', 'q')}.} \item{testClasses}{Logical. If \code{TRUE} (default) then test all possible combinations of classes (note that all tested models will at least have linear features). 
If \code{FALSE} then use the classes provided (these will not vary between models).} @@ -53,7 +53,12 @@ trainMaxNet( If \code{out = 'model'} this function returns an object of class \code{MaxEnt}. If \code{out = 'tuning'} this function returns a data frame with tuning parameters, log likelihood, and AICc for each model tried. If \code{out = c('model', 'tuning'} then it returns a list object with the \code{MaxEnt} object and the data frame. } \description{ -This function calculates the "best" MaxNet model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The "best" model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. +This function calculates the "best" MaxNet model using AICc across all possible combinations of a set of master regularization parameters and feature classes. The "best" model has the lowest AICc, with ties broken by number of features (fewer is better), regularization multiplier (higher better), then finally the number of coefficients (fewer better). +} +\details{ +The function can return the best model (default), a list of models created using all possible combinations of feature classes and regularization multipliers, and/or a data frame with tuning statistics for each model. Models in the list and in the data frame are sorted from best to worst. 
Its output is any or all of: a table with AICc for all evaluated models; all models evaluated in the "selection" phase; and/or the single model with the lowest AICc. + +Note that due to differences in how MaxEnt and MaxNet are implemented in their base packages, the models will not necessarily be the same even for the same training data. } \examples{ \donttest{ diff --git a/man/trainNs.Rd b/man/trainNs.Rd index 621b9d7..6512653 100644 --- a/man/trainNs.Rd +++ b/man/trainNs.Rd @@ -28,7 +28,7 @@ trainNS( \item{resp}{Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} \item{scale}{Either \code{NA} (default), or \code{TRUE} or \code{FALSE}. If \code{TRUE}, the predictors will be centered and scaled by subtracting their means then dividing by their standard deviations. The means and standard deviations will be returned in the model object under an element named "\code{scales}". For example, if you do something like \code{model <- trainGLM(data, scale=TRUE)}, then you can get the means and standard deviations using \code{model$scales$means} and \code{model$scales$sds}. If \code{FALSE}, no scaling is done. If \code{NA} (default), then the function will check to see if non-factor predictors have means ~0 and standard deviations ~1.
If not, then a warning will be printed, but the function will continue to do its operations.} diff --git a/man/trainRf.Rd b/man/trainRf.Rd index 183c2a4..2f1d2d6 100644 --- a/man/trainRf.Rd +++ b/man/trainRf.Rd @@ -23,7 +23,7 @@ trainRF( \item{resp}{Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.} -\item{preds}{Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} +\item{preds}{Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.} \item{numTrees}{Vector of number of trees to grow. All possible combinations of \code{mtry} and \code{numTrees} will be assessed.}