Merge pull request #46 from adamlilith/solstice_2022_2023

Clarify help ("`list`" --> "`vector`" where needed)
adamlilith · Nov 2, 2024 · 7cbc387
2 parents 0148f00 + 83d0c29
commit 7cbc387

Showing 37 changed files with 128 additions and 66 deletions.
diff --git a/.Rbuildignore b/.Rbuildignore
@@ -8,3 +8,4 @@
 ^\.Rproj\.user$
 ^\.github
 ^\.github$
+^enmSdmX_workspace.code-workspace
diff --git a/DESCRIPTION b/DESCRIPTION
@@ -1,8 +1,8 @@
 Package: enmSdmX
 Type: Package
 Title: Species Distribution Modeling and Ecological Niche Modeling
-Version: 1.1.6
-Date: 2024-06-13
+Version: 1.1.8
+Date: 2024-10-02
 Authors@R: 
 	c(
 		person(
@@ -57,7 +57,7 @@ Imports:
 LazyData: true
 LazyLoad: yes
 URL: https://github.com/adamlilith/enmSdmX
-BugReports: https://github.com/adamlilith/enmSdmX
+BugReports: https://github.com/adamlilith/enmSdmX/issues
 Encoding: UTF-8
 License: MIT + file LICENSE
-RoxygenNote: 7.3.1
+RoxygenNote: 7.3.2
diff --git a/NEWS.md b/NEWS.md
@@ -1,13 +1,19 @@
+# enmSdmX 1.1.8 2024-10-02
+- `modelSize()` can now tell size of a `ranger` random forest.
+
+# enmSdmX 1.1.7 2024-08-02
+- Clarified misleading help in several functions, including all the `trainXYZ()` functions (thank you, PT!).
+
 # enmSdmX 1.1.6 2024-06-06
 - Replaced dependency on **MuMIn** with one on **AICcmodavg** for calculation of AICc. Received warning that **MuMIn** was going to be archived on CRAN.
 
 # enmSdmX 1.1.5 2024-05-16
 - Added function `trainESM()` for ensembles of small models.
 - Added several UTM coordinate reference systems accessible through `getCRS()`.
-- Fixed bug in `precidtEnmSdm()` for predicting kernel density models from the **ks** package.
+- Fixed bug in `predictEnmSdm()` for predicting kernel density models from the **ks** package.
 
 # enmSdmX 1.1.3 2023-03-06
-- `trainGLM()`,  `trainNS()`, and `predictEnmSdm()` now have options to automatically center and scale predictors.
+- `trainGLM()`,  `trainNS()`, and `predictEnmSdm()` now have options to automatically center and scale predictors. If a GLM or NS model created using `trainGLM()` or `trainNS()` is provided to `predictEnmSdm()`, it will automatically scale the predictors properly.
 
 # enmSdmX 1.1.3 2023-02-02
 - Removed dependency on `dismo`, replaced where possible by `predicts`; copied `gbm.step()` and `predict()` method for MaxEnt to `enmSdmX` as a momentary fix; would love a professional solution!

diff --git a/R/bioticVelocity.r b/R/bioticVelocity.r
@@ -939,7 +939,7 @@ bioticVelocity <- function(
 #' @param x2weightedLats Matrix of latitudes weighted (i.e., by population size, given by \code{x2}).
 #' @param x1weightedElev Matrix of elevations weighted by x1 or \code{NULL}.
 #' @param x2weightedElev Matrix of elevations weighted by x2 or \code{NULL}.
-#' @return a list object with distance moved and abundance of all cells north/south/east/west of reference point.
+#' @returns A list object with distance moved and abundance of all cells north/south/east/west of reference point.
 #' @keywords internal
 .cardinalDistance <- function(
 	direction,

diff --git a/R/compareResponse.r b/R/compareResponse.r
@@ -2,15 +2,15 @@
 #'
 #' This function calculates a suite of metrics reflecting niche overlap for two response curves. Response curves are predicted responses of a uni- or multivariate model along a single variable. Depending on the user-specified settings the function calculates these values either at each pair of values of \code{pred1} and \code{pred2} \emph{or} along a smoothed version of \code{pred1} and \code{pred2}.
 #'
-#' @param pred1 Numeric list. Predictions from first model along \code{data} (one value per row in \code{data}).
-#' @param pred2 Numeric list. Predictions from second model along \code{data} (one value per row in \code{data}).
+#' @param pred1 Numeric vector. Predictions from first model along \code{data} (one value per row in \code{data}).
+#' @param pred2 Numeric vector. Predictions from second model along \code{data} (one value per row in \code{data}).
 #' @param data Data frame or matrix corresponding to \code{pred1} and \code{pred2}.
-#' @param predictor Character list. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in \code{data}.
+#' @param predictor Character vector. Name(s) of predictor(s) for which to calculate comparisons. These must appear as column names in \code{data}.
 #' @param adjust Logical. If \code{TRUE} then subtract the mean of \code{pred1} from \code{pred1} and the mean of \code{pred2} from \code{pred2} before analysis. Useful for comparing the shapes of curves while controlling for different elevations (intercepts).
 #' @param gap Numeric >0. Proportion of range of predictor variable across which to assume a gap exists. Calculation of \code{areaAbsDiff} will ignore gaps wider than this. To ensure the entire range of the data is included set this equal to \code{Inf} (default).
 #' @param smooth Logical. If \code{TRUE} then the responses are first smoothed using loess() then compared at \code{smoothN} values along each predictor. If \code{FALSE}, then comparisons are conducted at the raw values \code{pred1} and \code{pred2}.
 #' @param smoothN \code{NULL} or positive integer. Number of values along "pred" at which to calculate comparisons. Only used if \code{smooth} is \code{TRUE}. If \code{NULL}, then comparisons are calculated at each value in data. If a number, then comparisons are calculated at \code{smoothN} values of \code{data[ , pred]} that cover the range of \code{data[ , pred]}.
-#' @param smoothRange 2-element numeric list or \code{NULL}. If \code{smooth} is \code{TRUE}, then force loess predictions < \code{smoothRange[1]} to equal \code{smoothRange[1]} and predictions > \code{smoothRange[2]} to equal \code{smoothRange[2]}. Ignored if \code{NULL}.
+#' @param smoothRange 2-element numeric vector or \code{NULL}. If \code{smooth} is \code{TRUE}, then force loess predictions < \code{smoothRange[1]} to equal \code{smoothRange[1]} and predictions > \code{smoothRange[2]} to equal \code{smoothRange[2]}. Ignored if \code{NULL}.
 #' @param graph Logical. If \code{TRUE} then plot predictions.
 #' @param ... Arguments to pass to functions like \code{sum()} (for example, \code{na.rm=TRUE}) and to \code{overlap()} (for example, \code{w} for weights). Note that if \code{smooth = TRUE}, then passing an argument called \code{w} will likely cause a warning and make results suspect \emph{unless} weights are pre-calculated for each of the \code{smoothN} points along a particular predictor.
 #' @return Either a data frame (if \code{smooth = FALSE}) or a list object with the smooth model plus a data frame (if \code{smooth = TRUE}). The data frame represents metrics comparing response curves of \code{pred1} and \code{pred2}:

diff --git a/R/elimCellDuplicates.r b/R/elimCellDuplicates.r
@@ -4,8 +4,8 @@
 #'
 #' @param x Points. This can be either a \code{data.frame}, \code{matrix}, \code{SpatVector}, or \code{sf} object.
 #' @param rast \code{SpatRaster} object.
-#' @param longLat Two-element character list \emph{or} two-element integer list. If \code{x} is a \code{data.frame}, then this should be a character list specifying the names of the fields in \code{x} \emph{or} a two-element list of integers that correspond to longitude and latitude (in that order). For example, \code{c('long', 'lat')} or \code{c(1, 2)}. If \code{x} is a \code{matrix}, then this is a two-element list indicating the column numbers in \code{x} that represent longitude and latitude. For example, \code{c(1, 2)}. If \code{x} is an \code{sf} object then this is ignored.
-#' @param priority Either \code{NULL}, in which case for every cell with more than one point the first point in \code{x} is chosen, or a numeric or character list indicating preference for some points over others when points occur in the same cell. There should be the same number of elements in \code{priority} as there are points in \code{x}. Priority is assigned by the natural sort order of \code{priority}. For example, for 3 points in a cell for which \code{priority} is \code{c(2, 1, 3)}, the script will retain the second point and discard the rest. Similarly, if \code{priority} is \code{c('z', 'y', 'x')} then the third point will be chosen. Priorities assigned to points in other cells are ignored when thinning points in a particular cell.
+#' @param longLat Two-element character vector \emph{or} two-element integer vector. If \code{x} is a \code{data.frame}, then this should be a character vector specifying the names of the fields in \code{x} \emph{or} a two-element vector of integers that correspond to longitude and latitude (in that order). For example, \code{c('long', 'lat')} or \code{c(1, 2)}. If \code{x} is a \code{matrix}, then this is a two-element vector indicating the column numbers in \code{x} that represent longitude and latitude. For example, \code{c(1, 2)}. If \code{x} is an \code{sf} object then this is ignored.
+#' @param priority Either \code{NULL}, in which case for every cell with more than one point the first point in \code{x} is chosen, or a numeric or character vector indicating preference for some points over others when points occur in the same cell. There should be the same number of elements in \code{priority} as there are points in \code{x}. Priority is assigned by the natural sort order of \code{priority}. For example, for 3 points in a cell for which \code{priority} is \code{c(2, 1, 3)}, the script will retain the second point and discard the rest. Similarly, if \code{priority} is \code{c('z', 'y', 'x')} then the third point will be chosen. Priorities assigned to points in other cells are ignored when thinning points in a particular cell.
 #' @return Object of class \code{x}.
 #' @examples
 #'

diff --git a/R/modelSize.r b/R/modelSize.r
@@ -58,6 +58,11 @@ modelSize <- function(
 		warning('Cannot determine number of presences and background sites for a MaxNet model.')
 		NA
 
+	# random forest in ranger package
+	} else if (inherits(x, 'ranger')) {
+
+		as.numeric(x$predictions)
+
 	# random forest in party package
 	} else if (inherits(x, 'randomForest')) {
 
@@ -83,7 +88,11 @@ modelSize <- function(
 			stop('Cannot extract sample size from model object.')
 		}
 	} else if (binary) {
-		out <- c(sum(samples == 1), sum(samples == 0))
+		out <- if (inherits(x, 'ranger')) {
+			c(sum(samples == 1), sum(samples == 2))
+		} else {
+			c(sum(samples == 1), sum(samples == 0))
+		}
 		names(out) <- c('num1s', 'num0s')
 		if (out[1L] == 0 & out[2L] == 0) warning('Model does not seem to be using a binary response.', immediate.=TRUE)
 	} else {

diff --git a/R/nicheOverlapMetrics.r b/R/nicheOverlapMetrics.r
@@ -15,7 +15,7 @@
 #' \item \code{cor}: Pearson correlation between \code{x1} and \code{x2} (will apply \code{logitAdj()} first unless logit=FALSE).
 #' \item \code{rankCor}: Spearman rank correlation.
 #' }
-#' @param w Numeric list. Weights of predictions in \code{x1} and \code{x2}.
+#' @param w Numeric vector. Weights of predictions in \code{x1} and \code{x2}.
 #' @param na.rm Logical.  If \code{TRUE} then remove elements in \code{x1} and \code{x2} that are \code{NA} in \emph{either} \code{x1} or \code{x2}.
 #' @param ... Other arguments (not used).
 #' 

diff --git a/R/predictEnmSdm.r b/R/predictEnmSdm.r
@@ -8,7 +8,7 @@
 #'
 #' @param maxentFun	This argument is only used if the \code{model} object is a MaxEnt model; otherwise, it is ignored. It takes a value of either \code{'terra'}, in which case a MaxEnt model is predicted using the default \code{predict} function from the \pkg{terra} package, or \code{'enmSdmX'} in which case the function \code{\link[enmSdmX]{predictMaxEnt}} function from the \pkg{enmSdmX} package (this package) is used.
 #'
-#' @param scale Logical. If the model is a GLM trained with \code{\link{trainGLM}}, you can use the \code{scale} argument in that function to center and scale the predictors. In the \code{predictEnmSdm} function, you can set \code{scale} to \code{TRUE} to scale the rasters or data frame to which you are training using the centers (means) and scales (standard deviations) used in the mode. Otherwise, it is up to you to ensure variables are properly centered and scaled. This argument only has effect if the model is a GLM trained using \code{\link{trainGLM}}.
+#' @param scale Logical. If the model is a GLM trained with \code{\link{trainGLM}} or \code{\link{trainNS}}, you can use the \code{scale} argument in that function to center and scale the predictors. In the \code{predictEnmSdm} function, you can set \code{scale} to \code{TRUE} to scale the rasters or data frame to which you are predicting using the centers (means) and scales (standard deviations) used in the model. Otherwise, it is up to you to ensure variables are properly centered and scaled. This argument only has effect if the model is a GLM trained using \code{\link{trainGLM}} or \code{\link{trainNS}}.
 #'
 #' @param cores	 Integer >= 1. Number of cores to use when calculating multiple models. Default is 1. This is forced to 1 if \code{newdata} is a \code{SpatRaster} (i.e., as of now, there is no parallelization when predicting to a raster... sorry!).  If you have issues when \code{cores} > 1, please see the \code{\link{troubleshooting_parallel_operations}} guide.
 #'

diff --git a/R/predictMaxEnt.r b/R/predictMaxEnt.r
@@ -10,10 +10,10 @@
 #' 		\item \code{'cloglog'} Complementary log-log output (as per version 3.4.0+ of maxent--called "\code{maxnet()}" in the package of the same name)
 #' }
 #' @param perm Character vector. Name(s) of variable to permute before calculating predictions. This permutes the variables for \emph{all} features in which they occur.  If a variable is named here, it overrides permutation settings for each feature featType.  Note that for product features the variable is permuted before the product is taken. This permutation is performed before any subsequent permutations (i.e., so if both variables in a product feature are included in \code{perms}, then this is equivalent to using the \code{'before'} rule for \code{permProdRule}). Ignored if \code{NULL}.
-#' @param permLinear Character list. Names(s) of variables to permute in linear features before calculating predictions.  Ignored if \code{NULL}.
+#' @param permLinear Character vector. Name(s) of variables to permute in linear features before calculating predictions.  Ignored if \code{NULL}.
 #' @param permQuad Character vector. Name(s) of variables to permute in quadratic features before calculating predictions.  Ignored if \code{NULL}.
 #' @param permHinge Character vector. Name(s) of variables to permute in forward/reverse hinge features before calculating predictions.  Ignored if \code{NULL}.
-#' @param permThresh Character list. Names(s) of variables to permute in threshold features before calculating predictions.  Ignored if \code{NULL}.
+#' @param permThresh Character vector. Name(s) of variables to permute in threshold features before calculating predictions.  Ignored if \code{NULL}.
 #' @param permProd Character list. A list object of \code{n} elements, each of which has two character elements naming the variables to permute if they occur in a product feature.  Depending on the value of \code{permProdRule}, the function will either permute the individual variables then calculate their product or calculate their product, then permute the product across observations.  Any other features containing the variables will produce values as normal.  Example: \code{permProd=list(c('precipWinter', 'tempWinter'), c('tempSummer', 'precipFall'))}.  The order of the variables in each element of \code{permProd} doesn't matter, so \code{permProd=list(c('temp', 'precip'))} is the same as \code{permProd=list(c('precip', 'temp'))}.  Ignored if \code{NULL}.
 #' @param permProdRule Character. Rule for how permutation of product features is applied: \code{'before'} ==> Permute individual variable values then calculate product; \code{'after'} ==> calculate product then permute across these values. Ignored if \code{permProd} is \code{NULL}.
 #' @param ... Extra arguments (not used).

diff --git a/R/private_calcWeights.r b/R/private_calcWeights.r
@@ -2,7 +2,7 @@
 #'
 #' Calculates weighting for a model. Each record receives a numeric weight.
 #'
-#' @param w Either logical in which case \code{TRUE} (default) causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) \emph{or} a numeric list of weights, one per row in \code{data} \emph{or} the name of the column in \code{data} that contains site weights. If \code{FALSE}, then each datum gets a weight of 1.
+#' @param w Either logical in which case \code{TRUE} (default) causes the total weight of presences to equal the total weight of absences (if \code{family='binomial'}) \emph{or} a numeric vector of weights, one per row in \code{data} \emph{or} the name of the column in \code{data} that contains site weights. If \code{FALSE}, then each datum gets a weight of 1.
 #' @param data Data frame
 #' @param resp Name of response column
 #' @param family Name of family

diff --git a/R/trainBrt.r b/R/trainBrt.r
@@ -4,15 +4,15 @@
 #'
 #' @param data Data frame.
 #' @param resp Response variable. This is either the name of the column in \code{data} or an integer indicating the column in \code{data} that has the response variable. The default is to use the first column in \code{data} as the response.
-#' @param preds Character list or integer list. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.
+#' @param preds Character vector or integer vector. Names of columns or column indices of predictors. The default is to use the second and subsequent columns in \code{data}.
 #' @param family Character. Name of error family.
 #' @param learningRate Numeric. Learning rate at which model learns from successive trees (Elith et al. 2008 recommend 0.0001 to 0.1).
 #' @param treeComplexity Positive integer. Tree complexity: depth of branches in a single tree (1 to 16).
 #' @param bagFraction Numeric in the range [0, 1]. Bag fraction: proportion of data used for training in cross-validation (Elith et al. 2008 recommend 0.5 to 0.7).
 #' @param minTrees Positive integer. Minimum number of trees to be scored as a "usable" model (Elith et al. 2008 recommend at least 1000). Default is 1000.
 #' @param maxTrees Positive integer. Maximum number of trees in model set.
 #' @param tries Integer > 0. Number of times to try to train a model with a particular set of tuning parameters. The function will stop training the first time a model converges (usually on the first attempt). Non-convergence seems to be related to the number of trees tried in each step.  So if non-convergence occurs then the function automatically increases the number of trees in the step size until \code{tries} is reached.
-#' @param tryBy Character list. A list that contains one or more of \code{'learningRate'}, \code{'treeComplexity'}, \code{numTrees}, and/or \code{'stepSize'}. If a given combination of \code{learningRate}, \code{treeComplexity}, \code{numTrees}, \code{stepSize}, and \code{bagFraction} do not allow model convergence then then the function tries again but with alterations to any of the arguments named in \code{tryBy}:
+#' @param tryBy Character vector. A vector that contains one or more of \code{'learningRate'}, \code{'treeComplexity'}, \code{'numTrees'}, and/or \code{'stepSize'}. If a given combination of \code{learningRate}, \code{treeComplexity}, \code{numTrees}, \code{stepSize}, and \code{bagFraction} does not allow model convergence, then the function tries again but with alterations to any of the arguments named in \code{tryBy}:
 #' * \code{learningRate}: Decrease the learning rate by a factor of 10.
 #' * \code{treeComplexity}: Randomly increase/decrease tree complexity by 1 (minimum of 1).
 #' * \code{maxTrees}: Increase number of trees by 20%.

diff --git a/R/trainByCrossValid.r b/R/trainByCrossValid.r
@@ -22,7 +22,7 @@
 #' \itemize{
 #' 		\item \code{meta}: Meta-data on the model call.
 #' 		\item \code{folds}: The \code{folds} object.
-#' 		\item \code{models} (if \code{outputModels} is \code{TRUE}): A list of model objects, one per  data fold.
+#' 		\item \code{models} (if \code{outputModels} is \code{TRUE}): A list of model objects, one per data fold.
 #'		\item \code{tuning}: One data frame per k-fold, each containing evaluation statistics for all candidate models in the fold. In addition to algorithm-specific fields, these consist of:
 #' 	\itemize{
 #' 		\item \code{'logLoss'}: Log loss. Higher (less negative) values imply better fit.