Fit and validate Maximum Entropy Models based on the Ensemble of Small Models approach
Source: R/esm_max.R
This function constructs Maxent models using the Ensemble of Small Models (ESM) approach (Breiner et al., 2015, 2018).
Usage
esm_max(
data,
response,
predictors,
partition,
thr = NULL,
background = NULL,
clamp = TRUE,
classes = "default",
pred_type = "cloglog",
regmult = 2.5
)
Arguments
- data
data.frame. Database with the response (0,1) and predictor values.
- response
character. Column name with species absence-presence data (0,1).
- predictors
character. Vector with the column names of quantitative predictor variables (i.e. continuous variables). This function can only construct models with continuous variables and does not allow categorical variables. Usage predictors = c("aet", "cwd", "tmin").
- partition
character. Column name with training and validation partition groups.
- thr
character. Threshold used to convert suitability into binary values (i.e. 0,1), which is needed for threshold-dependent performance metrics. More than one threshold type can be used; the argument must be provided as a vector. The following threshold criteria are available:
equal_sens_spec: Threshold at which the sensitivity and specificity are equal.
max_sens_spec: Threshold at which the sum of the sensitivity and specificity is the highest (aka threshold that maximizes the TSS).
max_jaccard: The threshold at which Jaccard is the highest.
max_sorensen: The threshold at which Sorensen is highest.
max_fpb: The threshold at which FPB (F-measure on presence-background data) is highest.
sensitivity: Threshold based on a specified sensitivity value. Usage thr = c('sensitivity', sens='0.6') or thr = c('sensitivity'). 'sens' refers to the sensitivity value. If no sensitivity value is specified, the default is 0.9.
To include more than one threshold type, concatenate the threshold types, e.g., thr=c('max_sens_spec', 'max_jaccard'), or thr=c('max_sens_spec', 'sensitivity', sens='0.8'), or thr=c('max_sens_spec', 'sensitivity'). The function will use all thresholds if no threshold is specified.
- background
data.frame. Database with the predictor variables and a response column containing only 0 (background points). All column names must be consistent with data. Default NULL
- clamp
logical. If TRUE, predictors and features are restricted to the range seen during model training.
- classes
character. A single feature or any combination of them. Features are symbolized by letters: l (linear), q (quadratic), h (hinge), p (product), and t (threshold). Usage classes = "lpq". Default "default" (see details).
- pred_type
character. Type of response required. Available options are "link", "exponential", "cloglog" and "logistic". Default "cloglog"
- regmult
numeric. A constant to adjust regularization. Because ESMs are used for modeling species with few records, the default value is 2.5
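As a minimal sketch of the forms accepted by the thr argument described above, the following base R snippet builds the vectors with c(). The object names (thr_single, thr_multi, thr_sens) are illustrative only and are not part of the package.

```r
# Threshold specifications written as plain character vectors (base R only).
thr_single <- c("max_sens_spec")
thr_multi <- c("max_sens_spec", "max_jaccard")

# A sensitivity threshold with an explicit value: the value is supplied as a
# named element called "sens" and is stored as a string like the other elements.
thr_sens <- c("max_sens_spec", "sensitivity", sens = "0.8")

# Only the sensitivity value carries a name; the threshold types are unnamed.
names(thr_sens)

# The stored string can be converted back to a number when needed.
as.numeric(thr_sens[["sens"]])
```

Because c() coerces everything to character, the sensitivity value is written as a quoted string, as in the usage shown for the argument.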
Value
A list object with:
esm_model: A list with a "maxnet" class object from the maxnet package for each bivariate model. This object can be used for predicting ensembles of small models with the sdm_predict function.
predictors: A tibble with the variables used for modeling.
performance: Performance metrics (see sdm_eval). Threshold-dependent metrics are calculated based on the threshold specified in the thr argument.
Details
This method consists of creating bivariate models with all the pairwise combinations of predictors and building an ensemble from the average suitability weighted by Somers' D (D = 2 x (AUC - 0.5)). ESM is recommended for modeling species with few occurrences. This function does not allow categorical variables because they can be problematic when only a few occurrences are available. For further details see Breiner et al. (2015, 2018). This function uses a default regularization multiplier of 2.5 (see Breiner et al., 2018).
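The ensemble step described above can be sketched in base R. This is an illustration only, not the package's internal code: the AUC values and suitability predictions are invented, and any additional handling the function may apply to poorly performing models is not covered here.

```r
# All pairwise combinations of a set of predictors: one bivariate model each.
predictors <- c("aet", "cwd", "tmin", "ppt_djf", "ppt_jja", "pH", "awc", "depth")
pairs <- combn(predictors, 2, simplify = FALSE)
length(pairs) # choose(8, 2) bivariate models

# Invented AUC values and suitability predictions for three bivariate models
# at four sites (rows = sites, columns = models).
auc <- c(0.90, 0.75, 0.60)
suit <- cbind(
  m1 = c(0.8, 0.2, 0.6, 0.1),
  m2 = c(0.6, 0.4, 0.5, 0.3),
  m3 = c(0.5, 0.5, 0.4, 0.2)
)

# Somers' D for each model, D = 2 x (AUC - 0.5).
somers_d <- 2 * (auc - 0.5)

# Ensemble suitability per site: average of the model predictions weighted by D.
ensemble <- as.vector(suit %*% somers_d) / sum(somers_d)
ensemble
```

Models with a higher AUC receive a larger weight, so the first column dominates the ensemble in this toy case.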
When the argument "classes" is set to "default", MaxEnt will use different feature combinations depending on the number of presences (np), according to the following rule: if np < 10 classes = "l", if np between 10 and 15 classes = "lq", if np between 15 and 80 classes = "lqh", and if np >= 80 classes = "lqph".
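The rule above can be written as a small helper. The function default_classes is hypothetical and not part of the package; the text leaves the boundary at 15 ambiguous, so half-open intervals (np = 15 falls into "lqh") are an assumption here.

```r
# Hypothetical helper mirroring the default feature-class rule described above.
# ASSUMPTION: intervals are half-open, i.e. [10, 15) -> "lq" and [15, 80) -> "lqh".
default_classes <- function(np) {
  if (np < 10) {
    "l"
  } else if (np < 15) {
    "lq"
  } else if (np < 80) {
    "lqh"
  } else {
    "lqph"
  }
}

# Feature classes selected for a few example numbers of presences.
sapply(c(5, 12, 40, 100), default_classes)
```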
When presence-absence (or presence-pseudo-absence) data are used in the data argument in addition to background points, the function will fit models with presences and background points and validate them with presences and absences. This procedure makes Maxent comparable to other presence-absence models (e.g., random forest, support vector machine). If only presence and background point data are used, the function will fit and validate the model with presence and background data. If only presence-absence data are used in the data argument, without background, the function will fit the model with the specified data (not recommended).
References
Breiner, F. T., Guisan, A., Bergamini, A., & Nobis, M. P. (2015). Overcoming limitations of modelling rare species by using ensembles of small models. Methods in Ecology and Evolution, 6(10), 1210-1218. https://doi.org/10.1111/2041-210X.12403
Breiner, F. T., Nobis, M. P., Bergamini, A., & Guisan, A. (2018). Optimizing ensembles of small models for predicting the distribution of species with few occurrences. Methods in Ecology and Evolution, 9(4), 802-808. https://doi.org/10.1111/2041-210X.12957
Examples
if (FALSE) { # \dontrun{
data("abies")
data("backg")
require(dplyr)
# Using k-fold partition method
set.seed(10)
abies2 <- abies %>%
na.omit() %>%
group_by(pr_ab) %>%
dplyr::slice_sample(n = 10) %>%
group_by()
abies2 <- part_random(
data = abies2,
pr_ab = "pr_ab",
method = c(method = "rep_kfold", folds = 5, replicates = 5)
)
abies2
set.seed(10)
backg2 <- backg %>%
na.omit() %>%
group_by(pr_ab) %>%
dplyr::slice_sample(n = 100) %>%
group_by()
backg2 <- part_random(
data = backg2,
pr_ab = "pr_ab",
method = c(method = "rep_kfold", folds = 5, replicates = 5)
)
backg2
# Without threshold specification and with kfold
esm_max_t1 <- esm_max(
data = abies2,
response = "pr_ab",
predictors = c("aet", "cwd", "tmin", "ppt_djf", "ppt_jja", "pH", "awc", "depth"),
partition = ".part",
thr = NULL,
background = backg2,
clamp = TRUE,
classes = "default",
pred_type = "cloglog",
regmult = 1
)
esm_max_t1$esm_model # bivariate model
esm_max_t1$predictors
esm_max_t1$performance
} # }