Fit and validate Maximum Entropy Models based on the Ensemble of Small Models approach

This function constructs Maxent models using the Ensemble of Small Models (ESM) approach (Breiner et al., 2015, 2018).

Usage

esm_max(
  data,
  response,
  predictors,
  partition,
  thr = NULL,
  background = NULL,
  clamp = TRUE,
  classes = "default",
  pred_type = "cloglog",
  regmult = 2.5
)

Arguments

data

data.frame. Database with the response (0,1) and predictor values.

response

character. Column name with species absence-presence data (0,1).

predictors

character. Vector with the column names of quantitative predictor variables (i.e. continuous variables). This function can only construct models with continuous variables and does not allow categorical variables. Usage predictors = c("aet", "cwd", "tmin").

partition

character. Column name with training and validation partition groups.

thr

character. Threshold used to get binary suitability values (i.e. 0,1). It is useful for threshold-dependent performance metrics. It is possible to use more than one threshold type. It is necessary to provide a vector for this argument. The following threshold criteria are available:

equal_sens_spec: Threshold at which the sensitivity and specificity are equal.
max_sens_spec: Threshold at which the sum of the sensitivity and specificity is the highest (aka threshold that maximizes the TSS).
max_jaccard: The threshold at which Jaccard is the highest.
max_sorensen: The threshold at which Sorensen is highest.
max_fpb: The threshold at which FPB (F-measure on presence-background data) is highest.
sensitivity: Threshold based on a specified sensitivity value. Usage thr = c('sensitivity', sens='0.6') or thr = c('sensitivity'). 'sens' refers to the sensitivity value. If no sensitivity value is specified, the default is 0.9.

To include more than one threshold type, concatenate the threshold types, e.g., thr=c('max_sens_spec', 'max_jaccard'), or thr=c('max_sens_spec', 'sensitivity', sens='0.8'), or thr=c('max_sens_spec', 'sensitivity'). The function will use all thresholds if no threshold is specified.
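The thr formats described above are ordinary (partly named) character vectors. The following minimal sketch, using only base R, shows how such vectors are laid out. The variable names are hypothetical, and how esm_max parses the vector internally is not shown here.

```r
# Hypothetical variable names; only base R is used.

# A single threshold type
thr_single <- c("max_sens_spec")

# Several threshold types, concatenated into one vector
thr_multi <- c("max_sens_spec", "max_jaccard")

# A sensitivity threshold with an explicit value, passed as the
# named element "sens" (note that the value is given as a string)
thr_sens <- c("max_sens_spec", "sensitivity", sens = "0.8")

# Only the last element carries a name
names(thr_sens)

# The sensitivity value can be recovered as a number
as.numeric(thr_sens["sens"])
```

Because c() coerces all elements to character, the sensitivity value is stored as text, which is why the usage examples quote it.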

background

data.frame. Database containing the response column with only 0 values and the predictor variables. All column names must be consistent with data. Default NULL.

clamp

logical. If TRUE, predictors and features are restricted to the range seen during model training.

classes

character. A single feature or any combination of them. Features are symbolized by letters: l (linear), q (quadratic), h (hinge), p (product), and t (threshold). Usage classes = "lpq". Default "default" (see details).

pred_type

character. Type of response required. Available options are "link", "exponential", "cloglog", and "logistic". Default "cloglog".

regmult

numeric. A constant to adjust regularization. Because ESMs are used for modeling species with few records, the default value is 2.5.

Value

A list object with:

esm_model: A list of "maxnet" class objects from the maxnet package, one for each bivariate model. This object can be used to predict ensembles of small models with the sdm_predict function.
predictors: A tibble with the variables used for modeling.
performance: Performance metrics (see sdm_eval). Threshold-dependent metrics are calculated based on the threshold specified in the thr argument.
performance_part: Performance metrics for each replicate and partition (see sdm_eval).

Details

This method consists of creating bivariate models with all pair-wise combinations of predictors and building an ensemble based on the average of suitability weighted by the Somers' D metric (D = 2 x (AUC - 0.5)). ESM is recommended for modeling species with few occurrences. This function does not allow categorical variables because these types of variables can be problematic when used with few occurrences. For further details see Breiner et al. (2015, 2018). This function uses a default regularization multiplier equal to 2.5 (see Breiner et al., 2018).
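The ensemble step described above can be sketched in base R as follows. This is only an illustration of the weighting formula, not the internal code of esm_max: the AUC values and suitability predictions are made up, the variable names are hypothetical, and all AUC values are assumed to be above 0.5 so that every weight is positive.

```r
# Illustrative sketch with made-up numbers; not the package implementation.
predictors <- c("aet", "cwd", "tmin")

# All pair-wise combinations: each column is the predictor pair
# of one bivariate model
pairs <- combn(predictors, 2)
n_models <- ncol(pairs)

# Made-up AUC for each bivariate model (all assumed > 0.5)
auc <- c(0.80, 0.70, 0.90)

# Somers' D, used as the ensemble weight: D = 2 x (AUC - 0.5)
somers_d <- 2 * (auc - 0.5)

# Made-up suitability predictions:
# rows = sites, columns = bivariate models
suit <- matrix(
  c(0.2, 0.4, 0.6,
    0.9, 0.7, 0.8),
  nrow = 2, byrow = TRUE
)

# Ensemble suitability = average of suitability weighted by Somers' D
ensemble <- as.vector(suit %*% somers_d) / sum(somers_d)
ensemble
```

With p predictors the number of bivariate models is choose(p, 2), so the eight predictors used in the example below would give 28 models.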

When the argument "classes" is set to "default", MaxEnt will use different feature combinations depending on the number of presences (np), according to the following rule: if np < 10 classes = "l", if np between 10 and 15 classes = "lq", if np between 15 and 80 classes = "lqh", and if np >= 80 classes = "lqph".
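The rule above can be written as a small helper function. This is a sketch: default_classes is a hypothetical name, not a function of the package, and because the text does not say which class applies at exactly np = 15, the code assumes that 15 falls into the "lqh" group.

```r
# Hypothetical helper illustrating the default feature-class rule.
# Assumption: the boundary value np = 15 is assigned to "lqh".
default_classes <- function(np) {
  if (np < 10) {
    "l"      # linear only
  } else if (np < 15) {
    "lq"     # linear + quadratic
  } else if (np < 80) {
    "lqh"    # linear + quadratic + hinge
  } else {
    "lqph"   # linear + quadratic + product + hinge
  }
}

# Feature classes for a few presence counts
sapply(c(5, 12, 40, 80), default_classes)
```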

When presence-absence (or presence-pseudo-absence) data are used in the data argument in addition to background points, the function will fit models with presences and background points and validate them with presences and absences. This procedure makes Maxent comparable to other presence-absence models (e.g., random forest, support vector machine). If only presence and background point data are used, the function will fit and validate the model with presence and background data. If only presence-absence data are used in the data argument, without background, the function will fit the model with the specified data (not recommended).

References

Breiner, F. T., Guisan, A., Bergamini, A., & Nobis, M. P. (2015). Overcoming limitations of modelling rare species by using ensembles of small models. Methods in Ecology and Evolution, 6(10), 1210-1218. https://doi.org/10.1111/2041-210X.12403
Breiner, F. T., Nobis, M. P., Bergamini, A., & Guisan, A. (2018). Optimizing ensembles of small models for predicting the distribution of species with few occurrences. Methods in Ecology and Evolution, 9(4), 802-808. https://doi.org/10.1111/2041-210X.12957

Examples

if (FALSE) { # \dontrun{
data("abies")
data("backg")
require(dplyr)

# Using k-fold partition method
set.seed(10)
abies2 <- abies %>%
  na.omit() %>%
  group_by(pr_ab) %>%
  dplyr::slice_sample(n = 10) %>%
  ungroup()

abies2 <- part_random(
  data = abies2,
  pr_ab = "pr_ab",
  method = c(method = "rep_kfold", folds = 5, replicates = 5)
)
abies2

set.seed(10)
backg2 <- backg %>%
  na.omit() %>%
  group_by(pr_ab) %>%
  dplyr::slice_sample(n = 100) %>%
  ungroup()

backg2 <- part_random(
  data = backg2,
  pr_ab = "pr_ab",
  method = c(method = "rep_kfold", folds = 5, replicates = 5)
)
backg2

# Without threshold specification and with kfold
esm_max_t1 <- esm_max(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "cwd", "tmin", "ppt_djf", "ppt_jja", "pH", "awc", "depth"),
  partition = ".part",
  thr = NULL,
  background = backg2,
  clamp = TRUE,
  classes = "default",
  pred_type = "cloglog",
  regmult = 1
)

esm_max_t1$esm_model # list of bivariate models
esm_max_t1$predictors
esm_max_t1$performance
esm_max_t1$performance_part
} # }