Skip to contents

Fit and validate Generalized Linear Models

Usage

fit_glm(
  data,
  response,
  predictors,
  predictors_f = NULL,
  select_pred = FALSE,
  partition,
  thr = NULL,
  fit_formula = NULL,
  poly = 2,
  inter_order = 0
)

Arguments

data

data.frame. Database with response (0,1) and predictors values.

response

character. Column name with species absence-presence data (0,1).

predictors

character. Vector with the column names of quantitative predictor variables (i.e. continuous variables). Usage predictors = c("aet", "cwd", "tmin")

predictors_f

character. Vector with the column names of qualitative predictor variables (i.e. ordinal or nominal variables type). Usage predictors_f = c("landform")

select_pred

logical. Perform predictor selection. If TRUE predictors will be selected based on backward step wise approach. Default FALSE.

partition

character. Column name with training and validation partition groups.

thr

character. Threshold used to get binary suitability values (i.e. 0,1), needed for threshold-dependent performance metrics. More than one threshold type can be used. It is necessary to provide a vector for this argument. The following threshold criteria are available:

  • lpt: The highest threshold at which there is no omission.

  • equal_sens_spec: Threshold at which the sensitivity and specificity are equal.

  • max_sens_spec: Threshold at which the sum of the sensitivity and specificity is the highest (aka threshold that maximizes the TSS).

  • max_jaccard: The threshold at which the Jaccard index is the highest.

  • max_sorensen: The threshold at which the Sorensen index is highest.

  • max_fpb: The threshold at which FPB (F-measure on presence-background data) is highest.

  • sensitivity: Threshold based on a specified sensitivity value. Usage thr = c('sensitivity', sens='0.6') or thr = c('sensitivity'). 'sens' refers to sensitivity value. If it is not specified a sensitivity values, function will use by default 0.9

If more than one threshold type is used they must be concatenated, e.g., thr=c('lpt', 'max_sens_spec', 'max_jaccard'), or thr=c('lpt', 'max_sens_spec', 'sensitivity', sens='0.8'), or thr=c('lpt', 'max_sens_spec', 'sensitivity'). Function will use all thresholds if no threshold is specified.

fit_formula

formula. A formula object with response and predictor variables (e.g. formula(pr_ab ~ aet + ppt_jja + pH + awc + depth + landform)). Note that the variables used here must be consistent with those used in response, predictors, and predictors_f arguments

poly

interger >= 2. If used with values >= 2 model will use polynomials for those continuous variables (i.e. used in predictors argument). Default is 0.

inter_order

interger >= 0. The interaction order between explanatory variables. Default is 0.

Value

A list object with:

  • model: A "glm" class object from stats package. This object can be used for predicting.

  • predictors: A tibble with quantitative (c column names) and qualitative (f column names) variables use for modeling.

  • performance: Performance metrics (see sdm_eval). Threshold dependent metric are calculated based on the threshold specified in thr argument .

  • data_ens: Predicted suitability for each test partition. This database is used in fit_ensemble

Examples

if (FALSE) {
data("abies")
abies

# Using k-fold partition method
abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "kfold", folds = 5)
)
abies2

glm_t1 <- fit_glm(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
  predictors_f = c("landform"),
  select_pred = FALSE,
  partition = ".part",
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen"),
  poly = 0,
  inter_order = 0
)
glm_t1$model
glm_t1$predictors
glm_t1$performance
glm_t1$data_ens

# Using second order polynomial terms and first-order interaction terms
glm_t2 <- fit_glm(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
  predictors_f = c("landform"),
  select_pred = FALSE,
  partition = ".part",
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen"),
  poly = 2,
  inter_order = 1
)

# Using repeated k-fold partition method
abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "rep_kfold", folds = 3, replicates = 5)
)
abies2

# Using third order polynomial terms and second-order interaction terms
glm_t3 <- fit_glm(
  data = abies2,
  response = "pr_ab",
  predictors = c("ppt_jja", "pH", "awc"),
  predictors_f = c("landform"),
  select_pred = FALSE,
  partition = ".part",
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen"),
  poly = 3,
  inter_order = 2
)
}