Skip to contents

Fit and validate Gaussian Process models

Usage

fit_gau(
  data,
  response,
  predictors,
  predictors_f = NULL,
  background = NULL,
  partition = NULL,
  thr = NULL
)

Arguments

data

data.frame. Database with response (0,1) and predictors values.

response

character. Column name with species absence-presence data (0,1).

predictors

character. Vector with the column names of quantitative predictor variables (i.e. continuous variables). Usage predictors = c("aet", "cwd", "tmin")

predictors_f

character. Vector with the column names of qualitative predictor variables (i.e. ordinal or nominal variables type). Usage predictors_f = c("landform")

background

data.frame. Database with response column only with 0 and predictors variables. All column names must be consistent with data

partition

character. Column name with training and validation partition groups. If partition = NULL, the model will be validated with the same data used for fitting.

thr

character. Threshold used to get binary suitability values (i.e. 0,1), useful for threshold-dependent performance metrics. It is possible to use more than one threshold type. A vector must be provided for this argument. The following threshold criteria are available:

  • lpt: The highest threshold at which there is no omission.

  • equal_sens_spec: Threshold at which the sensitivity and specificity are equal.

  • max_sens_spec: Threshold at which the sum of the sensitivity and specificity is the highest (aka threshold that maximizes the TSS).

  • max_jaccard: The threshold at which the Jaccard index is the highest.

  • max_sorensen: The threshold at which the Sorensen index is the highest.

  • max_fpb: The threshold at which FPB (F-measure on presence-background data) is highest.

  • sensitivity: Threshold based on a specified sensitivity value. Usage thr = c('sensitivity', sens='0.6') or thr = c('sensitivity'). 'sens' refers to sensitivity value. If a sensitivity value is not specified, the default used is 0.9.

If more than one threshold type is used they must be concatenated, e.g., thr=c('lpt', 'max_sens_spec', 'max_jaccard'), or thr=c('lpt', 'max_sens_spec', 'sensitivity', sens='0.8'), or thr=c('lpt', 'max_sens_spec', 'sensitivity'). Function will use all threshold criteria if none is specified.

Value

A list object with:

  • model: A "graf" class object. This object can be used for predicting.

  • predictors: A tibble with quantitative (c column names) and qualitative (f column names) variables use for modeling.

  • performance: Performance metrics (see sdm_eval). Threshold dependent metrics are calculated based on the threshold criteria specified in the argument.

  • performance_part: Performance metric for each replica and partition (see sdm_eval).

  • data_ens: Predicted suitability for each test partition. This database is used in fit_ensemble

Examples

if (FALSE) { # \dontrun{
data("abies")

# Using k-fold partition method
abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "kfold", folds = 3)
)
abies2

bg <- abies2
bg$pr_ab <- 0


gaup_t1 <- fit_gau(
  data = abies2,
  response = "pr_ab",
  predictors = c("aet", "ppt_jja", "pH", "awc", "depth"),
  predictors_f = c("landform"),
  partition = ".part",
  background = bg,
  thr = c("max_sens_spec", "equal_sens_spec", "max_sorensen")
)

gaup_t1$model
gaup_t1$predictors
gaup_t1$performance
gaup_t1$performance_part
gaup_t1$data_ens

# Using bootstrap partition method only with presence-absence
abies2 <- part_random(
  data = abies,
  pr_ab = "pr_ab",
  method = c(method = "boot", replicates = 5, proportion = 0.7)
)
abies2

gaup_t2 <- fit_gau(
  data = abies2,
  response = "pr_ab",
  predictors = c("ppt_jja", "pH", "awc"),
  predictors_f = c("landform"),
  partition = ".part",
  thr = c(type = c("lpt", "max_sens_spec", "sensitivity"), sens = "0.8")
)
gaup_t2
} # }