Fit and validate Random Forest models with exploration of hyper-parameters that optimize performance
Source:R/tune_raf.R
tune_raf.Rd
Fit and validate Random Forest models with exploration of hyper-parameters that optimize performance
Usage
tune_raf(
data,
response,
predictors,
predictors_f = NULL,
fit_formula = NULL,
partition,
grid = NULL,
thr = NULL,
metric = "TSS",
n_cores = 1
)
Arguments
- data
data.frame. Database with response (0,1) and predictors values.
- response
character. Column name with species absence-presence data (0,1).
- predictors
character. Vector with the column names of quantitative predictor variables (i.e. continuous variables). Usage predictors = c("aet", "cwd", "tmin")
- predictors_f
character. Vector with the column names of qualitative predictor variables (i.e. ordinal or nominal variables type). Usage predictors_f = c("landform")
- fit_formula
formula. A formula object with response and predictor variables (e.g. formula(pr_ab ~ aet + ppt_jja + pH + awc + depth + landform)). Note that the variables used here must be consistent with those used in response, predictors, and predictors_f arguments. Default NULL
- partition
character. Column name with training and validation partition groups.
- grid
data.frame. A data frame object with algorithm hyper-parameters values to be tested. It is recommended to generate this data.frame with the grid() function. Hyper-parameter needed for tuning is 'mtry' and 'ntree'. The maximum mtry cannot exceed the total number of predictors.
- thr
character. Threshold used to get binary suitability values (i.e. 0,1), needed for threshold-dependent performance metrics. It is possible to use more than one threshold type. It is necessary to provide a vector for this argument. The following threshold types are available:
lpt: The highest threshold at which there is no omission.
equal_sens_spec: Threshold at which the sensitivity and specificity are equal.
max_sens_spec: Threshold at which the sum of the sensitivity and specificity is the highest (aka threshold that maximizes the TSS).
max_jaccard: The threshold at which the Jaccard index is the highest.
max_sorensen: The threshold at which the Sorensen index is highest.
max_fpb: The threshold at which FPB is highest.
sensitivity: Threshold based on a specified sensitivity value. Usage thr = c('sensitivity', sens='0.6') or thr = c('sensitivity'). 'sens' refers to sensitivity value. If it is not specified a sensitivity values, function will use by default 0.9
If using more than one threshold type concatenate them, e.g., thr=c('lpt', 'max_sens_spec', 'max_jaccard'), or thr=c('lpt', 'max_sens_spec', 'sensitivity', sens='0.8'), or thr=c('lpt', 'max_sens_spec', 'sensitivity'). Function will use all thresholds if no threshold is specified.
- metric
character. Performance metric used for selecting the best combination of hyper -parameter values. One of the following metrics can be used: SORENSEN, JACCARD, FPB, TSS, KAPPA, AUC, and BOYCE. TSS is used as default.
- n_cores
numeric. Number of cores use for parallelization. Default 1
Value
A list object with:
model: A "randomForest" class object from randomForest package. This object can be used for predicting.
predictors: A tibble with quantitative (c column names) and qualitative (f column names) variables use for modeling.
performance: Hyper-parameters values and performance metric (see
sdm_eval
) for the best hyper-parameters combination.hyper_performance: Performance metric (see
sdm_eval
) for each combination of the hyper-parameters.data_ens: Predicted suitability for each test partition based on the best model. This database is used in
fit_ensemble
Examples
if (FALSE) { # \dontrun{
data(abies)
abies
# Partition the data with the k-fold method
abies2 <- part_random(
data = abies,
pr_ab = "pr_ab",
method = c(method = "kfold", folds = 5)
)
tune_grid <- expand.grid(
mtry = seq(1, 7, 1),
ntree = c(400, 600, 800)
)
tune_grid
rf_t <-
tune_raf(
data = abies2,
response = "pr_ab",
predictors = c(
"aet", "cwd", "tmin", "ppt_djf",
"ppt_jja", "pH", "awc", "depth"
),
predictors_f = c("landform"),
partition = ".part",
grid = tune_grid,
thr = "max_sens_spec",
metric = "TSS",
n_cores = 1
)
# Outputs
rf_t$model
rf_t$predictors
rf_t$performance
rf_t$hyper_performance
rf_t$data_ens
} # }