Fit and validate Extreme Gradient Boosting models

Usage

fit_abund_xgb(
  data,
  response,
  predictors,
  predictors_f = NULL,
  partition,
  predict_part = FALSE,
  nrounds = 100,
  max_depth = 5,
  eta = 0.1,
  gamma = 1,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample = 0.5,
  objective = "reg:squarederror",
  verbose = TRUE
)

Arguments

data: tibble or data.frame. Data set containing the response, predictor, and partition columns.
response: character. Column name with species abundance.
predictors: character. Vector with the column names of quantitative (i.e., continuous) predictor variables. Usage: predictors = c("temp", "precipt", "sand")
predictors_f: character. Vector with the column names of qualitative (i.e., ordinal or nominal) predictor variables. Usage: predictors_f = c("landform"). Default is NULL.
partition: character. Column name with training and validation partition groups.
predict_part: logical. If TRUE, save the predicted abundance for the testing data. Default is FALSE.
nrounds: integer. Maximum number of boosting iterations. Default is 100.
max_depth: integer. Maximum depth of each tree. Default is 5.
eta: numeric. Learning rate of the algorithm. Default is 0.1.
gamma: numeric. Minimum loss reduction required to make a further partition on a leaf node of the tree. Default is 1.
colsample_bytree: numeric. Subsample ratio of columns when constructing each tree. Default is 1.
min_child_weight: numeric. Minimum sum of instance weight (hessian) needed in a child. Default is 1.
subsample: numeric. Subsample ratio of the training instances. Default is 0.5.
objective: character. Learning task and corresponding learning objective. Default is "reg:squarederror" (regression with squared loss).
verbose: logical. If FALSE, disables all console messages. Default is TRUE.
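The following sketch illustrates, under stated assumptions, how these arguments could translate into inputs for the xgboost R package. Qualitative predictors are one-hot encoded with model.matrix() because xgboost needs a numeric matrix, the tuning arguments become a booster parameter list, and each partition group is held out once for testing while the model is trained on the others. The toy data set, the encoding strategy, and this k-fold reading of partition are assumptions for illustration; they are not the internal implementation of fit_abund_xgb(). The training loop runs only if xgboost is installed.

```r
# Hypothetical illustration of the argument roles; NOT the adm implementation.
set.seed(1)
n <- 30

# Synthetic toy data with the column roles described under Arguments
toy <- data.frame(
  ind_ha   = rpois(n, lambda = 5),           # response (abundance)
  bio12    = runif(n, 500, 2000),            # quantitative predictor
  sand     = runif(n, 0, 100),               # quantitative predictor
  landform = factor(sample(c("plain", "slope", "ridge"), n, replace = TRUE),
                    levels = c("plain", "slope", "ridge")),  # qualitative predictor
  .part    = rep(1:3, length.out = n)        # partition groups (assumed 1..k)
)

response     <- "ind_ha"
predictors   <- c("bio12", "sand")
predictors_f <- "landform"

# Assumed encoding: one indicator column per level of each qualitative predictor
form <- reformulate(c(predictors, predictors_f), intercept = FALSE)
x <- model.matrix(form, data = toy)

# Tuning arguments expressed as an xgboost parameter list (Usage defaults)
params <- list(
  objective        = "reg:squarederror",
  max_depth        = 5,
  eta              = 0.1,
  gamma            = 1,
  colsample_bytree = 1,
  min_child_weight = 1,
  subsample        = 0.5
)
nrounds <- 100

# Assumed k-fold scheme: each partition group is the test set exactly once
if (requireNamespace("xgboost", quietly = TRUE)) {
  pred_part <- lapply(sort(unique(toy$.part)), function(p) {
    test <- toy$.part == p
    dtrain <- xgboost::xgb.DMatrix(x[!test, , drop = FALSE],
                                   label = toy[[response]][!test])
    booster <- xgboost::xgb.train(params = params, data = dtrain,
                                  nrounds = nrounds, verbose = 0)
    data.frame(part      = p,
               observed  = toy[[response]][test],
               predicted = predict(booster, x[test, , drop = FALSE]))
  })
  pred_part <- do.call(rbind, pred_part)  # observed vs. predicted per test group
}
```

When xgboost is available, pred_part holds the same kind of information as the predicted_part element described under Value.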

Value

A list object with:

model: An "xgb.Booster" class object from the xgboost package. This object can be used for prediction.
predictors: A tibble with the quantitative (c column names) and qualitative (f column names) variables used for modeling.
performance: Averaged performance metrics (see adm_eval).
performance_part: Performance metrics for each replica and partition.
predicted_part: Observed and predicted abundance for each test partition.
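The names suggest that performance summarises performance_part by averaging metrics computed within each test partition. A minimal sketch of that idea on synthetic observed and predicted values follows; the metric set (RMSE, MAE, Pearson correlation) is an assumption chosen for illustration and may differ from what adm_eval() actually reports.

```r
# Hypothetical sketch; the metrics below are illustrative, not adm_eval() output.
set.seed(2)

# Synthetic table mimicking predicted_part: observed and predicted per test partition
predicted_part <- data.frame(
  part     = rep(1:3, each = 10),
  observed = rpois(30, lambda = 5)
)
predicted_part$predicted <- predicted_part$observed + rnorm(30, sd = 1)

# Metrics for one test partition
part_metrics <- function(d) {
  err <- d$predicted - d$observed
  data.frame(
    part = d$part[1],
    rmse = sqrt(mean(err^2)),            # root mean squared error
    mae  = mean(abs(err)),               # mean absolute error
    corr = cor(d$observed, d$predicted)  # Pearson correlation
  )
}

# One row per partition (cf. performance_part), then the average (cf. performance)
performance_part <- do.call(rbind, lapply(split(predicted_part, predicted_part$part),
                                          part_metrics))
performance <- colMeans(performance_part[, c("rmse", "mae", "corr")])
performance
```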

Examples

if (FALSE) {
require(adm)
require(terra)
require(dplyr)

# Database with species abundance and x and y coordinates
data("sppabund")

# Extract data for a single species
some_sp <- sppabund %>%
  dplyr::filter(species == "Species one") %>%
  dplyr::select(-.part2, -.part3)

# Explore the response variable
some_sp$ind_ha %>% range()
some_sp$ind_ha %>% hist()

# Here we balance the number of absences
some_sp <-
  balance_dataset(some_sp, response = "ind_ha", absence_ratio = 0.2)

# Fit an XGB model
mxgb <- fit_abund_xgb(
  data = some_sp,
  response = "ind_ha",
  predictors = c("bio12", "elevation", "sand"),
  predictors_f = NULL,
  partition = ".part",
  nrounds = 200,
  max_depth = 5,
  eta = 0.1,
  gamma = 1,
  colsample_bytree = 0.7,
  min_child_weight = 2,
  subsample = 0.3,
  objective = "reg:squarederror",
  predict_part = TRUE
)

mxgb
}