
This function explores different numbers of spatial bands and returns the most suitable value for a given presence or presence-absence database. The selection of the best number of bands is performed automatically considering spatial autocorrelation, environmental similarity, and the number of presence and absence records in each partition.

Usage

part_sband(
  env_layer,
  data,
  x,
  y,
  pr_ab,
  type = "lon",
  n_part = 2,
  min_bands = 2,
  max_bands = 20,
  min_occ = 10,
  prop = 0.5
)

Arguments

env_layer

SpatRaster. Raster with environmental variables. Used to evaluate spatial autocorrelation and environmental similarity between training and testing partitions. Because this function calculates dissimilarity based on Euclidean distances, it can only be used with continuous environmental variables.

data

data.frame. Data.frame or tibble object with presences (or presence-absence, or presence-pseudo-absence) records, and coordinates

x

character. Column name with spatial x coordinates

y

character. Column name with spatial y coordinates

pr_ab

character. Column name with presence, presence-absence, or presence-pseudo-absence records. Presences must be represented by 1 and absences by 0

type

character. Specifies whether bands are defined across degrees of longitude ('lon') or latitude ('lat'). Default is 'lon'.

n_part

integer. Number of partitions. Default 2; values other than 2 have not yet been implemented.

min_bands

integer. Minimum number of spatial bands to be tested, default 2.

max_bands

integer. Maximum number of spatial bands to be tested, default 20.

min_occ

numeric. Minimum number of presences or absences in a partition fold. The min_occ value should be based on the number of predictors in order to avoid over-fitting or errors when fitting models for a given fold. Default 10.

prop

numeric. Proportion of points used for testing autocorrelation between groups (values > 0 and <= 1). The smaller this number is, the faster the function will run. Default 0.5.

Value

A list with:

  • part: A tibble object with the information supplied in the 'data' argument and an additional column .part with the partition group.

  • best_part_info: A tibble with information about the best partition. It contains the number of the best partition (n_grid), number of bands (n_bands), standard deviation of presences (sd_p), standard deviation of absences (sd_a), Moran's I spatial autocorrelation (spa_auto), and environmental similarity based on Euclidean distance (env_sim).

  • grid: A SpatRaster object with bands
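As a rough illustration of what sd_p and sd_a summarize, the sketch below computes the standard deviation of presence and absence counts across partition groups for a small, made-up table. The data frame `part` and its values are hypothetical, and this is only an assumption about how the balance metrics can be read, not the package's internal code.

```r
# Hypothetical partitioned records: pr_ab is 1 (presence) or 0 (absence),
# .part is the partition group, mirroring the columns of the 'part' element.
part <- data.frame(
  pr_ab = c(1, 1, 1, 0, 0, 1, 1, 0, 0, 0),
  .part = c(1, 1, 1, 1, 1, 2, 2, 2, 2, 2)
)

# Count records per partition group, separately for presences and absences
n_pres <- table(part$.part[part$pr_ab == 1])
n_abs <- table(part$.part[part$pr_ab == 0])

# Standard deviation of those counts across partition groups:
# values near zero mean the groups hold similar amounts of data
sd_p <- stats::sd(as.vector(n_pres))
sd_a <- stats::sd(as.vector(n_abs))

# A min_occ-style check (assumed here to apply to every group):
# does each group hold at least min_occ presences and absences?
min_occ <- 2
enough_data <- min(n_pres) >= min_occ && min(n_abs) >= min_occ
```

Lower sd_p and sd_a indicate that presences and absences are spread more evenly among the partition groups.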

Details

The part_sband function allows testing different numbers of partitions using a range of latitudinal or longitudinal bands. This function explores a range of numbers of bands for a given number of partitions and automatically selects the best number of bands for a given presence, presence-absence, or presence-pseudo-absence dataset. Selection of the number of bands is based on an optimization procedure that explores partitions in three dimensions determined by spatial autocorrelation (measured by Moran's I), environmental similarity (Euclidean distance), and difference in the amount of data among partition groups (Standard Deviation - SD; Velazco et al., 2019). This procedure is iterative; it will first select those partitions with autocorrelation values less than the lowest quartile of Moran's I, then those with environmental similarity values greater than the third quartile of the Euclidean distances, then those with a difference in the amount of data less than the lowest quartile of SD. This selection is repeated until only one partition is retained (Velazco et al., 2019). The main benefits of this partition selection are that it i) is not subjective, ii) balances the environmental similarity and spatial autocorrelation between partition groups, and iii) controls the selection of partitions with very little data that may be problematic for model fitting ("min_occ" argument).
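The iterative quartile filter described above can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: the candidate table `cand` and its values are made up, the helpers `keep_quartile` and `select_partition` are hypothetical, and the comparisons use "less than or equal to" and "greater than or equal to" so that at least one candidate always survives each step.

```r
# Hypothetical candidate table: one row per number of bands tested.
# Column names mirror best_part_info; the values are invented.
cand <- data.frame(
  n_bands  = 2:9,
  spa_auto = c(0.42, 0.35, 0.31, 0.12, 0.10, 0.28, 0.11, 0.30),
  env_sim  = c(1.10, 1.25, 1.40, 1.90, 2.10, 1.30, 1.85, 1.20),
  sd_p     = c(3.0, 8.5, 6.0, 4.0, 2.5, 9.0, 7.5, 5.0)
)

# Keep rows at or below the first quartile of a column (lower = TRUE)
# or at or above its third quartile (lower = FALSE).
keep_quartile <- function(df, col, lower = TRUE) {
  if (nrow(df) <= 1) {
    return(df)
  }
  cutoff <- stats::quantile(df[[col]], if (lower) 0.25 else 0.75, names = FALSE)
  keep <- if (lower) df[[col]] <= cutoff else df[[col]] >= cutoff
  df[keep, , drop = FALSE]
}

# Apply the three filters in the order given in the text and repeat
# until a single candidate remains.
select_partition <- function(df) {
  while (nrow(df) > 1) {
    n_before <- nrow(df)
    df <- keep_quartile(df, "spa_auto", lower = TRUE)  # low Moran's I
    df <- keep_quartile(df, "env_sim", lower = FALSE)  # high Euclidean distance
    df <- keep_quartile(df, "sd_p", lower = TRUE)      # balanced record counts
    if (nrow(df) == n_before) {
      # Assumed tie-break for exact ties, so the loop always terminates
      df <- df[order(df$spa_auto, -df$env_sim, df$sd_p), , drop = FALSE]
      df <- df[1, , drop = FALSE]
    }
  }
  df
}

best <- select_partition(cand)
best$n_bands
```

Because each filter keeps only the extreme quartile of the remaining candidates, the set typically shrinks quickly; the tie-break is an addition made here for safety and is not described in the source.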

Partitions that are geographically structured tend to evaluate model transferability more directly than conventional ones (e.g., those performed by part_random) (Roberts et al., 2017; Santini et al., 2021), which is relevant for models that are to be projected to regions outside the calibration area or to other time periods. Band partitions can be an option for species for which no best partition is found with part_sblock or for species that are distributed linearly (e.g., species that inhabit coastlines).

This function can interact with get_block, sample_background, and sample_pseudoabs for sampling background points or pseudo-absences within spatial partition groups.

References

  • Roberts, D. R., Bahn, V., Ciuti, S., Boyce, M. S., Elith, J., Guillera-Arroita, G., Hauenstein, S., Lahoz-Monfort, J. J., Schroder, B., Thuiller, W., Warton, D. I., Wintle, B. A., Hartig, F., & Dormann, C. F. (2017). Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography, 40, 913-929. https://doi.org/10.1111/ecog.02881

  • Santini, L., Benitez-Lopez, A., Maiorano, L., Cengic, M., & Huijbregts, M. A. J. (2021). Assessing the reliability of species distribution projections in climate change research. Diversity and Distributions, ddi.13252. https://doi.org/10.1111/ddi.13252

  • Velazco, S. J. E., Villalobos, F., Galvao, F., & De Marco Junior, P. (2019). A dark scenario for Cerrado plant species: Effects of future climate, land use and protected areas ineffectiveness. Diversity and Distributions, 25(4), 660-673. https://doi.org/10.1111/ddi.12886

Examples

if (FALSE) {
require(terra)
require(dplyr)

# Load datasets
data(spp)
f <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(f)

# Example of two longitudinal partitions with presences and absences
single_spp <- spp %>% dplyr::filter(species == "sp1")
part_1 <- part_sband(
  env_layer = somevar,
  data = single_spp,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  type = "lon",
  min_bands = 2,
  max_bands = 20,
  n_part = 2,
  min_occ = 10,
  prop = 0.5
)

part_1$part # database with partition fold (.part)
part_1$part %>%
  group_by(pr_ab, .part) %>%
  count() # number of presences and absences in each fold
part_1$best_part_info # information of the best partition
part_1$grid # raster with folds

# Explore grid object with presence and absence points
plot(part_1$grid, col = gray.colors(20))
points(part_1$part[c("x", "y")],
  col = rainbow(8)[part_1$part$.part],
  cex = 0.9,
  pch = c(1, 19)[part_1$part$pr_ab + 1]
)


# Example of eight latitudinal partitions with presences only
single_spp <- spp %>% dplyr::filter(species == "sp1", pr_ab == 1)
part_2 <- part_sband(
  env_layer = somevar,
  data = single_spp,
  x = "x",
  y = "y",
  pr_ab = "pr_ab",
  type = "lat",
  min_bands = 8,
  max_bands = 40,
  n_part = 8,
  min_occ = 10,
  prop = 0.5
)

part_2$part
part_2$best_part_info
part_2$grid

# Explore grid object and presence points
plot(part_2$grid, col = gray.colors(20))
points(part_2$part[c("x", "y")],
  col = rainbow(8)[part_2$part$.part],
  cex = 0.5,
  pch = 19
)
}