This function filters species occurrences based on their environmental conditions.

Usage

occfilt_env(data, x, y, id, env_layer, nbins)

Arguments

data

data.frame. Data.frame or tibble object with presence (or presence-absence) records and coordinates

x

character. Column name with spatial x coordinates

y

character. Column name with spatial y coordinates

id

character. Column name with row ids. Each row must have its own unique id.

env_layer

SpatRaster. Rasters with environmental conditions

nbins

integer. Number of classes (bins) used to split each environmental variable

Value

A tibble object with the environmentally filtered data

Details

This function uses an approach adapted from Varela et al. (2014), which filters occurrences in environmental space. First, a regular multidimensional grid is created in environmental space. The grid's dimensions are the environmental variables (always use continuous variables), and its cell size is defined by the number of bins used to divide each variable's range into interval classes (Varela et al., 2014; Castellanos et al., 2019). The number of bins is set with the "nbins" argument. Then, a single occurrence is randomly selected within each cell of the multidimensional grid. There is a trade-off between the number of bins and the number of filtered records: as the number of bins decreases, the grid cells become larger and fewer records are retained (Castellanos et al., 2019). occfilt_env works with any number of dimensions (variables) and with the original variables, without performing a PCA beforehand.
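
The binning idea described above can be sketched in base R. This is only an illustration under stated assumptions, not the code used by occfilt_env: the helper `env_filter_sketch` and the simulated `temp` and `prec` variables are hypothetical, and the equal-width classes come from `cut()`.

```r
# Sketch: thin records in environmental space (base R only).
# `env_filter_sketch` is a hypothetical helper, not part of flexsdm.
env_filter_sketch <- function(env, nbins) {
  # Assign each record to one of `nbins` equal-width classes per variable
  bin_ids <- lapply(env, function(v) cut(v, breaks = nbins, labels = FALSE))
  # Combine the per-variable classes into one grid-cell label per record
  cell <- do.call(paste, c(unname(bin_ids), sep = "_"))
  # Randomly keep one record from each occupied cell
  keep <- vapply(
    split(seq_along(cell), cell),
    function(i) i[sample.int(length(i), 1)],
    integer(1)
  )
  sort(unname(keep))
}

# Simulated environmental values for 200 records (assumed data)
set.seed(1)
env <- data.frame(
  temp = runif(200, 5, 30),
  prec = runif(200, 200, 2000)
)

kept <- env_filter_sketch(env, nbins = 5)
length(kept) # cannot exceed 5^2 = 25 occupied cells
```

With two variables and five bins the grid has 25 cells, so no more than 25 records can survive, whatever the size of the input.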

The greater the number of predictor variables (i.e., the number of dimensions of the multidimensional environmental grid) and the greater the number of bins, the longer the processing time and the more computer memory is used. Therefore, a small number of bins (between 2 and 5) is recommended when more than ten variables are used.
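
To see why this matters, the size of an idealized grid can be computed directly. The sketch below assumes exactly `nbins` classes per variable, so the grid has `nbins^nvars` cells. The count used internally by occfilt_env may differ somewhat, and `n_cells` is a hypothetical helper.

```r
# Number of cells in an idealized regular grid:
# nbins classes per variable, raised to the number of variables.
# `n_cells` is a hypothetical helper, not part of flexsdm.
n_cells <- function(nbins, nvars) nbins^nvars

small  <- n_cells(nbins = 5, nvars = 4)   # 625 cells
medium <- n_cells(nbins = 5, nvars = 10)  # 9765625 cells
large  <- n_cells(nbins = 12, nvars = 10) # 61917364224 cells

format(c(small, medium, large), big.mark = ",", scientific = FALSE)
```

Under this assumption, going from 5 to 12 bins with ten variables multiplies the number of cells by more than 6000.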

Environmental filters are sensitive to the number of bins. A procedure for selecting the number of bins was used by Velazco et al. (2020). It consists of testing different numbers of bins, calculating for each one the spatial autocorrelation averaged across variables (based on Moran's I index), and then selecting the number of bins that gives the lowest average spatial autocorrelation with the highest number of occurrences. Note that the greater the number of bins, the more records are retained.
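
A rough base R sketch of this kind of comparison is shown below. It is not the procedure of Velazco et al. (2020) as implemented anywhere in particular. The data are simulated, `thin_env` and `morans_i` are hypothetical helpers, Moran's I uses inverse-distance weights (an assumption), and the final ordering is just one possible reading of the selection rule.

```r
# Hypothetical helper: keep one random record per occupied environmental cell
thin_env <- function(env, nbins) {
  ids <- lapply(env, function(v) cut(v, breaks = nbins, labels = FALSE))
  cell <- do.call(paste, c(unname(ids), sep = "_"))
  keep <- vapply(
    split(seq_along(cell), cell),
    function(i) i[sample.int(length(i), 1)],
    integer(1)
  )
  sort(unname(keep))
}

# Hypothetical helper: Moran's I with inverse-distance weights
morans_i <- function(z, xy) {
  w <- 1 / as.matrix(dist(xy))
  diag(w) <- 0
  zc <- z - mean(z)
  (length(z) / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# Simulated records with spatially structured variables (assumed data)
set.seed(42)
n <- 300
xy <- cbind(x = runif(n), y = runif(n))
env <- data.frame(
  v1 = xy[, "x"] + rnorm(n, sd = 0.1),
  v2 = xy[, "y"] + rnorm(n, sd = 0.1)
)

# For each candidate nbins: records retained and mean Moran's I across variables
candidates <- c(2, 4, 6, 8, 10)
res <- do.call(rbind, lapply(candidates, function(nb) {
  kept <- thin_env(env, nb)
  moran <- vapply(
    env[kept, , drop = FALSE],
    morans_i,
    numeric(1),
    xy = xy[kept, , drop = FALSE]
  )
  data.frame(nbins = nb, n_records = length(kept), mean_moran = mean(moran))
}))

# One possible ranking: lowest autocorrelation first, ties broken by more records
res[order(res$mean_moran, -res$n_records), ]
```

The resulting table puts the two criteria side by side for each candidate, so the trade-off between autocorrelation and sample size can be judged directly.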

References

  • Castellanos, A. A., Huntley, J. W., Voelker, G., & Lawing, A. M. (2019). Environmental filtering improves ecological niche models across multiple scales. Methods in Ecology and Evolution, 10(4), 481-492. https://doi.org/10.1111/2041-210X.13142

  • Varela, S., Anderson, R. P., Garcia-Valdes, R., & Fernandez-Gonzalez, F. (2014). Environmental filters reduce the effects of sampling bias and improve predictions of ecological niche models. Ecography, 37, 1084-1091. https://doi.org/10.1111/j.1600-0587.2013.00441.x

  • Velazco, S. J. E., Svenning, J-C., Ribeiro, B. R., & Laureto, L. M. O. (2020). On opportunities and threats to conserve the phylogenetic diversity of Neotropical palms. Diversity and Distributions, 27, 512–523. https://doi.org/10.1111/ddi.13215

Examples

if (FALSE) {
require(terra)
require(dplyr)

# Environmental variables
somevar <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(somevar)

plot(somevar)

# Species occurrences
data("spp")
spp
spp1 <- spp %>% dplyr::filter(species == "sp1", pr_ab == 1)

somevar[[1]] %>% plot()
points(spp1 %>% select(x, y))

# Add a unique id for each row
spp1$idd <- seq_len(nrow(spp1))


# split environmental variables into 5 bins
filtered_1 <- occfilt_env(
  data = spp1,
  x = "x",
  y = "y",
  id = "idd",
  env_layer = somevar,
  nbins = 5
)

# split into 8 bins
filtered_2 <- occfilt_env(
  data = spp1,
  x = "x",
  y = "y",
  id = "idd",
  env_layer = somevar,
  nbins = 8
)

# split into 12 bins
filtered_3 <- occfilt_env(
  data = spp1,
  x = "x",
  y = "y",
  id = "idd",
  env_layer = somevar,
  nbins = 12
)
# note that the higher the nbins parameter the more
# classes must be processed (4 variables, 30 bins = 923521 classes)

# The greater the number of bins, the more records are retained
}