This function performs geographical filtering of species occurrences, using one of several approaches to define the minimum nearest-neighbor distance between points.

Usage

occfilt_geo(
  data,
  x,
  y,
  env_layer,
  method,
  prj = "+proj=longlat +datum=WGS84",
  reps = 20
)

Arguments

data

data.frame. Data.frame or tibble object with presence (or presence-absence) records and coordinates

x

character. Column name with longitude data

y

character. Column name with latitude data

env_layer

SpatRaster. Raster variables that will be used to fit the model

method

character. Method to perform geographical thinning. Pairs of points are filtered based on a geographical distance criterion. The following methods are available:

  • moran: records are filtered based on the smallest distance that reduces Moran's I to values lower than 0.1. Latlong = TRUE if occurrences are in a geographical projection. Usage: method = c('moran').

  • cellsize: records are filtered based on the resolution of the environmental variables, which can be aggregated to a coarser resolution by the given factor. Usage: method = c('cellsize', factor = '2').

  • defined: records are filtered based on a distance value (d) provided in km. Usage: method = c('defined', d = 300).
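
The usage strings above mix a quoted value (factor = '2') with an unquoted one (d = 300). The short base R sketch below shows why both forms end up equivalent: c() coerces every element of a mixed vector to character. How occfilt_geo reads the vector internally is not shown here; the variable names method_name and d_km are illustrative only.

```r
# A method specification as shown in the usage strings
method <- c("defined", d = 300)

# c() coerces mixed inputs to a single type, so the whole vector is character
is.character(method)

# The first (unnamed) element carries the method name
method_name <- unname(method[1])

# The named element holds the parameter as a string; convert before using it
d_km <- as.numeric(method[["d"]])
```

Because of this coercion, c("defined", d = "300") and c("defined", d = 300) produce the same vector.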

prj

character. Projection string (PROJ4) for occurrences. Not necessary if the projection used is WGS84 ("+proj=longlat +datum=WGS84"). Default "+proj=longlat +datum=WGS84"

reps

integer. Number of times to repeat the thinning process. Default 20

Value

A tibble object with data filtered geographically

Details

Three alternatives are implemented to determine the distance threshold between pairs of points:

  • "moran" determines the minimum nearest-neighbor distance that minimizes the spatial autocorrelation in the occurrence data, following a Moran's semivariogram. A Principal Component Analysis is performed on the environmental variables, and the first principal component is used to calculate the semivariograms. For this reason, the method only allows continuous variables. It can sometimes reduce the number of presences drastically.

  • "cellsize" filters occurrences based on the predictors' resolution. The method calculates the distance between the first two cells of the environmental variable and uses it as the minimum nearest-neighbor distance to filter occurrences. The raster resolution is aggregated according to the value given in "factor", so the filtering distance can be adjusted to represent a larger grid size.

  • "defined" uses any minimum nearest-neighbor distance specified in km.
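
As a rough illustration of the "cellsize" idea, the base R sketch below computes the distance between the centres of the first two cells of a longitude/latitude grid and shows how an aggregation factor enlarges it. It is not the package's implementation: the helper names (haversine_km, cellsize_distance_km), the haversine formula, and the grid values are assumptions chosen for the example.

```r
# Haversine great-circle distance in km between two lon/lat points
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Distance between the centres of the first two cells (top row) of a grid
# whose cells have been aggregated by `agg_factor` (hypothetical helper)
cellsize_distance_km <- function(xmin, ymax, res_deg, agg_factor = 1) {
  agg_res <- res_deg * agg_factor
  lat <- ymax - agg_res / 2
  haversine_km(xmin + agg_res / 2, lat, xmin + 1.5 * agg_res, lat)
}

# Hypothetical grid: 0.1 degree cells, upper-left corner at (-60, -20)
d1 <- cellsize_distance_km(xmin = -60, ymax = -20, res_deg = 0.1)
d3 <- cellsize_distance_km(xmin = -60, ymax = -20, res_deg = 0.1,
                           agg_factor = 3)
```

With a factor of 3 the threshold is roughly three times the native cell spacing, which is the adjustment the "factor" value is meant to provide.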

For the "defined" method, the "thin" function from the spThin package is used (Aiello-Lammens et al., 2015) with the following argument settings: reps = 20, write.files = FALSE, locs.thinned.list.return = TRUE, and write.log.file = FALSE.
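
To make the role of the repetitions concrete, here is a minimal base R sketch of randomized distance thinning in the spirit of spThin. It does not call spThin: the function names (dist_matrix_km, thin_once, thin_reps), the haversine distance, the rule of keeping the run with the most records, and the simulated points are all assumptions made for illustration.

```r
# Great-circle distance matrix (km) for lon/lat coordinates (haversine)
dist_matrix_km <- function(lon, lat, radius_km = 6371) {
  to_rad <- pi / 180
  n <- length(lon)
  d <- matrix(0, n, n)
  for (i in seq_len(n)) {
    dlat <- (lat - lat[i]) * to_rad
    dlon <- (lon - lon[i]) * to_rad
    a <- sin(dlat / 2)^2 +
      cos(lat[i] * to_rad) * cos(lat * to_rad) * sin(dlon / 2)^2
    d[i, ] <- 2 * radius_km * asin(pmin(1, sqrt(a)))
  }
  d
}

# One thinning pass: repeatedly drop a random record among those with the
# most neighbours closer than d_km, until no pair is closer than d_km
thin_once <- function(lon, lat, d_km) {
  keep <- seq_along(lon)
  too_close <- dist_matrix_km(lon, lat) < d_km
  diag(too_close) <- FALSE
  repeat {
    n_close <- rowSums(too_close[keep, keep, drop = FALSE])
    if (all(n_close == 0)) break
    worst <- which(n_close == max(n_close))
    victim <- worst[sample.int(length(worst), 1)]
    keep <- keep[-victim]
  }
  keep
}

# Repeat the pass and keep the run that retains the most records
thin_reps <- function(lon, lat, d_km, reps = 20) {
  runs <- lapply(seq_len(reps), function(i) thin_once(lon, lat, d_km))
  runs[[which.max(lengths(runs))]]
}

# Simulated occurrences (hypothetical data)
set.seed(1)
pts <- data.frame(x = runif(50, -60, -55), y = runif(50, -25, -20))
kept <- thin_reps(pts$x, pts$y, d_km = 100, reps = 20)
thinned <- pts[kept, ]
```

Because removal is random, different passes can retain different numbers of records, which is why running several repetitions is useful.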

References

  • Aiello-Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: An R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545. https://doi.org/10.1111/ecog.01132

Examples

if (FALSE) {
require(terra)
require(dplyr)

# Environmental variables
somevar <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(somevar)

plot(somevar)

# Species occurrences
data("spp")
spp
spp1 <- spp %>% dplyr::filter(species == "sp1", pr_ab == 1)

somevar[[1]] %>% plot()
points(spp1 %>% select(x, y))

# Using Moran method
filtered_1 <- occfilt_geo(
  data = spp1,
  x = "x",
  y = "y",
  env_layer = somevar,
  method = c("moran"),
  prj = crs(somevar)
)

somevar[[1]] %>% plot(col = gray.colors(10))
points(spp1 %>% select(x, y)) # raw data
points(filtered_1 %>% select(x, y), pch = 19, col = "yellow") # filtered data

# Using cellsize method
filtered_2 <- occfilt_geo(
  data = spp1,
  x = "x",
  y = "y",
  env_layer = somevar,
  method = c("cellsize", factor = "3"),
  prj = crs(somevar)
)

somevar[[1]] %>% plot(col = gray.colors(10))
points(spp1 %>% select(x, y)) # raw data
points(filtered_2 %>% select(x, y), pch = 19, col = "yellow") # filtered data


# Using defined method
filtered_3 <- occfilt_geo(
  data = spp1,
  x = "x",
  y = "y",
  env_layer = somevar,
  method = c("defined", d = "30"),
  prj = crs(somevar)
)

somevar[[1]] %>% plot(col = gray.colors(10))
points(spp1 %>% select(x, y)) # raw data
points(filtered_3 %>% select(x, y), pch = 19, col = "yellow") # filtered data
}