This function performs geographical filtering of species occurrences using different approaches to define the minimum nearest-neighbor distance between points.
Arguments
- data
data.frame. Data.frame or tibble object with presence (or presence-absence) records and coordinates
- x
character. Column name with longitude data
- y
character. Column name with latitude data
- env_layer
SpatRaster. Raster variables that will be used to fit the model
- method
character. Method to perform geographical thinning. Pairs of points are filtered based on a geographical distance criterion. The following methods are available:
moran: records are filtered based on the smallest distance that reduces Moran's I to values lower than 0.1. Latlong = TRUE if occurrences are in a geographical projection. Usage: method = c('moran').
cellsize: records are filtered based on the resolution of the environmental variables, which can be aggregated to a coarser resolution defined by factor. Usage: method = c('cellsize', factor = '2').
defined: records are filtered based on a distance value (d) provided in km. Usage: method = c('defined', d = 300).
- prj
character. Projection string (PROJ4) for occurrences. Not necessary if the projection used is WGS84 ("+proj=longlat +datum=WGS84"). Default "+proj=longlat +datum=WGS84"
- reps
integer. Number of times to repeat the thinning process. Default 20
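Extra settings for method are passed as named elements of a character vector, so factor = '2' arrives as the string "2". The base R sketch below shows one way a 'cellsize' specification could be turned into a filtering distance in km. The helpers haversine_km() and cellsize_distance_km() are hypothetical and not part of flexsdm; the sketch assumes a lon/lat grid described by its xmin, ymax and resolution in degrees (instead of a SpatRaster) and a spherical Earth of radius 6371 km.

```r
# Hypothetical helper (not part of flexsdm): great-circle distance in km
# between lon/lat points given in degrees, assuming a spherical Earth.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

# Hypothetical helper: convert c("cellsize", factor = "2") into a distance.
# The grid is described by xmin, ymax and res (degrees), an assumption made
# here instead of reading a SpatRaster.
cellsize_distance_km <- function(method, xmin, ymax, res) {
  stopifnot(method[[1]] == "cellsize")
  # Named settings are stored as strings, so convert them explicitly
  fact <- if ("factor" %in% names(method)) as.numeric(method[["factor"]]) else 1
  agg_res <- res * fact # resolution after aggregation
  # Centres of the first two cells (top-left, adjacent in the first row)
  x1 <- xmin + agg_res / 2
  x2 <- xmin + 1.5 * agg_res
  y <- ymax - agg_res / 2
  haversine_km(x1, y, x2, y)
}

# Example: 0.5-degree cells aggregated by a factor of 2 at the equator
cellsize_distance_km(c("cellsize", factor = "2"), xmin = 0, ymax = 0.5, res = 0.5)
# about 111.2 km (one degree of longitude at the equator)
```

Because cells narrow in km toward the poles, the distance obtained this way depends on where the first two cells sit on a lon/lat grid.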
Details
In this function, three alternatives are implemented to determine the distance threshold between pairs of points:
1. "moran" determines the minimum nearest-neighbor distance that reduces the spatial autocorrelation in occurrence data (Moran's I lower than 0.1), following a Moran's semivariogram. A Principal Component Analysis is performed on the environmental variables, and the first Principal Component is used to calculate the semivariograms. Because of this, this method allows only continuous variables. In some cases, this method can reduce the number of presences excessively.
2. "cellsize" filters occurrences based on the predictors' resolution. This method calculates the distance between the first two cells of the environmental variables and uses this distance as the minimum nearest-neighbor distance to filter occurrences. The raster resolution is aggregated by the value given in "factor", so the distance used for filtering can be adjusted to represent a larger grid size.
3. "defined" uses any minimum nearest-neighbor distance specified in km.
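To illustrate the "moran" idea, here is a simplified base R sketch: take the first principal component of the environmental values at the occurrences, compute Moran's I within successive distance classes, and return the upper bound of the first class where Moran's I falls below 0.1. The helpers morans_i() and moran_threshold(), the binary distance-class weights, and the use of Euclidean distances on projected coordinates are assumptions for illustration; flexsdm's internal implementation may differ.

```r
# Hypothetical helper: Moran's I of z using binary weights for pairs whose
# distance falls in the class (lo, hi].
morans_i <- function(z, dmat, lo, hi) {
  w <- (dmat > lo & dmat <= hi) * 1
  diag(w) <- 0
  if (sum(w) == 0) return(NA_real_) # no pairs in this distance class
  zc <- z - mean(z)
  (length(z) / sum(w)) * sum(w * outer(zc, zc)) / sum(zc^2)
}

# Hypothetical helper: smallest class bound at which Moran's I of the first
# principal component drops below 'target' (0.1 in the description above).
moran_threshold <- function(env_values, coords, breaks, target = 0.1) {
  pc1 <- prcomp(env_values, scale. = TRUE)$x[, 1]
  # Euclidean distances: assumes projected coordinates
  dmat <- as.matrix(dist(coords))
  for (k in seq_len(length(breaks) - 1)) {
    i_k <- morans_i(pc1, dmat, breaks[k], breaks[k + 1])
    if (!is.na(i_k) && i_k < target) return(breaks[k + 1])
  }
  NA_real_ # autocorrelation never dropped below the target
}

# Example: ten points on a line with a linear environmental gradient
coords <- cbind(x = 1:10, y = 0)
env <- cbind(a = 1:10, b = 2 * (1:10))
moran_threshold(env, coords, breaks = 0:9) # returns 4
```

Since the PCA step needs numeric variables with non-zero variance, the sketch also shows why categorical predictors cannot be used with this method.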
For the third method, the "thin" function from the spThin package is used (Aiello-Lammens et al., 2015) with the following argument settings: reps = 20, write.files = FALSE, locs.thinned.list.return = TRUE, and write.log.file = FALSE.
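For intuition about the thinning step, below is a simplified base R sketch in the spirit of spThin's randomized approach: repeatedly drop one point chosen at random among those with the most neighbors closer than d km, stop when no such pairs remain, repeat the pass reps times, and keep the run that retains the most points. This is not spThin's code; thin_once(), thin_points() and haversine_km() are hypothetical helpers, and keeping the largest run is a choice made here for illustration.

```r
# Hypothetical helper (repeated so this sketch is self-contained):
# great-circle distance in km on a spherical Earth.
haversine_km <- function(lon1, lat1, lon2, lat2, r = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * r * asin(sqrt(pmin(1, a)))
}

# One randomized pass: while any pair is closer than d, drop one point at
# random among those with the most close neighbours.
thin_once <- function(dmat, d) {
  close <- dmat < d
  diag(close) <- FALSE
  keep <- seq_len(nrow(dmat))
  repeat {
    n_nb <- rowSums(close[keep, keep, drop = FALSE])
    if (all(n_nb == 0)) break
    worst <- keep[n_nb == max(n_nb)]
    # sample.int avoids sample()'s behaviour when 'worst' has length one
    keep <- setdiff(keep, worst[sample.int(length(worst), 1)])
  }
  keep
}

# Repeat the pass 'reps' times and keep the run that retains most points
thin_points <- function(lon, lat, d_km, reps = 20) {
  n <- length(lon)
  dmat <- outer(seq_len(n), seq_len(n), function(i, j)
    haversine_km(lon[i], lat[i], lon[j], lat[j]))
  runs <- lapply(seq_len(reps), function(r) thin_once(dmat, d_km))
  runs[[which.max(lengths(runs))]]
}

set.seed(42)
lon <- seq(0, 1, by = 0.1) # points about 11 km apart along the equator
lat <- rep(0, length(lon))
kept <- thin_points(lon, lat, d_km = 30)
kept # indices of the retained occurrences
```

Because each pass is random, more repetitions raise the chance of finding a larger retained set, at the cost of run time.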
References
Aiello-Lammens, M. E., Boria, R. A., Radosavljevic, A., Vilela, B., & Anderson, R. P. (2015). spThin: An R package for spatial thinning of species occurrence records for use in ecological niche models. Ecography, 38(5), 541-545. https://doi.org/10.1111/ecog.01132
Examples
if (FALSE) {
require(flexsdm) # provides occfilt_geo(), the spp dataset and somevar.tif
require(terra)
require(dplyr)
# Environmental variables
somevar <- system.file("external/somevar.tif", package = "flexsdm")
somevar <- terra::rast(somevar)
plot(somevar)
# Species occurrences
data("spp")
spp
spp1 <- spp %>% dplyr::filter(species == "sp1", pr_ab == 1)
somevar[[1]] %>% plot()
points(spp1 %>% select(x, y))
# Using Moran method
filtered_1 <- occfilt_geo(
data = spp1,
x = "x",
y = "y",
env_layer = somevar,
method = c("moran"),
prj = crs(somevar)
)
somevar[[1]] %>% plot(col = gray.colors(10))
points(spp1 %>% select(x, y)) # raw data
points(filtered_1 %>% select(x, y), pch = 19, col = "yellow") # filtered data
# Using cellsize method
filtered_2 <- occfilt_geo(
data = spp1,
x = "x",
y = "y",
env_layer = somevar,
method = c("cellsize", factor = "3"),
prj = crs(somevar)
)
somevar[[1]] %>% plot(col = gray.colors(10))
points(spp1 %>% select(x, y)) # raw data
points(filtered_2 %>% select(x, y), pch = 19, col = "yellow") # filtered data
# Using defined method
filtered_3 <- occfilt_geo(
data = spp1,
x = "x",
y = "y",
env_layer = somevar,
method = c("defined", d = "30"),
prj = crs(somevar)
)
somevar[[1]] %>% plot(col = gray.colors(10))
points(spp1 %>% select(x, y)) # raw data
points(filtered_3 %>% select(x, y), pch = 19, col = "yellow") # filtered data
}