The function balances a database according to a specified ratio of absences to presences. It randomly removes excess absences until that ratio is reached. All records with abundance equal to zero are treated as absences.
Arguments
- data
data.frame or tibble. Database that contains a column with abundance.
- response
string. The name of the column in `data` representing the response variable. Note that all records with abundance equal to zero are interpreted as absences. Usage response = "ind_ha"
- absence_ratio
numeric. The desired ratio of absences to presences in the response column. E.g., if set to 1, the function will remove absences until their number equals the number of presences. If set to 1.5, the function will remove absences until there are 1.5 times as many absences as presences. Usage absence_ratio = 0.5
Value
Returns a balanced data.frame or tibble in which the absence-to-presence ratio in the response column equals absence_ratio.
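The balancing described above can be sketched in base R. This is only an illustration of the documented behaviour, not the package's implementation: the helper name `balance_sketch`, the toy data, the rounding of the target count, and the assumption that the response column has no missing values are all hypothetical.

```r
# Hypothetical helper that mirrors the behaviour described on this page.
# Assumes the response column contains no missing values.
balance_sketch <- function(data, response, absence_ratio) {
  is_absence <- data[[response]] == 0
  absence_rows <- which(is_absence)
  presence_rows <- which(!is_absence)

  # Target number of absences: absence_ratio times the number of presences
  # (rounding is an assumption of this sketch)
  n_keep <- round(length(presence_rows) * absence_ratio)

  # Nothing to remove if the data already have that few absences
  if (length(absence_rows) <= n_keep) {
    return(data)
  }

  # Randomly keep n_keep absences; sampling positions with sample.int()
  # avoids the sample(x) pitfall when x has length one
  kept <- absence_rows[sample.int(length(absence_rows), n_keep)]
  data[sort(c(presence_rows, kept)), , drop = FALSE]
}

# Toy data (made up for illustration): 4 presences and 10 absences
toy <- data.frame(ind_ha = c(3, 1, 7, 2, rep(0, 10)))
balanced <- balance_sketch(toy, response = "ind_ha", absence_ratio = 0.5)
table(balanced$ind_ha > 0)
```

With 4 presences and absence_ratio = 0.5, the sketch keeps all 4 presences and 2 randomly chosen absences. Which absences survive depends on the random draw, so call set.seed() first if reproducibility matters.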
Examples
if (FALSE) {
require(dplyr)
data("sppabund")
some_sp <- sppabund %>%
  dplyr::filter(species == "Species three") %>%
  dplyr::select(species, ind_ha, x, y)
table(some_sp$ind_ha > 0)
# Note that the dataset is almost balanced
# However, as an example, let's assume that we want to reduce
# the number of absences to half the number of presences
some_sp_2 <- balance_dataset(
  data = some_sp,
  response = "ind_ha",
  absence_ratio = 0.5
)
table(some_sp$ind_ha > 0)
table(some_sp_2$ind_ha > 0)
}