The assessment of factors affecting species distribution model inference and prediction using simulated data

Santika, Truly

Date

2010

Authors

Santika, Truly

Abstract

In past decades, a variety of statistical techniques have been developed and used to predict species occurrences over broad geographical areas. These models generally employ correlations between point-location data on species occurrence and environmental predictors from GIS and other mapped data. They have wide management applications in conservation biology, biogeography and climate change studies. Despite substantial progress, there are a number of critical factors that have a significant impact on the performance of species distribution models, leading to uncertainties in species distribution modelling. These factors originate either from the nature of the species and habitat data used to derive the distribution models, or from the modelling methodology. Numerous empirical studies using real species field data have been conducted to assess how such factors affect species distribution model performance. Comparing models with real data, however, can be problematic because the process controlling the true distribution of the species is unknown. Furthermore, empirical studies have yielded various, sometimes contradictory, recommendations regarding the model used and ways to minimize the impact of certain factors on model performance. From a wildlife management perspective, such conflicting recommendations are not helpful. In contrast, generating simulated data on species distributions has the advantage of providing perfect control over the causal factors of interest. Simulated data provide a way to assess how model performance responds to underlying assumptions, and can guide the inferences obtained from empirical studies in a systematic manner. This thesis constructs a systematic simulated-data framework to provide an understanding of how various data and methodological factors can affect species distribution model prediction and inference.
The data issues examined include the form of species occurrence and environmental dependence, prevalence (i.e. the proportion of observed sites where the species is present), and spatial autocorrelation in species occurrence data and in supporting environmental data. The methodological factors examined include the predictive performance measure, the method for setting the probability threshold used to define species occurrence in the fitted distribution model, and the success of the fitted distribution model in capturing the dominant environmental determinant for the species. The findings are used to explain relationships reported by existing studies of real species distribution data. Beyond the key findings described above, the simulation approach presented in this thesis offers a promising tool for testing various aspects of species distribution modelling. Such aspects could include assessment of how constraints on species dispersal can affect model predictive performance, assessment of the sensitivity of model predictive performance to species rarity and sampling prevalence, and assessment of the effect of collinearity in predictor variables on model inference.
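
As a rough illustration of the terms defined above, the following minimal Python sketch simulates presence/absence data from a known environmental response, computes prevalence, and applies a probability threshold. It is not the thesis's framework: the logistic response, the single environmental gradient, the parameter values, the function names, and the choice of prevalence as the threshold are all assumptions made for this example.

```python
import math
import random


def simulate_occurrence(n_sites, intercept, slope, seed=0):
    """Simulate sites along one hypothetical environmental gradient.

    Assumption: occurrence probability follows a logistic response to the
    gradient. Returns a list of (env, true_probability, present) tuples.
    """
    rng = random.Random(seed)
    sites = []
    for _ in range(n_sites):
        env = rng.uniform(-2.0, 2.0)  # environmental value at the site
        p = 1.0 / (1.0 + math.exp(-(intercept + slope * env)))
        present = 1 if rng.random() < p else 0  # Bernoulli draw
        sites.append((env, p, present))
    return sites


def prevalence(sites):
    """Proportion of sampled sites where the species is present."""
    return sum(present for _, _, present in sites) / len(sites)


def classify(sites, threshold):
    """Convert probabilities to predicted presence (1) or absence (0)."""
    return [1 if p >= threshold else 0 for _, p, _ in sites]


# Illustrative parameter values only.
sites = simulate_occurrence(n_sites=1000, intercept=-1.0, slope=2.0, seed=42)
prev = prevalence(sites)
# Assumption: the observed prevalence is used as the threshold; this is one
# of several possible threshold-setting rules.
predicted = classify(sites, threshold=prev)
```

Because the generating process is known, predictions can be compared against the true probabilities rather than only against observed presences. Lowering `intercept` reduces prevalence, which makes it possible to vary species rarity while holding the environmental response fixed.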