Using reproducible research pipelines to help disentangle health effects of environmental changes from social factors
Abstract
The scientific questions motivating this thesis relate to the
health effects of environmental changes including droughts,
bushfires, woodsmoke, duststorms and heatwaves. Such questions
require us to attempt to disentangle health effects of
environmental changes from social factors as all diseases have
multiple causal factors. Environmental exposures should be
explored in the context of many other variables that comprise the
biological and socioeconomic milieu. Investigators often narrow
the focus to a single environmental cause and health effect. A
simple example is bushfire smoke and its direct effects on
cardiorespiratory disease. A more complex example is the indirect
relationship between drought and suicide. Even simple questions
require consideration of numerous putative causes and
confounders. Adequately controlling for all these factors in
statistical models is difficult. Furthermore, results might be
sensitive to the choice of analysis procedure, or otherwise
error-prone because of the many steps involved.
Such difficulties have led to what some researchers assert is a
‘reproducibility crisis’ where many scientific publications
are difficult or impossible to reproduce. This, together with
fallacious findings, harms scientific credibility. Reproducibility of data
analysis is defined as the ability to recompute results, given a
dataset and knowledge of the method’s steps. A key problem
impairing reproducibility is inadequate documentation of the
numerous steps and decisions required for the computations.
Reproducible research pipelines allow data and software (such as
analysis code) to be disseminated with publications, enhancing
reproducibility. However, this approach often places a
considerable burden on the analyst. This thesis identifies
effective methods for implementing reproducible research
pipelines in environmental epidemiology, aiming to reduce this
burden.
In addition to this methodological contribution, the thesis
includes a range of peer-reviewed papers published by the author
(along with accompanying datasets and software packages of
code), which also add to knowledge. Key
findings include health effects of environmental changes relevant
to debates about climate change. Reproducibility of these
findings enhances their credibility in response to the heightened
scepticism of those debates. Important insights include the
finding that the risk of suicide in New South Wales increases
for rural men during drought but decreases for women.
Another striking finding is that while bushfire smoke and
duststorms each increased cardiorespiratory mortality risk in
Sydney, they appear to do so in different ways, with dust having
a much higher risk estimate than biomass smoke.
In cases such as these, where findings are novel, unexpected or
contradict accepted opinion, the scientific method stresses the
need for scepticism and critical review. Reproducible research
pipelines strengthen our ability to conduct such review beyond
what was available in the traditional research model. The use
of pipelines not only makes methodological choices and
assumptions more transparent but also safeguards against
data misuse by making errors easier to find. Encoding analysis
steps in a computer ‘scripting’ language and distributing the
data and code with publications enables readers to assess (and
challenge) each choice of data or methods. This will help
minimise mistakes in the execution or interpretation of research.