Essays on Robust Model Selection and Model Averaging for Linear Models
Abstract
Model selection is central to all applied statistical work.
Selecting the variables for use in a regression model is one
important example of model selection. This thesis is a collection
of essays on robust model selection procedures and model
averaging for linear regression models.
In the first essay, we propose robust Akaike information criteria
(AIC) for MM-estimation and an adjusted robust scale-based AIC
for M- and MM-estimation. Our proposed model selection criteria
maintain their robustness in the presence of a high proportion of
outliers, including outliers in the covariates. We
compare our proposed criteria with other robust model selection
criteria discussed in the previous literature. Our simulation
studies demonstrate that the robust AIC based on MM-estimation
significantly outperforms these alternatives in the presence of
outliers in the covariates. A real data example also shows better
performance of the robust AIC based on MM-estimation.
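As a rough illustration of the kind of criterion involved, the sketch below computes a robust AIC of the classical form 2 * sum_i rho(r_i / sigma-hat) + 2p, where a bounded loss replaces the squared-error log-likelihood term. It uses statsmodels' M-estimation with Tukey's biweight as a stand-in: the exact criteria proposed in the essay, and a full MM-fit (which starts from a high-breakdown S-estimate of scale), are not reproduced here.

```python
# A minimal sketch of a robust AIC: a bounded loss in place of the
# log-likelihood. This is NOT the thesis' exact criterion, only the
# general shape of such criteria.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 100
X = rng.standard_normal((n, 3))
y = X @ np.array([2.0, 0.0, -1.0]) + rng.standard_normal(n)
y[:5] += 10.0  # a few response outliers

def robust_aic(X, y):
    """Robust AIC from an M-fit with Tukey's biweight loss.

    Note: statsmodels' RLM performs M-estimation; a true MM-estimator
    would start from a high-breakdown S-estimate of scale.
    """
    Xc = sm.add_constant(X)
    norm = sm.robust.norms.TukeyBiweight()
    fit = sm.RLM(y, Xc, M=norm).fit()
    r = (y - fit.fittedvalues) / fit.scale
    return 2.0 * np.sum(norm.rho(r)) + 2.0 * Xc.shape[1]

# Smaller is better: compare the full model with the model that drops
# the truly null second covariate.
print(robust_aic(X, y), robust_aic(X[:, [0, 2]], y))
```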
The second essay focuses on robust versions of the "Least
Absolute Shrinkage and Selection Operator" (lasso). The adaptive
lasso is a method for performing simultaneous parameter
estimation and variable selection. The adaptive weights used in
its penalty term mean that the adaptive lasso achieves the oracle
property. In this essay, we propose an extension of the adaptive
lasso named the Tukey-lasso. By using Tukey's biweight criterion
instead of squared loss, the Tukey-lasso is resistant to outliers
in both the response and the covariates. Importantly, we demonstrate
that the Tukey-lasso also enjoys the oracle property. A fast
accelerated proximal gradient (APG) algorithm is proposed and
implemented for computing the Tukey-lasso. Our extensive
simulations show that the Tukey-lasso, implemented with the APG
algorithm, achieves very reliable results, including for
high-dimensional data where p > n. In the presence of outliers, the
Tukey-lasso is shown to offer substantial improvements in
performance compared to the adaptive lasso and other robust
implementations of the lasso. Real data examples further
demonstrate the utility of the Tukey-lasso.
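To make the algorithmic idea concrete, here is a minimal sketch of an APG (FISTA-style) iteration for a Tukey-lasso-type estimate: Tukey's biweight loss on scaled residuals plus a weighted (adaptive) L1 penalty, whose proximal operator is weighted soft-thresholding. The scale estimate, adaptive weights, tuning parameter, and iteration count below are illustrative assumptions rather than the thesis' algorithm; note also that the biweight loss is non-convex, so the accelerated scheme should be read as a heuristic sketch.

```python
# Sketch of an accelerated proximal gradient loop for a Tukey-lasso-type
# problem: minimize sum_i rho((y_i - x_i'b) / s) + lam * sum_j w_j |b_j|.
import numpy as np

def tukey_psi(u, c=4.685):
    """Derivative of Tukey's biweight rho; identically zero outside [-c, c]."""
    return np.where(np.abs(u) <= c, u * (1 - (u / c) ** 2) ** 2, 0.0)

def tukey_lasso(X, y, lam, weights, scale, n_iter=500, c=4.685):
    p = X.shape[1]
    L = np.linalg.norm(X, 2) ** 2 / scale ** 2  # bound on the gradient's Lipschitz constant
    beta = z = np.zeros(p)
    t = 1.0
    for _ in range(n_iter):
        r = (y - X @ z) / scale
        grad = -X.T @ tukey_psi(r, c) / scale  # gradient of the robust loss
        v = z - grad / L                       # gradient step
        # Proximal step: weighted soft-thresholding.
        beta_new = np.sign(v) * np.maximum(np.abs(v) - lam * weights / L, 0.0)
        t_new = (1 + np.sqrt(1 + 4 * t ** 2)) / 2
        z = beta_new + (t - 1) / t_new * (beta_new - beta)  # FISTA momentum
        beta, t = beta_new, t_new
    return beta

rng = np.random.default_rng(1)
n, p = 80, 120                            # p > n is allowed
X = rng.standard_normal((n, p))
beta_true = np.zeros(p)
beta_true[:3] = [3.0, -2.0, 1.5]
y = X @ beta_true + rng.standard_normal(n)
y[:4] += 15.0                             # response outliers
scale = 1.4826 * np.median(np.abs(y - np.median(y)))  # crude MAD scale (illustrative)
weights = np.ones(p)  # adaptive weights would come from a pilot fit
print(tukey_lasso(X, y, lam=5.0, weights=weights, scale=scale)[:5])
```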
In many statistical analyses, a single model is used for
inference, ignoring the process by which that model was selected.
To account for this model uncertainty, many
model averaging procedures have been proposed. In the last essay,
we propose an extension of a bootstrap model averaging approach,
called bootstrap lasso averaging (BLA). BLA utilizes the lasso
for model selection. This is in contrast to other forms of
bootstrap model averaging that use the AIC or the Bayesian
information criterion (BIC). The use of the lasso improves computational
speed and allows BLA to be applied even when the number of
variables p is larger than the sample size n. Extensive
simulations confirm that BLA has outstanding finite-sample
performance, in terms of both variable selection accuracy and
prediction accuracy,
compared with traditional model selection and model averaging
methods. Several real data examples further demonstrate BLA's
improved out-of-sample predictive performance.
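As a sketch of how such a procedure might look, the code below fits the lasso on each bootstrap resample and averages the resulting predictions. The resample count, the cross-validated tuning via scikit-learn's LassoCV, and the equal averaging weights are illustrative assumptions rather than the thesis' exact BLA scheme.

```python
# Sketch of a bootstrap-lasso-averaging style predictor (illustrative,
# not the thesis' exact BLA procedure).
import numpy as np
from sklearn.linear_model import LassoCV

def bla_predict(X, y, X_new, n_boot=50, seed=0):
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    preds = np.zeros((n_boot, X_new.shape[0]))
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)           # bootstrap resample
        model = LassoCV(cv=5).fit(X[idx], y[idx])  # lasso performs the model selection
        preds[b] = model.predict(X_new)
    return preds.mean(axis=0)                      # average predictions over resamples

rng = np.random.default_rng(2)
n, p = 120, 200                                    # works even when p > n
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:4] = [2.0, -1.5, 1.0, 0.5]
y = X @ beta + rng.standard_normal(n)
X_new = rng.standard_normal((10, p))
print(bla_predict(X, y, X_new)[:3])
```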