Among the many well-designed techniques for dimension reduction, Principal Component Analysis (PCA) is one of the most popular and widely applicable. In this thesis, we address three problems that arise when modelling and forecasting high-dimensional data with PCA-related methods.
In Chapter 2, we propose a two-style factor model to improve the forecasting of high-dimensional time series. The model pursues two types of low-dimensional features for the original high-dimensional time series: one type summarizes the common time-serial trend, and the other represents the common variations. The two types of features benefit the forecasting and the model fitting, respectively. Dynamic PCA and static PCA are applied sequentially to estimate the features in the model. We show that the proposed method enjoys good statistical performance and illustrate its advantages with various simulation studies. By modelling and forecasting US mortality data, we show that our method provides more accurate forecasts, especially compared with the Lee-Carter model, which is the most popular model in mortality analysis.
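For orientation, the Lee-Carter benchmark mentioned above can be sketched as follows. This is a minimal illustration of the classical model, log m(x,t) ≈ a(x) + b(x) k(t), and not the two-style factor model proposed in the thesis. The function names `lee_carter_fit` and `lee_carter_forecast`, the power-iteration solver, and the random-walk-with-drift forecast of k(t) are assumptions chosen for a dependency-free example.

```python
import math

def lee_carter_fit(log_m, iters=200):
    """Fit log_m[x][t] ~ a[x] + b[x] * k[t] with a rank-1 approximation.

    log_m: one row per age group, one column per year (log death rates).
    Hypothetical helper for illustration only.
    """
    n_age, n_year = len(log_m), len(log_m[0])
    # a[x] is the time average of the log rate for age x.
    a = [sum(row) / n_year for row in log_m]
    # Centre each age row so that only b[x] * k[t] remains to be explained.
    z = [[log_m[x][t] - a[x] for t in range(n_year)] for x in range(n_age)]
    # Power iteration for the leading singular pair of z,
    # started from a linear time trend.
    k = [t - (n_year - 1) / 2 for t in range(n_year)]
    b = [0.0] * n_age
    for _ in range(iters):
        b = [sum(z[x][t] * k[t] for t in range(n_year)) for x in range(n_age)]
        norm = math.sqrt(sum(v * v for v in b)) or 1.0
        b = [v / norm for v in b]
        k = [sum(z[x][t] * b[x] for x in range(n_age)) for t in range(n_year)]
    # Usual identification constraint: the loadings b sum to one.
    # (k already sums to zero because every row of z is centred.)
    s = sum(b) or 1.0
    b = [v / s for v in b]
    k = [v * s for v in k]
    return a, b, k

def lee_carter_forecast(a, b, k, horizon):
    """Forecast log rates by extending k[t] as a random walk with drift."""
    drift = (k[-1] - k[0]) / (len(k) - 1)
    return [
        [a[x] + b[x] * (k[-1] + h * drift) for h in range(1, horizon + 1)]
        for x in range(len(a))
    ]
```

The single time index k(t) is the only dynamic component in this benchmark, which is the restriction that richer factor models aim to relax.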
In Chapter 3, we continue the study of mortality modelling. Classical mortality models usually assume that the factor loadings, which capture the relationship between age variables and latent common factors, are time-invariant. This assumption, however, is too restrictive in practice, as mortality datasets typically span a long period of time. To reflect the changing relationship between age variables and latent common factors, we introduce a factor model with time-varying factor loadings for mortality data. Accordingly, we propose two forecasting methods, in which the estimated time-varying factor loadings are predicted either by local linear regression or by inheriting the historical value (the naive method). In the empirical data analysis and the simulation studies, the proposed method recovers the time-varying factor loadings and significantly improves mortality forecasting. As a further study, we propose a method to estimate the optimal "boundary" between short-term and long-term forecasting, which are favored by the two forecasting methods, respectively. In view of this, a hybrid forecasting method can be used, which applies the local regression method before the optimal boundary and the naive method thereafter.
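The hybrid rule described above can be sketched for a single loading series as follows. This is an illustration under stated assumptions, not the estimator used in the thesis: the function names, the Gaussian kernel, the default `bandwidth`, and the convention that horizons up to and including `boundary` count as short-term are all hypothetical, and the boundary is taken as given rather than estimated.

```python
import math

def local_linear_forecast(values, steps_ahead, bandwidth):
    """Extrapolate with a kernel-weighted linear fit centred on the last time point."""
    n = len(values)
    t0 = n - 1
    # Gaussian kernel weights: recent observations count the most.
    w = [math.exp(-0.5 * ((t - t0) / bandwidth) ** 2) for t in range(n)]
    sw = sum(w)
    t_bar = sum(wi * t for t, wi in enumerate(w)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, values)) / sw
    sxx = sum(wi * (t - t_bar) ** 2 for t, wi in enumerate(w))
    sxy = sum(wi * (t - t_bar) * (y - y_bar)
              for t, (wi, y) in enumerate(zip(w, values)))
    slope = sxy / sxx if sxx > 0 else 0.0
    # Evaluate the fitted local line at the future time point.
    return y_bar + slope * (t0 + steps_ahead - t_bar)

def naive_forecast(values):
    """Inherit the most recent estimated value for every future horizon."""
    return values[-1]

def hybrid_forecast(values, horizon, boundary, bandwidth=5.0):
    """Local linear forecast up to `boundary` steps ahead, naive forecast beyond it."""
    return [
        local_linear_forecast(values, h, bandwidth) if h <= boundary
        else naive_forecast(values)
        for h in range(1, horizon + 1)
    ]
```

For a loading path that has been exactly linear in time, the local linear part continues the trend, while the naive part holds the last estimate fixed, which shows why the two methods diverge more as the horizon grows.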
In Chapter 4, we propose a novel robust PCA for high-dimensional data in the presence of various kinds of heterogeneity, such as outliers, heteroscedastic noise, and heavy-tailed variables. The method is based on a characteristic-function-type transformation. Besides handling typical outliers, the proposed method has the unique advantage of dealing with heavy-tailed data, whose covariances may not exist (they may be infinite, for instance). We show the merit and the cost of the method by studying the estimation accuracy of the reconstruction error and the impact of the transformation on a spiked covariance structure. In addition, simulation studies show the advantage of our method on data with heterogeneities. Finally, we apply the method to classify mice with different genotypes in a biological study based on their protein expression data, and find that our method is more accurate in identifying abnormal mice than standard PCA.