Imputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approach

dc.contributor.authorLan, Wei
dc.contributor.authorChen, Xuerong
dc.contributor.authorZou, Tao
dc.contributor.authorTsai, Chih-Ling
dc.date.accessioned2024-01-14T21:27:42Z
dc.date.issued2021
dc.date.updated2022-09-25T08:16:57Z
dc.description.abstractAdvancements in data collection techniques and the heterogeneity of data resources can yield high percentages of missing observations on variables, such as block-wise missing data. Under missing-data scenarios, traditional methods such as the simple average, k-nearest neighbor, multiple, and regression imputations may lead to results that are unstable or unable to be computed. Motivated by the concept of semi-supervised learning, we propose a novel approach with which to fill in missing values in covariates that have high missing rates. Specifically, we consider the missing and nonmissing subjects in any covariate as the unlabeled and labeled target outputs, respectively, and treat their corresponding responses as the unlabeled and labeled inputs. This innovative setting allows us to impute a large number of missing data without imposing any model assumptions. In addition, the resulting imputation has a closed form for continuous covariates, and it can be calculated efficiently. An analogous procedure is applicable for discrete covariates. We further employ nonparametric techniques to show the theoretical properties of imputed covariates. Simulation studies and an online consumer finance example are presented to illustrate the usefulness of the proposed method.en_AU
dc.description.sponsorshipWei Lan’s research was supported by the National Natural Science Foundation of China (NSFC, 71991472, 12171395, 11931014, 71532001), the Joint Lab of Data Science and Business Intelligence at Southwestern University of Finance and Economics, and the Fundamental Research Funds for the Central Universities (JBK1806002). Xuerong Chen’s research was supported by the National Natural Science Foundation of China (NSFC, 11871402, 11931014) and the Fundamental Research Funds for the Central Universities (JBK1806002). Tao Zou’s research was supported by the ANU College of Business and Economics Early Career Researcher Grant and the RSFAS Cross Disciplinary Grant.en_AU
dc.format.mimetypeapplication/pdfen_AU
dc.identifier.issn0735-0015en_AU
dc.identifier.urihttp://hdl.handle.net/1885/311393
dc.language.isoen_AUen_AU
dc.publisherAmerican Statistical Associationen_AU
dc.rights© 2022 The authorsen_AU
dc.sourceJournal of Business and Economic Statisticsen_AU
dc.subjectBlock-wise missingen_AU
dc.subjectCross-validationen_AU
dc.subjectHigh missing rate dataen_AU
dc.subjectInterchangeable imputationen_AU
dc.subjectSemi-supervised imputationen_AU
dc.titleImputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approachen_AU
dc.typeJournal articleen_AU
local.bibliographicCitation.issue3en_AU
local.bibliographicCitation.lastpage1290en_AU
local.bibliographicCitation.startpage1282en_AU
local.contributor.affiliationLan, Wei, Southwestern University of Finance and Economicsen_AU
local.contributor.affiliationChen, Xuerong, Southwestern University of Finance and Economics, Chengdu, Chinaen_AU
local.contributor.affiliationZou, Tao, College of Business and Economics, ANUen_AU
local.contributor.affiliationTsai, Chih-Ling, University of California at Davisen_AU
local.contributor.authoruidZou, Tao, u1025220en_AU
local.description.embargo2099-12-31
local.description.notesImported from ARIESen_AU
local.identifier.absfor490509 - Statistical theoryen_AU
local.identifier.absfor350202 - Financeen_AU
local.identifier.ariespublicationa383154xPUB19892en_AU
local.identifier.citationvolume40en_AU
local.identifier.doi10.1080/07350015.2021.1922120en_AU
local.identifier.scopusID2-s2.0-85107397455
local.identifier.thomsonIDWOS:000656745500001
local.publisher.urlhttps://www.tandfonline.com/en_AU
local.type.statusPublished Versionen_AU

Downloads

Original bundle

Name: Imputations for High Missing Rate Data in Covariates Via Semi-supervised Learning Approach.pdf
Size: 1.25 MB
Format: Adobe Portable Document Format