Stratified over-sampling bagging method for random forests on imbalanced data

DC Field | Value | Language
dc.contributor.author | Zhao, He | en
dc.contributor.author | Chen, Xiaojun | en
dc.contributor.author | Nguyen, Tung | en
dc.contributor.author | Huang, Joshua Zhexue | en
dc.contributor.author | Williams, Graham | en
dc.contributor.author | Chen, Hui | en
dc.date.accessioned | 2026-01-01T07:41:25Z |
dc.date.available | 2026-01-01T07:41:25Z |
dc.date.issued | 2016 | en
dc.description.abstract | Imbalanced data presents a significant challenge to random forests (RF). Over-sampling is a commonly used method for imbalanced data: it increases the number of minority-class instances to balance the class distribution. However, if only minority-class instances are over-sampled, such methods often produce highly correlated sample data sets, which reduces the generalizability of RF. To solve this problem, we propose a stratified over-sampling bagging (SOB) method that generates training data sets for RF that are both balanced and diverse. We first cluster the training data set multiple times to produce multiple clustering results. The small individual clusters are then grouped according to their entropies. Next, we sample a set of training data sets from the groups of clusters using stratified sampling. Finally, these training data sets are used to train RF. The data sets sampled with SOB are guaranteed to be balanced and diverse, which improves the performance of RF on imbalanced data. We have conducted a series of experiments, and the results show that the proposed method is more effective than several existing sampling methods. | en
dc.description.sponsorship | This work was supported by Guangdong Fund under Grant No. 2013B091300019, NSFC under Grant No. 61305059 and No. 61473194, and Natural Science Foundation of SZU (Grant No. 201432). | en
dc.description.status | Peer-reviewed | en
dc.format.extent | 10 | en
dc.identifier.isbn | 9783319318622 | en
dc.identifier.issn | 0302-9743 | en
dc.identifier.other | ORCID:/0000-0001-7041-4127/work/162449856 | en
dc.identifier.scopus | 84962429244 | en
dc.identifier.uri | https://hdl.handle.net/1885/733798761 |
dc.language.iso | en | en
dc.publisher | Springer Verlag | en
dc.relation.ispartof | Intelligence and Security Informatics - 11th Pacific Asia Workshop, PAISI 2016, Proceedings | en
dc.relation.ispartofseries | 11th Pacific Asia Workshop on Intelligence and Security Informatics, PAISI 2016 | en
dc.relation.ispartofseries | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) | en
dc.rights | Publisher Copyright: © Springer International Publishing Switzerland 2016. | en
dc.subject | Classification | en
dc.subject | Imbalanced data | en
dc.subject | Random forests | en
dc.subject | Stratified sampling | en
dc.title | Stratified over-sampling bagging method for random forests on imbalanced data | en
dc.type | Conference paper | en
dspace.entity.type | Publication | en
local.bibliographicCitation.lastpage | 72 | en
local.bibliographicCitation.startpage | 63 | en
local.contributor.affiliation | Zhao, He; Shenzhen Institute of Advanced Technology | en
local.contributor.affiliation | Chen, Xiaojun; Shenzhen University | en
local.contributor.affiliation | Nguyen, Tung; Thuyloi University | en
local.contributor.affiliation | Huang, Joshua Zhexue; Shenzhen University | en
local.contributor.affiliation | Williams, Graham; School of Computing, ANU College of Systems and Society, The Australian National University | en
local.contributor.affiliation | Chen, Hui; Shenzhen Institute of Advanced Technology | en
local.identifier.ariespublication | U3488905xPUB16373 | en
local.identifier.doi | 10.1007/978-3-319-31863-9_5 | en
local.identifier.essn | 1611-3349 | en
local.identifier.pure | c3b7add8-19e4-43b8-b05f-3a5db243f426 | en
local.identifier.url | https://www.scopus.com/pages/publications/84962429244 | en
local.type.status | Published | en
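The abstract outlines a sampling pipeline: cluster the data repeatedly, group the clusters by entropy, then draw balanced training sets by stratified sampling. A minimal, stdlib-only sketch of that pipeline follows. It is an illustration under stated assumptions, not the authors' implementation. Plain k-means stands in for the unspecified clustering algorithm, and clusters are pooled into equal-width entropy bins that serve as the strata. Each class is then drawn with replacement, in equal shares across the strata, until it reaches the majority-class size. All function and parameter names (`kmeans`, `entropy_groups`, `sob_bags`, `n_clusterings`, `k`, `n_bins`, `n_per_class`) are hypothetical.

```python
import math
import random
from collections import Counter, defaultdict

# Hypothetical sketch of SOB-style sampling. The clustering algorithm (k-means),
# the entropy binning and the equal-allocation rule are assumptions, not the paper's.


def kmeans(X, k, rng, iters=20):
    """Plain Lloyd's k-means; returns one cluster label per row of X."""
    centers = [list(c) for c in rng.sample(X, k)]
    labels = [0] * len(X)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
            for x in X
        ]
        # Move each center to the mean of its members; keep it if the cluster is empty.
        for j in range(k):
            members = [x for x, lab in zip(X, labels) if lab == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def class_entropy(labels):
    """Shannon entropy (in bits) of the class labels inside one cluster."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def entropy_groups(X, y, n_clusterings=5, k=4, n_bins=3, seed=0):
    """Cluster X several times and pool the clusters into entropy bins (the strata)."""
    rng = random.Random(seed)
    max_h = math.log2(len(set(y))) or 1.0  # maximum entropy; avoid division by 0
    groups = defaultdict(set)
    for _ in range(n_clusterings):
        clusters = defaultdict(list)
        for i, lab in enumerate(kmeans(X, k, rng)):
            clusters[lab].append(i)
        for members in clusters.values():
            h = class_entropy([y[i] for i in members])
            b = min(int(h / max_h * n_bins), n_bins - 1)  # bin index in [0, n_bins)
            groups[b].update(members)
    return [sorted(g) for _, g in sorted(groups.items())]


def sob_bags(X, y, n_bags=10, n_per_class=None, seed=0, **group_kw):
    """Draw class-balanced bags, spreading each class's draws evenly over the strata."""
    rng = random.Random(seed)
    strata = entropy_groups(X, y, seed=seed, **group_kw)
    if n_per_class is None:
        n_per_class = max(Counter(y).values())  # over-sample minority to majority size
    # For every class, the non-empty per-stratum pools of its instances.
    pools = {
        c: [p for p in ([i for i in s if y[i] == c] for s in strata) if p]
        for c in sorted(set(y))
    }
    bags = []
    for _ in range(n_bags):
        bag = []
        for c, class_pools in pools.items():
            order = rng.sample(class_pools, len(class_pools))  # random stratum order
            for t in range(n_per_class):
                # Equal allocation: cycle through strata, draw with replacement.
                bag.append(rng.choice(order[t % len(order)]))
        bags.append(bag)
    return bags


if __name__ == "__main__":
    # Toy imbalanced data: 20 majority points near the origin, 4 minority points far away.
    X = [[(i % 5) * 0.1, (i // 5) * 0.1] for i in range(20)] + [[5 + 0.1 * i, 5.0] for i in range(4)]
    y = [0] * 20 + [1] * 4
    for bag in sob_bags(X, y, n_bags=3, seed=1):
        print(Counter(y[i] for i in bag))  # each bag holds 20 instances of each class
```

Each returned bag is a list of row indices. In a full random forest, each bag would train one decision tree (for example, scikit-learn's `DecisionTreeClassifier` with `max_features="sqrt"`), and the trees' predictions would be combined by majority vote. That training step is left out to keep the sketch dependency-free.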
