Selection bias in plots of microarray or other data that have been sampled from a high-dimensional space

dc.contributor.author: Maindonald, John
dc.contributor.author: Burden, Conrad
dc.date.accessioned: 2015-12-13T23:04:05Z
dc.date.issued: 2005
dc.date.updated: 2015-12-12T07:53:12Z
dc.description.abstract: For data that have many more features than observations, finding a low-dimensional representation that accurately reflects known prior groupings is non-trivial. Microarray gene expression data, used to create a "signature" or discrimination rule that distinguishes cancer tissues classified according to type of cancer, are an important special case. The optimal number of features is suitably determined using cross-validation, in which each of several parts of the data becomes in turn the test set, with the remaining data used for training. At each such division or "fold" of the data into a training and test set, both the selection of features and the derivation of the discriminant rule must be repeated. Use of the complete data for prior selection of features can lead to a grossly optimistic assessment of predictive accuracy and, in scatter-plot graphs that show discriminant function scores, to a spurious or exaggerated separation between groups. At each division or fold, a second versus first discriminant axis plot of test scores can be drawn. This paper presents a method for bringing these different plots, which have different choices of features and relate to different coordinate systems, into a single plot in which the configuration of points fairly reflects the accuracy of the discriminant procedure. The methodology is applicable, in principle, to use of any discriminant analysis methodology, or of ordination or multidimensional scaling, for obtaining a low-dimensional graphical representation of data.
dc.identifier.issn: 1446-8735
dc.identifier.uri: http://hdl.handle.net/1885/85209
dc.publisher: Australian Mathematical Society
dc.source: ANZIAM Journal
dc.title: Selection bias in plots of microarray or other data that have been sampled from a high-dimensional space
dc.type: Journal article
local.bibliographicCitation.lastpage: C74
local.bibliographicCitation.startpage: C59
local.contributor.affiliation: Maindonald, John, College of Physical and Mathematical Sciences, ANU
local.contributor.affiliation: Burden, Conrad, College of Physical and Mathematical Sciences, ANU
local.contributor.authoremail: u9801539@anu.edu.au
local.contributor.authoruid: Maindonald, John, u9801539
local.contributor.authoruid: Burden, Conrad, u1571037
local.description.embargo: 2037-12-31
local.description.notes: Imported from ARIES
local.description.refereed: Yes
local.identifier.absfor: 010202 - Biological Mathematics
local.identifier.ariespublication: MigratedxPub13482
local.identifier.citationvolume: 46
local.identifier.scopusID: 2-s2.0-70549102177
local.identifier.uidSubmittedBy: Migrated
local.type.status: Published Version
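
The abstract's central point, that feature selection must be repeated inside each cross-validation fold, can be illustrated with a small simulation. The following is a minimal sketch and not the authors' code: it assumes pure-noise data with no real group structure, a simple t-statistic filter, and a nearest-centroid rule in place of the paper's discriminant analysis. All function names (`top_features`, `nearest_centroid_predict`, `cv_accuracy`) and all parameter values are hypothetical.

```python
import numpy as np


def top_features(X, y, k):
    """Indices of the k features with the largest absolute t-like statistic."""
    a, b = X[y == 0], X[y == 1]
    diff = a.mean(axis=0) - b.mean(axis=0)
    se = np.sqrt(a.var(axis=0, ddof=1) / len(a) + b.var(axis=0, ddof=1) / len(b))
    return np.argsort(-np.abs(diff / se))[:k]


def nearest_centroid_predict(X_train, y_train, X_test):
    """Assign each test row to the class whose training centroid is closer."""
    c0 = X_train[y_train == 0].mean(axis=0)
    c1 = X_train[y_train == 1].mean(axis=0)
    d0 = ((X_test - c0) ** 2).sum(axis=1)
    d1 = ((X_test - c1) ** 2).sum(axis=1)
    return (d1 < d0).astype(int)


def cv_accuracy(X, y, k, n_folds, select_inside_fold, rng):
    """Cross-validated accuracy, selecting features inside or outside the folds."""
    n = len(y)
    folds = np.array_split(rng.permutation(n), n_folds)
    preselected = top_features(X, y, k)  # uses ALL data: the biased route
    correct = 0
    for test_idx in folds:
        train = np.ones(n, dtype=bool)
        train[test_idx] = False
        if select_inside_fold:
            # Honest route: the test rows play no part in choosing features.
            feats = top_features(X[train], y[train], k)
        else:
            feats = preselected
        pred = nearest_centroid_predict(
            X[train][:, feats], y[train], X[test_idx][:, feats]
        )
        correct += int((pred == y[test_idx]).sum())
    return correct / n


# Assumed toy setting: 40 observations, 5000 noise features, two equal groups.
data_rng = np.random.default_rng(0)
X = data_rng.standard_normal((40, 5000))
y = np.repeat([0, 1], 20)

# The same fold assignment is used for both estimates (same seed).
biased = cv_accuracy(X, y, k=10, n_folds=5, select_inside_fold=False,
                     rng=np.random.default_rng(1))
honest = cv_accuracy(X, y, k=10, n_folds=5, select_inside_fold=True,
                     rng=np.random.default_rng(1))
print(f"selection on complete data: {biased:.2f}")
print(f"selection within each fold: {honest:.2f}")
```

Because the groups here are labelled at random, any accuracy well above 50% reflects selection bias alone. Selecting features once from the complete data typically reports such an inflated figure, whereas reselecting within each fold gives an estimate near chance.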

Downloads

Original bundle

Name: 01_Maindonald_Selection_bias_in_plots_of_2005.pdf
Size: 312.28 KB
Format: Adobe Portable Document Format