Selection bias in plots of microarray or other data that have been sampled from a high-dimensional space

Maindonald, John; Burden, Conrad

Date

2005

Authors

Maindonald, John

Burden, Conrad

Publisher

Australian Mathematical Society

Abstract

For data that have many more features than observations, finding a low-dimensional representation that accurately reflects known prior groupings is non-trivial. Microarray gene expression data, used to create a "signature" or discrimination rule that distinguishes cancer tissues classified according to type of cancer, are an important special case. The optimal number of features is suitably determined using cross-validation, in which each of several parts of the data becomes in turn the test set, with the remaining data used for training. At each such division or "fold" of the data into a training and test set, both the selection of features and the derivation of the discriminant rule must be repeated. Use of the complete data for prior selection of features can lead to a grossly optimistic assessment of predictive accuracy and, in scatter-plot graphs that show discriminant function scores, to a spurious or exaggerated separation between groups. At each division or fold, a plot of test scores on the second versus the first discriminant axis can be drawn. This paper presents a method for bringing these different plots, which have different choices of features and relate to different coordinate systems, into a single plot in which the configuration of points fairly reflects the accuracy of the discriminant procedure. The methodology is applicable, in principle, to any discriminant analysis methodology, or to ordination or multidimensional scaling, for obtaining a low-dimensional graphical representation of data.
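
The selection bias described in the abstract can be illustrated with a minimal sketch. It is not the authors' procedure: the nearest-centroid rule, the ranking of features by difference in class means, and all names (`make_noise_data`, `select_features`, `cv_accuracy`) are assumptions chosen for brevity. The data are pure noise, so an honest accuracy estimate should sit near chance, while selecting features on the complete data before cross-validation tends to look far better.

```python
import random

def make_noise_data(n_per_class=20, n_features=1000, seed=0):
    """Pure-noise data: the features carry no information about the labels."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)]
         for _ in range(2 * n_per_class)]
    y = [0] * n_per_class + [1] * n_per_class
    return X, y

def class_means(X, y, rows, feats):
    """Per-class mean of each chosen feature, computed over `rows` only."""
    means = {}
    for label in (0, 1):
        members = [i for i in rows if y[i] == label]
        means[label] = [sum(X[i][j] for i in members) / len(members)
                        for j in feats]
    return means

def select_features(X, y, rows, k):
    """Keep the k features with the largest class-mean gap on `rows`."""
    all_feats = range(len(X[0]))
    m = class_means(X, y, rows, all_feats)
    gap = [abs(a - b) for a, b in zip(m[0], m[1])]
    return sorted(all_feats, key=lambda j: gap[j], reverse=True)[:k]

def predict(X, means, feats, i):
    """Nearest-centroid rule (a stand-in for a discriminant rule)."""
    def dist(label):
        return sum((X[i][j] - c) ** 2 for j, c in zip(feats, means[label]))
    return 0 if dist(0) <= dist(1) else 1

def cv_accuracy(X, y, k_features, n_folds=5, select_inside=True, seed=1):
    """Cross-validated accuracy, selecting features inside or outside folds."""
    rng = random.Random(seed)
    idx = list(range(len(y)))
    rng.shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]
    # Selection on the complete data: the practice the abstract warns against.
    global_feats = select_features(X, y, idx, k_features)
    correct = 0
    for test in folds:
        held_out = set(test)
        train = [i for i in idx if i not in held_out]
        # Honest variant: repeat feature selection on the training part only.
        feats = (select_features(X, y, train, k_features)
                 if select_inside else global_feats)
        means = class_means(X, y, train, feats)
        correct += sum(predict(X, means, feats, i) == y[i] for i in test)
    return correct / len(y)

X, y = make_noise_data()
biased = cv_accuracy(X, y, k_features=10, select_inside=False)
honest = cv_accuracy(X, y, k_features=10, select_inside=True)
print("selection on complete data:", biased)
print("selection within each fold:", honest)
```

Any gap between the two printed values arises only from where the feature selection is performed, since the data, the folds and the classification rule are identical in both runs.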

URI

http://hdl.handle.net/1885/85209

Collections

ANU Research Publications

Source

ANZIAM Journal

Type

Journal article

Restricted until

2037-12-31

Downloads

File

Description

01_Maindonald_Selection_bias_in_plots_of_2005.pdf (312.28 KB)
