Machine Learning for Early Detection of Ovarian Cancer: Improving Evaluation and Explanation from Technical and Domain Perspectives
Date
2024
Authors
Huang, Weitong
Abstract
This trans-disciplinary thesis presents research on detecting biomarkers of ovarian cancer using a unique combination of machine learning and Shapley values. The study explores methods and validates conclusions using the PLCO Ovarian Biomarkers dataset, which exhibits several issues typical of medical datasets: small sample size, high dimensionality, and significant class imbalance. To overcome these issues, several machine learning techniques were investigated and optimised at each step of a general data mining workflow. Based on a series of experiments, the thesis develops a best-practice pipeline that alleviates the challenges of adopting machine learning methods for ovarian cancer diagnosis and biomarker discovery. The pipeline focuses on improving predictive ability and on introducing model-agnostic explanations at both the local and global level. Built on data analysis, feature engineering, and stable model training, its major achievements are Area Under the Receiver Operating Characteristic curve (AUC-ROC) scores ranging from 80% to 95%, which are higher than those measured using established methods in the field, such as the Risk of Ovarian Malignancy Algorithm (ROMA), and model explanations that allow more intuitive domain interpretation. These explanations guide further research on the topic, including rational method selection, identification of potential risks, and suggested mitigations or alternatives. The research concludes with a straightforward approach to biomarker discovery that is general and statistically sound, and that explains how machine learning models capture the underlying relationship between biomarkers and early signs of disease.
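The two quantities the abstract relies on, AUC-ROC for evaluation and Shapley values for model-agnostic local explanation, can be illustrated with a minimal sketch. This is not the thesis pipeline: `toy_model`, the marker values, the labels, and the all-zero baseline are hypothetical and chosen only so the arithmetic is easy to follow. The Shapley computation enumerates every feature coalition exactly, which is only feasible for a handful of features.

```python
from itertools import combinations
from math import factorial


def auc_roc(labels, scores):
    """AUC-ROC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def shapley_values(model, x, baseline):
    """Exact Shapley values for one sample; features outside a coalition take the baseline value."""
    n = len(x)

    def value(coalition):
        return model([x[i] if i in coalition else baseline[i] for i in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Standard Shapley weight for a coalition of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_model(features):
    """Hypothetical risk score over three made-up marker levels (not from the thesis)."""
    a, b, c = features
    return 0.5 * a + 0.3 * b + 0.2 * a * c


# Made-up, imbalanced toy data: two positives among six samples.
labels = [1, 0, 1, 0, 0, 0]
samples = [[2.0, 1.0, 1.0], [0.5, 0.2, 0.0], [1.5, 0.1, 2.0],
           [0.3, 1.0, 0.5], [0.1, 0.1, 0.1], [3.0, 0.0, 0.0]]
scores = [toy_model(s) for s in samples]
auc = auc_roc(labels, scores)

# Local explanation of the first sample against an assumed all-zero baseline.
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, samples[0], baseline)

print(f"AUC-ROC: {auc:.3f}")
print("Shapley values:", [round(p, 3) for p in phi])
```

The per-feature attributions sum to the difference between the model output for the explained sample and the output for the baseline, which is the property that makes Shapley values usable as a local explanation. A global view can then be formed by aggregating these local attributions over many samples.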
Type
Thesis (MPhil)