Dimension Reduction and Data Augmentation Methods for the Physical Sciences

Authors

Liu, Tommy

Abstract

Data is fundamental to how we understand the world around us. It is through data that we develop understanding and interact with our surroundings. With the increasing volume of data and an increasingly data-driven world, the development of Machine Learning tools to analyse diverse and complex data is becoming more important. In the Physical Sciences, data typically contains many observed characteristics yet a low number of total observations, because of the cost associated with gathering such data. One example is Molecular Physics, where highly complex systems of equations must be simulated, or expensive machinery must be used to observe a given nanoparticle. This work seeks to develop tools and workflows that provide effective methods to analyse this sort of data. Dimension reduction and data augmentation are two well-known methods for solving problems associated with learning from high-dimensional data. We demonstrate workflows which make use of well-known methodologies such as PCA for dimension reduction and SMOGN for data augmentation. There are, however, particular requirements associated with data analysis in different fields. We therefore consider the fitness of these methods in context and provide alternative methods with which to analyse this data. Our methodologies demonstrate significant advantages depending on the requirements of a given analysis task. We introduce Hyper-Dimension Reduction methods which significantly outperform PCA when using our tested learning models. When data transformations such as PCA are not sufficiently interpretable, feature selection may be used, which retains the meaning of the original data. However, state-of-the-art feature selection algorithms scale poorly with the number of features; we introduce a method which significantly reduces the computational complexity of the feature selection task on any data.
We demonstrate that data augmentation can increase the stability of models by providing regularisation; however, existing methods poorly preserve an understanding of the original data. We explore the idea of using inherent errors in the data to carry out the data augmentation task.
