Vanishing Gradient Problem in Training Neural Networks
dc.contributor.author | Chen, Muye | |
dc.date.accessioned | 2022-07-03T23:08:53Z | |
dc.date.available | 2022-07-03T23:08:53Z | |
dc.date.issued | 2022 | |
dc.description.abstract | The Vanishing Gradient Problem (VGP) is a frequently encountered numerical problem in training Feedforward Neural Networks (FNN) and Recurrent Neural Networks (RNN). The gradient involved in neural network optimisation can vanish, i.e. become numerically zero, in a number of ways. In this thesis we focus on the following definition of the VGP: the tendency for network loss gradients, calculated with respect to the model weight parameters, to vanish numerically in the backpropagation step of network training. Because the two types of networks are trained on different kinds of data, their architectures differ; consequently, the methods to alleviate the problem take different forms and focus on different model components. This thesis introduces basic neural network architectures to readers who are new to deep learning and neural networks, with a particular focus on how the VGP can affect model performance. We conduct the RNN-related research in the context of a simple classification task in the field of Natural Language Processing (NLP). We have implemented and analysed existing solutions to the VGP through mathematical detail and graphical results from experiments. The thesis then extends to two types of advanced RNN-class models designed to be resistant to the VGP. However, our experimental results reveal that, under a strict indicator, the two advanced models instead exhibit a stronger tendency toward the VGP than the standard RNN model does. In light of this finding, we introduce a different viewpoint on the VGP in RNN-class models proposed by Rehmer & Kroll (2020), which supports our results. We discuss its relevance to our experiments and extend their derivations to another RNN-class model they did not cover. | en_AU
dc.identifier.uri | http://hdl.handle.net/1885/268662 | |
dc.language.iso | en_AU | en_AU |
dc.subject | Neural networks | en_AU |
dc.subject | recurrent neural networks | en_AU |
dc.subject | vanishing gradient problem | en_AU |
dc.title | Vanishing Gradient Problem in Training Neural Networks | en_AU |
dc.type | Thesis (Honours) | en_AU |
dcterms.valid | 2022 | en_AU |
local.contributor.affiliation | Research School of Finance, Actuarial Studies and Statistics, Australian National University | en_AU |
local.contributor.supervisor | Wood, Andrew | |
local.description.notes | Deposited by author 3/7/2022 | en_AU |
local.identifier.doi | 10.25911/BMDG-3N85 | |
local.mintdoi | mint | en_AU |
local.type.degree | Thesis (Honours) | en_AU |
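
As an illustrative aside to the definition of the VGP quoted in the abstract above: the following is a generic sketch of the backpropagation-through-time chain rule for a simple RNN, not a derivation taken from the thesis itself; the symbols (hidden state h_t, recurrent weights W_h, input weights W_x, activation sigma, loss L) are assumed notation.

% Generic RNN recurrence and the BPTT Jacobian product (illustrative sketch only;
% the notation h_t, W_h, W_x, \sigma, L is assumed, not the thesis's own).
\[
  h_t = \sigma\!\left(W_h h_{t-1} + W_x x_t\right), \qquad
  \frac{\partial L}{\partial h_k}
  = \frac{\partial L}{\partial h_T}
    \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
  = \frac{\partial L}{\partial h_T}
    \prod_{t=k+1}^{T} \operatorname{diag}\!\big(\sigma'(W_h h_{t-1} + W_x x_t)\big)\, W_h .
\]
% If the norms of these per-step Jacobians are bounded below one (e.g. with a
% saturating activation and small recurrent weights), the product shrinks
% geometrically in T - k, so gradients at early time steps become numerically zero.

Because the gradients with respect to the weight parameters inherit these Jacobian factors through the chain rule, a shrinking product makes those weight gradients vanish as well, which is the sense of the VGP used in the abstract.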