Vanishing Gradient Problem in Training Neural Networks

dc.contributor.author: Chen, Muye
dc.date.accessioned: 2022-07-03T23:08:53Z
dc.date.available: 2022-07-03T23:08:53Z
dc.date.issued: 2022
dc.description.abstract: The Vanishing Gradient Problem (VGP) is a frequently encountered numerical problem in training Feedforward Neural Networks (FNNs) and Recurrent Neural Networks (RNNs). The gradient involved in neural network optimisation can vanish and become zero in a number of ways. In this thesis we focus on the following definition of the VGP: the tendency for network loss gradients, computed with respect to the model weight parameters, to vanish numerically in the backpropagation step of network training. Because the two types of networks are trained on different kinds of data, their architectures differ, and consequently the methods to alleviate the problem take different forms and focus on different model components. This thesis introduces basic neural network architectures to readers who are new to deep learning, with a particular focus on how the VGP can affect model performance. We study RNNs in the context of a simple classification task in Natural Language Processing (NLP), and we implement and analyse the existing solutions to the VGP through mathematical details and graphical results from experiments. The thesis is extended to two types of advanced RNN-class models that are designed to be resistant to the VGP. However, our experimental results reveal that under a strict indicator, the two advanced models instead exhibit a stronger tendency towards the VGP than the standard RNN model does. In light of this finding, we introduce a different viewpoint on the VGP in RNN-class models, proposed by Rehmer & Kroll (2020), which supports our results. We discuss its relevance to our experiments and extend their derivations to another RNN-class model they did not cover.
dc.identifier.uri: http://hdl.handle.net/1885/268662
dc.language.iso: en_AU
dc.subject: Neural networks
dc.subject: recurrent neural networks
dc.subject: vanishing gradient problem
dc.title: Vanishing Gradient Problem in Training Neural Networks
dc.type: Thesis (Honours)
dcterms.valid: 2022
local.contributor.affiliation: Research School of Finance, Actuarial Studies and Statistics, Australian National University
local.contributor.supervisor: Wood, Andrew
local.description.notes: Deposited by author 3/7/2022
local.identifier.doi: 10.25911/BMDG-3N85
local.mintdoi: mint
local.type.degree: Thesis (Honours)
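
As a brief, hedged illustration of the VGP definition given in the abstract (this sketch is not taken from the thesis itself; all names and parameter values below are illustrative assumptions), the following self-contained NumPy script backpropagates a loss gradient through a toy vanilla tanh RNN and prints the gradient norm at each earlier time step. The shrinking norms are the numerical signature the abstract describes as gradients vanishing in the backpropagation step.

import numpy as np

# Toy vanilla RNN: h_t = tanh(W_hh @ h_{t-1} + W_xh @ x_t)
# Backpropagate dL/dh_T through time and watch its norm shrink.
rng = np.random.default_rng(0)
hidden, inputs, steps = 16, 8, 30

W_hh = rng.normal(scale=0.3, size=(hidden, hidden))  # small recurrent weights
W_xh = rng.normal(scale=0.3, size=(hidden, inputs))
xs = rng.normal(size=(steps, inputs))

# Forward pass, storing pre-activations for the backward pass
hs, pre = [np.zeros(hidden)], []
for t in range(steps):
    a = W_hh @ hs[-1] + W_xh @ xs[t]
    pre.append(a)
    hs.append(np.tanh(a))

# Backward pass: start from an arbitrary gradient at the final hidden state
grad = np.ones(hidden)
for t in reversed(range(steps)):
    # Chain rule through h_{t+1} = tanh(a_{t+1}):
    # dL/dh_t = W_hh^T @ (diag(1 - tanh^2(a_{t+1})) @ dL/dh_{t+1})
    grad = W_hh.T @ ((1.0 - np.tanh(pre[t]) ** 2) * grad)
    print(f"t = {t:2d}   ||dL/dh_t|| = {np.linalg.norm(grad):.3e}")

Because the tanh derivative is at most one and the recurrent weights here are drawn with a small scale (0.3), each backward step multiplies the gradient by a matrix whose norm is well below one, so the printed norms decay roughly geometrically; with much larger weights the same recursion can instead produce exploding gradients.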

Downloads

Original bundle

Name: 6614681_thesis.pdf
Size: 2.07 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 884 B
Format: Item-specific license agreed upon to submission