Vanishing Gradient Problem in Training Neural Networks
dc.contributor.author | Chen, Muye | |
dc.date.accessioned | 2022-07-03T23:08:53Z | |
dc.date.available | 2022-07-03T23:08:53Z | |
dc.date.issued | 2022 | |
dc.description.abstract | The Vanishing Gradient Problem (VGP) is a frequently encountered numerical problem in training Feedforward Neural Networks (FNN) and Recurrent Neural Networks (RNN). The gradient involved in neural network optimisation can vanish, i.e. become numerically zero, in a number of ways. In this thesis we focus on the following definition of the VGP: the tendency for network loss gradients, calculated with respect to the model weight parameters, to vanish numerically in the backpropagation step of network training. Because the two types of networks are trained on different kinds of data, their architectures differ; consequently, the methods to alleviate the problem take different forms and focus on different model components. This thesis introduces basic neural network architectures to readers who are new to deep learning and neural networks, with a particular focus on how the VGP can affect model performance. We conduct the RNN-related research in the context of a simple classification task in the field of Natural Language Processing (NLP). We have implemented and analysed existing solutions to the VGP through mathematical detail and graphical results from experiments. The thesis then extends to two types of advanced RNN-class models designed to be resistant to the VGP. However, our experimental results reveal that, under a strict indicator, the two advanced models instead exhibit a stronger tendency toward the VGP than the standard RNN model does. In light of this finding, we introduce a different viewpoint on the VGP in RNN-class models proposed by Rehmer & Kroll (2020), which supports our results. We discuss its relevance to our experiments and extend their derivations to another RNN-class model they did not cover. | en_AU
dc.identifier.uri | http://hdl.handle.net/1885/268662 | |
dc.language.iso | en_AU | en_AU |
dc.subject | Neural networks | en_AU |
dc.subject | recurrent neural networks | en_AU |
dc.subject | vanishing gradient problem | en_AU |
dc.title | Vanishing Gradient Problem in Training Neural Networks | en_AU |
dc.type | Thesis (Honours) | en_AU |
dcterms.valid | 2022 | en_AU |
local.contributor.affiliation | Research School of Finance, Actuarial Studies and Statistics, Australian National University | en_AU |
local.contributor.supervisor | Wood, Andrew | |
local.description.notes | Deposited by author 3/7/2022 | en_AU |
local.identifier.doi | 10.25911/BMDG-3N85 | |
local.mintdoi | mint | en_AU |
local.type.degree | Thesis (Honours) | en_AU |
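
As an illustrative aside to the definition of the VGP quoted in the abstract above: the following is a generic sketch of the backpropagation-through-time chain rule for a simple RNN, not a derivation taken from the thesis itself; the symbols (hidden state h_t, recurrent weights W_h, input weights W_x, activation sigma, loss L) are assumed notation.

% Generic RNN recurrence and the BPTT Jacobian product (illustrative sketch only;
% the notation h_t, W_h, W_x, \sigma, L is assumed, not the thesis's own).
\[
  h_t = \sigma\!\left(W_h h_{t-1} + W_x x_t\right), \qquad
  \frac{\partial L}{\partial h_k}
  = \frac{\partial L}{\partial h_T}
    \prod_{t=k+1}^{T} \frac{\partial h_t}{\partial h_{t-1}}
  = \frac{\partial L}{\partial h_T}
    \prod_{t=k+1}^{T} \operatorname{diag}\!\big(\sigma'(W_h h_{t-1} + W_x x_t)\big)\, W_h .
\]
% If the norms of these per-step Jacobians are bounded below one (e.g. with a
% saturating activation and small recurrent weights), the product shrinks
% geometrically in T - k, so gradients at early time steps become numerically zero.

Because the gradients with respect to the weight parameters inherit these Jacobian factors through the chain rule, a shrinking product makes those weight gradients vanish as well, which is the sense of the VGP used in the abstract.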