Skip navigation
Skip navigation

Learning with Similarities on Subsets

Ruderman, Avraham Pinchas

Description

Many machine learning models are based on similarities between new examples and previously observed examples. Such models are extremely flexible and can adapt to a wide range of tasks. However, if examples are composed of many variables, then even if we collect a large number of examples, it is possible that no two examples will be significantly similar. This, in turn, means that a learning algorithm may require an unreasonably large number of examples to...[Show more]

dc.contributor.authorRuderman, Avraham Pinchas
dc.date.accessioned2017-10-23T23:09:37Z
dc.date.available2017-10-23T23:09:37Z
dc.identifier.otherb47392654
dc.identifier.urihttp://hdl.handle.net/1885/131951
dc.description.abstractMany machine learning models are based on similarities between new examples and previously observed examples. Such models are extremely flexible and can adapt to a wide range of tasks. However, if examples are composed of many variables, then even if we collect a large number of examples, it is possible that no two examples will be significantly similar. This, in turn, means that a learning algorithm may require an unreasonably large number of examples to learn such models. In this thesis, however, we show that a small modification of such models, using sim- ilarities on subsets of the variables, rather than similarities on all variables, can allow us to learn using fewer examples and without a large increase in the required com- putational complexity, if two conditions hold. The first condition is that similarities on certain subsets are indicative of the targets. The second condition is that the vari- ables can be partitioned into a small number of groups such that the variables within each group are highly dependent. For instance, suppose we want to learn a classifier which takes images of handwritten digits and outputs the digit drawn in the image. We would like to learn such a classifier based on example images. If we measure similarity to a new example image, depicting a 4, it is quite possible that many of the most similar previous images we have depict a 9. However, if we measure similarity on certain subsets of the pixels, for example, on the region corresponding to the top third of the image, then on this subset of the pixels, it is unlikely that an image depicting a 4 will be similar to an image depicting a 9. For such a task, a learning algorithm based on measuring similarities to previous examples would be able to learn a more accurate classifier, from fewer examples, by using similarities on subsets. But how can the learning algorithm efficiently find suitable subsets of variables on which to measure similarity? This seems like a daunting task, as the number of possible subsets grows exponentially in the number of variables. In this thesis, we suggest that if similarities on certain subsets are indicative of the targets, and if the variables can be partitioned into a small number of groups, such that the variables within each group are highly dependent, then one can find appropriate subsets on which to measure similarity, by considering combinations of groups of highly de- pendent variables. The small number of groups leads to a small number of subsets we need to consider and hence to a manageable computational effort. The high dependence of the variables within each group allows us to achieve much of the re- duction in the number of examples that can be achieved by measuring similarities on subsets, even though we are only considering a small portion of all possible subsets.
dc.language.isoen
dc.subjectMachine learning
dc.subjectLearning theory
dc.subjectLearning with similarities
dc.subjectSimilarities on subsets
dc.titleLearning with Similarities on Subsets
dc.typeThesis (PhD)
local.contributor.supervisorReid, Mark
local.contributor.supervisorcontactmark.reid@anu.edu.au
dcterms.valid2017
local.description.notesthe author deposited 24/10/2017
local.type.degreeDoctor of Philosophy (PhD)
dc.date.issued2015
local.contributor.affiliationCollege of Engineering & Computer Science, The Australian National University
local.identifier.doi10.25911/5d70f261ca3c5
local.mintdoimint
CollectionsOpen Access Theses

Download

File Description SizeFormat Image
Ruderman Thesis 2017.pdf1.19 MBAdobe PDFThumbnail


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated:  17 November 2022/ Responsible Officer:  University Librarian/ Page Contact:  Library Systems & Web Coordinator