Learning with Similarities on Subsets
Download (1.19 MB)

Altmetric Citations
Description
Many machine learning models are based on similarities between new examples and previously observed examples. Such models are extremely flexible and can adapt to a wide range of tasks. However, if examples are composed of many variables, then even if we collect a large number of examples, it is possible that no two examples will be significantly similar. This, in turn, means that a learning algorithm may require an unreasonably large number of examples to...[Show more]
dc.contributor.author  Ruderman, Avraham Pinchas  

dc.date.accessioned  20171023T23:09:37Z  
dc.date.available  20171023T23:09:37Z  
dc.identifier.other  b47392654  
dc.identifier.uri  http://hdl.handle.net/1885/131951  
dc.description.abstract  Many machine learning models are based on similarities between new examples and previously observed examples. Such models are extremely flexible and can adapt to a wide range of tasks. However, if examples are composed of many variables, then even if we collect a large number of examples, it is possible that no two examples will be significantly similar. This, in turn, means that a learning algorithm may require an unreasonably large number of examples to learn such models. In this thesis, however, we show that a small modification of such models, using sim ilarities on subsets of the variables, rather than similarities on all variables, can allow us to learn using fewer examples and without a large increase in the required com putational complexity, if two conditions hold. The first condition is that similarities on certain subsets are indicative of the targets. The second condition is that the vari ables can be partitioned into a small number of groups such that the variables within each group are highly dependent. For instance, suppose we want to learn a classifier which takes images of handwritten digits and outputs the digit drawn in the image. We would like to learn such a classifier based on example images. If we measure similarity to a new example image, depicting a 4, it is quite possible that many of the most similar previous images we have depict a 9. However, if we measure similarity on certain subsets of the pixels, for example, on the region corresponding to the top third of the image, then on this subset of the pixels, it is unlikely that an image depicting a 4 will be similar to an image depicting a 9. For such a task, a learning algorithm based on measuring similarities to previous examples would be able to learn a more accurate classifier, from fewer examples, by using similarities on subsets. But how can the learning algorithm efficiently find suitable subsets of variables on which to measure similarity? This seems like a daunting task, as the number of possible subsets grows exponentially in the number of variables. In this thesis, we suggest that if similarities on certain subsets are indicative of the targets, and if the variables can be partitioned into a small number of groups, such that the variables within each group are highly dependent, then one can find appropriate subsets on which to measure similarity, by considering combinations of groups of highly de pendent variables. The small number of groups leads to a small number of subsets we need to consider and hence to a manageable computational effort. The high dependence of the variables within each group allows us to achieve much of the re duction in the number of examples that can be achieved by measuring similarities on subsets, even though we are only considering a small portion of all possible subsets.  
dc.language.iso  en  
dc.subject  Machine learning  
dc.subject  Learning theory  
dc.subject  Learning with similarities  
dc.subject  Similarities on subsets  
dc.title  Learning with Similarities on Subsets  
dc.type  Thesis (PhD)  
local.contributor.supervisor  Reid, Mark  
local.contributor.supervisorcontact  mark.reid@anu.edu.au  
dcterms.valid  2017  
local.description.notes  the author deposited 24/10/2017  
local.type.degree  Doctor of Philosophy (PhD)  
dc.date.issued  2015  
local.contributor.affiliation  College of Engineering & Computer Science, The Australian National University  
local.identifier.doi  10.25911/5d70f261ca3c5  
local.mintdoi  mint  
Collections  Open Access Theses 
Download
File  Description  Size  Format  Image 

Ruderman Thesis 2017.pdf  1.19 MB  Adobe PDF 
Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
Updated: 17 November 2022/ Responsible Officer: University Librarian/ Page Contact: Library Systems & Web Coordinator