Skip navigation
Skip navigation

Learning for the internet : kernel embeddings and optimisation

Quadrianto, Novi

Description

In this thesis, we develop principled machine learning methods suited for complex real-world Internet challenges. The Internet has supplied an unprecedented amount of data; the challenge now is to transform this massive amount of data into information that supports knowledge creation. Machine learning techniques have become prevalent for modelling, prediction, and decision making from massive scale data. This thesis makes contributions in addressing data to knowledge transformation in the...[Show more]

dc.contributor.authorQuadrianto, Novi
dc.date.accessioned2018-11-22T00:07:43Z
dc.date.available2018-11-22T00:07:43Z
dc.date.copyright2011
dc.identifier.otherb2638849
dc.identifier.urihttp://hdl.handle.net/1885/151273
dc.description.abstractIn this thesis, we develop principled machine learning methods suited for complex real-world Internet challenges. The Internet has supplied an unprecedented amount of data; the challenge now is to transform this massive amount of data into information that supports knowledge creation. Machine learning techniques have become prevalent for modelling, prediction, and decision making from massive scale data. This thesis makes contributions in addressing data to knowledge transformation in the context of machine learning; we introduce non-standard machine learning problems and devise solutions for those, as well we present scalable solutions for several existing machine learning problems. The present work focuses on addressing Internet complexity on output label dimensions. The first part of this thesis deals with formulation and solution of non-standard machine learning problems. Traditionally, supervised machine learning settings draw inference and make prediction from a set of input objects; each of which is supervised by a desired output value. Internet poses challenges of weak label supervision and label inconsistency. We focus on the following three new settings: 1. Learning from only label proportions: A learning setting where instead of each input is supervised with an output, we are given groups of unlabelled inputs. Each group is endowed with information on class label proportions. The number of group is at least as many as number of labels (Chapter 3); 2. Learning input-output correspondences: A learning setting where a set of inputs and a set of outputs are given however they are not paired (Chapter 4); 3. Learning from several related tasks with distinct label sets: A learning setting where several related tasks are given however each task has potentially distinct label sets and label correspondences are not readily available (Chapter 5). The second part addresses refinements of existing machine learning models and algorithms to scale to large data. The contributions of this thesis include a streaming algorithm for the following two problems: 1. Transductive learning: We present a scalable algorithm for learning with labelled and unlabelled data simultaneously by matching the output distributions on labelled and unlabelled data (Chapter 6); 2. Storage and indexing management: We present a scalable algorithm for indexing in the context of webpage tiering. The goal is to allocate pages to caches such that the most frequently accessed pages reside in the caches with the smallest latency whereas the least frequently retrieved pages are stored in the backtiers of the caching system. This indexing and storage problem is related to a larger class of parametric flow problem (Chapter 7). The solutions presented in this thesis are centred around two main mathematical ingredients. First, Hilbert Space embeddings of distributions via averages are used. This allows distance computation between data distributions in terms of distances between averages, which, in turn, yield elegant ways to deal with distributions without the need of estimating them as an intermediate step. Second, recent advances in field of optimisation are exploited to address the sheer size and the non-convex nature of mostly Internet problems.
dc.format.extentxix, 162 leaves.
dc.language.isoen_AU
dc.rightsAuthor retains copyright
dc.subject.lccQ325.75 .Q83 2011
dc.subject.lcshSupervised learning (Machine learning)
dc.subject.lcshKernel functions
dc.subject.lcshMathematical optimization
dc.titleLearning for the internet : kernel embeddings and optimisation
dc.typeThesis (PhD)
local.description.notesThesis (Ph.D.)--Australian National University
dc.date.issued2011
local.type.statusAccepted Version
local.contributor.affiliationAustralian National University.
local.identifier.doi10.25911/5d5153844b4dd
dc.date.updated2018-11-21T08:21:37Z
dcterms.accessRightsOpen Access
local.mintdoimint
CollectionsOpen Access Theses

Download

File Description SizeFormat Image
b26388492_Quadrianto_Novi.pdf42.29 MBAdobe PDFThumbnail


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.

Updated:  17 November 2022/ Responsible Officer:  University Librarian/ Page Contact:  Library Systems & Web Coordinator