Adaptive Recommendations with Bandit Feedback

Zhang, Mengyan

Adaptive Recommendations with Bandit Feedback

dc.contributor.author	Zhang, Mengyan
dc.date.accessioned	2023-02-02T02:48:03Z
dc.date.available	2023-02-02T02:48:03Z
dc.date.issued	2023
dc.description.abstract	Sequential decision-making under uncertainty is a fundamental challenge in machine learning. We consider multi-armed bandits problems, where the goal is to design a policy to recommend arms (options) sequentially where in each round only the rewards sampled from the selected arms are available. In this thesis, we study both theoretical and practical aspects of sequential decision-making with bandit feedback. From the theory aspect, we summarise the rewards using quantiles for risk-averse decision-making and selecting a local area of arm space with only aggregated feedback for a new aggregated regret. From the practical side, we study the best arms identification for biological sequence design in real-world applications and focus on the regret minimisation setting for large-scale news recommendations. There are three design choices one can make in bandit tasks, namely arms, objectives and rewards. The theoretical and practical contributions of this thesis cover and enrich the three design choices. We first consider the design choice of arm optimality and propose to use quantiles to summarise the reward distributions. This is under the task of best arms identification with a fixed budget for risk-averse decision-making. With new concentration inequalities for both order statistics and quantiles, we propose Quantile Successive Accepts and Rejects (Q-SAR) algorithm and provide an upper bound of the probability of finding the suboptimal arms. By assuming there is a reward function over the arm (feature) space and making assumptions about the smoothness of the reward function, we consider a real-world application in synthetic biology, where we design biological sequences in batches to optimise the corresponding protein expression level. By proposing a machine learning guided Design-Build-Test-Learn cycle and utilising the Gaussian process batch upper confidence bound algorithm, we improve the protein expression level by up to 35%. Motivated by the large arm space in applications such as large-scale recommender systems and biological sequence design, we design algorithms to explore arms hierarchically. Under the assumption of Gaussian process rewards, we consider a novel setting where only the aggregated feedback of a local area can be observed, instead of the reward of each selected choice. This setting is also useful when the individual feedback is not available due to for example privacy or hardware constraints. We propose a new algorithm called Gaussian Process Optimistic Optimisation (GPOO) to address this new setting and provided an upper bound of the aggregated regret. We then consider another type of hierarchical design for the application of recommendation systems, where we propose a two-stage topic-item recommendation framework for news recommendation. We further propose policies to utilise the neural representations and conduct large-scale experiments on a news recommendation dataset, which verifies the efficiency of the two-stage framework and our proposed policies.
dc.identifier.uri	http://hdl.handle.net/1885/284132
dc.language.iso	en_AU
dc.title	Adaptive Recommendations with Bandit Feedback
dc.type	Thesis (PhD)
local.contributor.affiliation	ANU College of Engineering Computing & Cybernetics, The Australian National University
local.contributor.authoremail	u6015325@anu.edu.au
local.contributor.supervisor	Ong, Cheng
local.contributor.supervisorcontact	u1823069@anu.edu.au
local.identifier.doi	10.25911/ER4Y-K363
local.identifier.proquest	Yes
local.mintdoi	mint
local.thesisANUonly.author	93e8b0e8-a583-45c5-9a2d-e5e360a0d201
local.thesisANUonly.key	8c52df59-d9dc-26bd-bc7d-27ae644a5bc1
local.thesisANUonly.title	000000020646_TC_1

Downloads

Original bundle

Now showing 1 - 1 of 1

Name:: MengyanZhang_thesis_final.pdf
Size:: 11.59 MB
Format:: Adobe Portable Document Format
Description:: Thesis Material

Download

Collections

Open Access Theses