Decisions, Learning and Games: You've Got To Have Freedom.

Della Penna, Nicolas

Date

2022

Abstract

Maintaining a subject's freedom to decide imposes structure and constraints on learning systems that aim to guide those decisions. Two natural sources from which subjects can learn to make good decisions are past experience and advice from others. Both are affected by the subject's freedom to ultimately act as they wish, which gives rise to learning-theoretic and game-theoretic repercussions, respectively.

To study the effect of past experience, we extend the standard bandit setting: after the algorithm chooses an action, the subject may carry out a different action, which is then observed along with the reward. Algorithms whose choice of action is mediated by the subject can gain from awareness of the subject's actual actions, which we term compliance awareness. We present algorithms that take advantage of compliance awareness while maintaining worst-case regret bounds up to multiplicative constants. We study their empirical finite-sample performance on synthetic data and in simulations using real data from clinical trials.

To study the effect of advice from others, we consider the literature on incentives offered to multiple experts by a decision maker who will take an action and receive a reward about which the experts may have information. Existing mechanisms for multiple experts are known not to be truthful, even in the limited sense of myopic incentive compatibility, unless the decision maker renounces the ability to always take the ex-post best action and commits to a randomized strategy with full support. We present a new class of mechanisms based on second-price auctions that maintain the subject's freedom. Experts submit their private information, and the algorithm auctions off the rights to a share of the subject's reward; the subject is then free to pick the action they desire after observing the submitted information. We show several situations in which existing mechanisms fail and this one succeeds. We also consider strategic limitations of this mechanism beyond the myopic setting, which arise from complementary information between experts, as well as practical considerations for implementing it in real institutions.

We conclude by considering a natural hybrid setting, in which a sequence of subjects make decisions and each can receive advice from a fixed set of experts that the mechanism seeks to incentivize. The model for this setting is extremely general: its special cases include standard, compliance-aware, and contextual bandits, as well as decision markets. We present a novel, practical market structure for this setting that incentivizes exploration, information revelation, and aggregation with selfish experts.
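
The compliance-aware bandit protocol described in the abstract (the algorithm recommends an action, the subject may carry out a different one, and both the actual action and the reward are observed) can be illustrated with a small simulation. This is only a minimal sketch under stated assumptions, not one of the algorithms from the thesis: the two Bernoulli reward means, the compliance probability, the rule that a non-compliant subject picks uniformly at random, and the epsilon-greedy learner are all hypothetical choices made for illustration. The single compliance-aware ingredient shown is that rewards are credited to the action the subject actually took rather than to the one recommended.

```python
import random


def run_protocol(n_rounds=2000, compliance=0.6, epsilon=0.1, seed=0):
    """Simulate a two-armed bandit in which the subject may override the recommendation.

    All parameters and the subject model are illustrative assumptions.
    """
    rng = random.Random(seed)
    true_means = [0.3, 0.7]   # hypothetical Bernoulli reward means per action
    counts = [0, 0]           # times each action was actually taken
    sums = [0.0, 0.0]         # total reward observed for each actual action
    followed = 0              # rounds in which the subject took the recommended action

    for _ in range(n_rounds):
        # Algorithm's recommendation: epsilon-greedy over the current estimates,
        # exploring uniformly until both actions have been observed at least once.
        if 0 in counts or rng.random() < epsilon:
            chosen = rng.randrange(2)
        else:
            chosen = max(range(2), key=lambda a: sums[a] / counts[a])

        # The subject keeps the freedom to act: with probability `compliance`
        # they follow the recommendation, otherwise they pick uniformly at random.
        if rng.random() < compliance:
            actual = chosen
        else:
            actual = rng.randrange(2)
        followed += int(actual == chosen)

        # The reward depends on the action actually carried out.
        reward = 1.0 if rng.random() < true_means[actual] else 0.0

        # Compliance awareness: update the statistics of the actual action.
        counts[actual] += 1
        sums[actual] += reward

    estimates = [sums[a] / counts[a] if counts[a] else 0.0 for a in range(2)]
    return estimates, counts, followed


if __name__ == "__main__":
    estimates, counts, followed = run_protocol()
    print("estimated means:", estimates)
    print("actual action counts:", counts)
    print("rounds where the recommendation was followed:", followed)
```

A learner that ignored compliance would instead credit each reward to the recommended action, which mixes the reward distributions of the two actions whenever the subject deviates.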