Data-Driven Understanding of Real-Life Moral Dilemmas via Topic Mapping and Moral Foundations
Date
2024
Authors
Nguyen, Josh
Abstract
The emergence of artificial intelligence systems capable of engaging in complex discourse with humans presents both an opportunity and a challenge for automated moral decision-making. While the relevant philosophical literature has largely focused on analyzing idealized moral dilemmas, we instead aim to investigate the patterns of human moral judgment and sentiment observable in large-scale online datasets, which can provide insight into real-world ethical issues. This thesis presents two in-depth studies of moral discussions on social media.
First, we explore the features of everyday ethical conflicts through an analysis of 100,000 discussion threads on Reddit. Using a combination of topic modeling, human validation and crowd-sourced labeling, we discover a set of 47 topics that sufficiently describe the content on this forum. Despite their complex nature, the moral stories in this dataset can be represented by nominally neutral topics such as money, work, appearance and communication, suggesting a nuanced view of morality in daily life. Importantly, people tend to perceive each moral story through two topics, such as family and money, giving rise to a rich thematic space of over 1,000 topic pairs throughout this discussion sphere. Downstream results suggest that topics and topic pairs can serve as an important covariate in examining how a moral story is framed and how its corresponding judgment is made.
Second, we analyze moral controversies on social media through the lens of moral foundations theory, a taxonomy of five fundamental moral intuitions. This theory is widely used in data-driven studies of online content, but existing methods for detecting moral foundations are surprisingly lacking in consistency and cross-domain generalizability. In response, we fine-tune a large language model to measure moral foundations in text based on datasets covering news media and online discussions. The resulting model, which we call Mformer, consistently outperforms current approaches across several in- and out-of-domain benchmarks, improving on the state of the art by up to 17% in the AUC metric. Using Mformer to analyze Reddit and Twitter content, we discover that the relative importance of moral foundations can meaningfully describe people's stance on many social issues, and that such variations are topic-dependent.
Altogether, these studies demonstrate the utility of a data-driven approach to practical ethics. In particular, modern text corpora capture an unprecedented record of human discourse on contemporary social and ethical issues. A topic- or moral foundation-driven approach to analyzing these data sources can prove useful by representing complex moral discussions in a low-dimensional and interpretable manner. This may enable researchers to meaningfully examine the nuanced but diverse patterns of human moral judgment and sentiment in everyday life. Methods and findings from this thesis can be used to inform the development of automated systems which engage in content filtering, dissemination and moderation, and, ultimately, moral conversations with humans.
Type
Thesis (MPhil)