Nonparametric Bayesian Topic Modelling with Auxiliary Data

Lim, Kar Wai

Description

The intent of this dissertation in computer science is to study topic models for text analytics. The first objective of this dissertation is to incorporate auxiliary information present in text corpora to improve topic modelling for natural language processing (NLP) applications. The second objective of this dissertation is to extend existing topic models to employ state-of-the-art nonparametric Bayesian techniques for better modelling of text data.

dc.contributor.author: Lim, Kar Wai
dc.date.accessioned: 2016-08-09T02:14:49Z
dc.date.available: 2016-08-09T02:14:49Z
dc.identifier.other: b39905962
dc.identifier.uri: http://hdl.handle.net/1885/107151
dc.description.abstract: The intent of this dissertation in computer science is to study topic models for text analytics. The first objective of this dissertation is to incorporate auxiliary information present in text corpora to improve topic modelling for natural language processing (NLP) applications. The second objective of this dissertation is to extend existing topic models to employ state-of-the-art nonparametric Bayesian techniques for better modelling of text data. In particular, this dissertation focusses on:
- incorporating hashtags, mentions, emoticons, and target-opinion dependency present in tweets, together with an external sentiment lexicon, to perform opinion mining or sentiment analysis on products and services;
- leveraging abstracts, titles, authors, keywords, categorical labels, and the citation network to perform bibliographic analysis on research publications, using a supervised or semi-supervised topic model; and
- employing the hierarchical Pitman-Yor process (HPYP) and the Gaussian process (GP) to jointly model text, hashtags, authors, and the follower network in tweets for corpora exploration and summarisation.
In addition, we provide a framework for implementing arbitrary HPYP topic models to ease the development of our proposed topic models, made possible by modularising the Pitman-Yor processes. Through extensive experiments and qualitative assessment, we find that topic models fit the data better as we utilise more auxiliary information and employ the Bayesian nonparametric method.
dc.language.iso: en
dc.subject: Bayesian nonparametric
dc.subject: topic modelling
dc.subject: hierarchical Pitman-Yor process
dc.title: Nonparametric Bayesian Topic Modelling with Auxiliary Data
dc.type: Thesis (PhD)
local.contributor.supervisor: Buntine, Wray
local.contributor.supervisorcontact: wray.buntine@monash.edu
dcterms.valid: 2016
local.type.degree: Doctor of Philosophy (PhD)
dc.date.issued: 2016
local.contributor.affiliation: Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.identifier.doi: 10.25911/5d778a7858d02
local.mintdoi: mint
Collections: Open Access Theses

Download

File: Lim Thesis 2016.pdf
Size: 2.5 MB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
