
Improving Topic Coherence with Regularized Topic Models

Newman, David; Bonilla, Edwin; Buntine, Wray

Description

Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models.

dc.contributor.author: Newman, David
dc.contributor.author: Bonilla, Edwin
dc.contributor.author: Buntine, Wray
dc.coverage.spatial: Granada Spain
dc.date.accessioned: 2015-12-08T22:43:21Z
dc.date.created: December 13-15 2011
dc.identifier.uri: http://hdl.handle.net/1885/37239
dc.description.abstract: Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, results set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflect broad patterns in the external data. Using thirteen datasets we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data.
dc.publisher: Neural Information Processing Systems Foundation
dc.relation.ispartofseries: Neural Information Processing Systems (NIPS 2011)
dc.source: Advances in Neural Information Processing Systems 24
dc.source.uri: http://papers.nips.cc/book/advances-in-neural-information-processing-systems-24-2011
dc.subject: Keywords: Data sets; Diversity analysis; Document Retrieval; Faceted browsing; Interpretability; Text data; Text document; Topic model; Web searches; Semantics; Websites; Information retrieval
dc.title: Improving Topic Coherence with Regularized Topic Models
dc.type: Conference paper
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2011
local.identifier.absfor: 080203 - Computational Logic and Formal Languages
local.identifier.ariespublication: u4963866xPUB146
local.type.status: Published Version
local.contributor.affiliation: Newman, David, University of California
local.contributor.affiliation: Bonilla, Edwin, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Buntine, Wray, College of Engineering and Computer Science, ANU
local.description.embargo: 2037-12-31
local.bibliographicCitation.startpage: 9
local.identifier.absseo: 890299 - Computer Software and Services not elsewhere classified
dc.date.updated: 2016-02-24T11:30:02Z
local.identifier.scopusID: 2-s2.0-84860615846
Collections: ANU Research Publications

Download

File: 01_Newman_Improving_Topic_Coherence_with_2011.pdf
Size: 211.84 kB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
