Improving Topic Coherence with Regularized Topic Models

Newman, David; Bonilla, Edwin; Buntine, Wray

Description

Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, result-set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g. web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflects broad patterns in external data. Using thirteen datasets, we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data.
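The abstract evaluates topics by their coherence. As a rough illustration only, one common way to score coherence is the average pairwise pointwise mutual information (PMI) of a topic's top words, estimated from document co-occurrence in an external reference corpus. The sketch below assumes that formulation; the function name `pmi_coherence`, the `eps` smoothing constant, and the toy corpus are hypothetical and are not taken from the paper.

```python
import math
from itertools import combinations


def pmi_coherence(top_words, docs, eps=1e-12):
    """Average pairwise PMI of a topic's top words (hypothetical helper).

    `docs` is an assumed external reference corpus given as a list of
    token sets; probabilities are document-level co-occurrence rates.
    `eps` is an assumed smoothing constant that avoids log(0).
    """
    n_docs = len(docs)

    def doc_prob(*words):
        # Fraction of reference documents containing all the given words.
        return sum(1 for d in docs if all(w in d for w in words)) / n_docs

    scores = []
    for w1, w2 in combinations(top_words, 2):
        p1, p2, p12 = doc_prob(w1), doc_prob(w2), doc_prob(w1, w2)
        scores.append(math.log((p12 + eps) / (p1 * p2 + eps)))
    return sum(scores) / len(scores)


# Toy reference corpus (illustrative data only).
docs = [
    {"topic", "model", "prior"},
    {"topic", "model"},
    {"web", "search"},
    {"web", "snippet", "search"},
]
print(pmi_coherence(["topic", "model"], docs))  # words that co-occur: positive score
print(pmi_coherence(["topic", "web"], docs))    # words that never co-occur: large negative score
```

Higher scores mean the top words tend to appear together in the reference corpus, which is the intuition behind calling a topic "coherent".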

dc.contributor.author	Newman, David
dc.contributor.author	Bonilla, Edwin
dc.contributor.author	Buntine, Wray
dc.coverage.spatial	Granada, Spain
dc.date.accessioned	2015-12-08T22:43:21Z
dc.date.created	December 13-15, 2011
dc.identifier.uri	http://hdl.handle.net/1885/37239
dc.publisher	Neural Information Processing Systems Foundation
dc.relation.ispartofseries	Neural Information Processing Systems (NIPS 2011)
dc.source	Advances in Neural Information Processing Systems 24
dc.source.uri	http://papers.nips.cc/book/advances-in-neural-information-processing-systems-24-2011
dc.subject	Keywords: Data sets; Diversity analysis; Document Retrieval; Faceted browsing; Interpretability; Text data; Text document; Topic model; Web searches; Semantics; Websites; Information retrieval
dc.title	Improving Topic Coherence with Regularized Topic Models
dc.type	Conference paper
local.description.notes	Imported from ARIES
local.description.refereed	Yes
dc.date.issued	2011
local.identifier.absfor	080203 - Computational Logic and Formal Languages
local.identifier.ariespublication	u4963866xPUB146
local.type.status	Published Version
local.contributor.affiliation	Newman, David, University of California
local.contributor.affiliation	Bonilla, Edwin, College of Engineering and Computer Science, ANU
local.contributor.affiliation	Buntine, Wray, College of Engineering and Computer Science, ANU
local.description.embargo	2037-12-31
local.bibliographicCitation.startpage	9
local.identifier.absseo	890299 - Computer Software and Services not elsewhere classified
dc.date.updated	2016-02-24T11:30:02Z
local.identifier.scopusID	2-s2.0-84860615846
Collections	ANU Research Publications

Download

File	Size	Format
01_Newman_Improving_Topic_Coherence_with_2011.pdf	211.84 kB	Adobe PDF
