Improving Topic Coherence with Regularized Topic Models

Newman, David; Bonilla, Edwin; Buntine, Wray

Date

2011

Authors

Newman, David

Bonilla, Edwin

Buntine, Wray

Publisher

Neural Information Processing Systems Foundation

Abstract

Topic models have the potential to improve search and browsing by extracting useful semantic themes from web pages and other text documents. When learned topics are coherent and interpretable, they can be valuable for faceted browsing, result set diversity analysis, and document retrieval. However, when dealing with small collections or noisy text (e.g., web search result snippets or blog posts), learned topics can be less coherent, less interpretable, and less useful. To overcome this, we propose two methods to regularize the learning of topic models. Our regularizers work by creating a structured prior over words that reflects broad patterns in external data. Using thirteen datasets, we show that both regularizers improve topic coherence and interpretability while learning a faithful representation of the collection of interest. Overall, this work makes topic models more useful across a broader range of text data.

Keywords

Data sets; Diversity analysis; Document Retrieval; Faceted browsing; Interpretability; Text data; Text document; Topic model; Web searches; Semantics; Websites; Information retrieval

URI

http://hdl.handle.net/1885/37239

Collections

ANU Research Publications

Source

Advances in Neural Information Processing Systems 23

Type

Conference paper

Restricted until

2037-12-31

Downloads

File

Description

01_Newman_Improving_Topic_Coherence_with_2011.pdf (211.84 KB)
