Differential topic models

Chen, Changyou; Buntine, Wray; Ding, Nan; Xie, Lexing; Du, Lan

Description

In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and similarities. For this we use hierarchical Bayesian nonparametric models. Moreover, we found it was important to properly model power-law phenomena in...

dc.contributor.author: Chen, Changyou
dc.contributor.author: Buntine, Wray
dc.contributor.author: Ding, Nan
dc.contributor.author: Xie, Lexing
dc.contributor.author: Du, Lan
dc.date.accessioned: 2015-06-02T03:28:53Z
dc.date.available: 2015-06-02T03:28:53Z
dc.identifier.issn: 0162-8828
dc.identifier.uri: http://hdl.handle.net/1885/13706
dc.description.abstract: In applications we may want to compare different document collections: they could have shared content but also different and unique aspects in particular collections. This task has been called comparative text mining or cross-collection modeling. We present a differential topic model for this application that models both topic differences and similarities. For this we use hierarchical Bayesian nonparametric models. Moreover, we found it was important to properly model power-law phenomena in topic-word distributions and thus we used the full Pitman-Yor process rather than just a Dirichlet process. Furthermore, we propose the transformed Pitman-Yor process (TPYP) to incorporate prior knowledge such as vocabulary variations in different collections into the model. To deal with the non-conjugate issue between model prior and likelihood in the TPYP, we thus propose an efficient sampling algorithm using a data augmentation technique based on the multinomial theorem. Experimental results show the model discovers interesting aspects of different collections. We also show the proposed MCMC based algorithm achieves a dramatically reduced test perplexity compared to some existing topic models. Finally, we show our model outperforms the state-of-the-art for document classification/ideology prediction on a number of text collections.
dc.description.sponsorship: NICTA was funded by the Australian Government as represented by the Department of Broadband, Communications and the Digital Economy and the Australian Research Council through the ICT Center of Excellence program. Lan Du was supported under Australian Research Council’s Discovery Projects funding Scheme (DP110102506 and DP110102593).
dc.format: 13 pages
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE)
dc.rights: © 2014 IEEE http://www.ieee.org/publications_standards/publications/rights/rights_policies.html © 2014 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works. http://www.ieee.org/publications_standards/publications/rights/announcement_author_posting_updated.pdf The policy reaffirms the principle that authors are free to post the accepted version of their article on their personal web sites or those of their employers. Posting of the final, published PDF continues to be prohibited, except for open access articles, whose authors may freely post the final version. (Publisher's journal website as of 8/9/2015).
dc.source: IEEE Transactions on Pattern Analysis and Machine Intelligence
dc.subject: differential topic model
dc.subject: transformed Pitman-Yor process
dc.subject: MCMC
dc.subject: data augmentation
dc.title: Differential topic models
dc.type: Journal article
local.identifier.citationvolume: 37
dcterms.dateAccepted: 2014-03-16
dc.date.issued: 2015-02
local.identifier.absfor: 080100 - ARTIFICIAL INTELLIGENCE AND IMAGE PROCESSING
local.identifier.ariespublication: U3488905xPUB5243
local.publisher.url: http://www.ieee.org/
local.type.status: Accepted Version
local.contributor.affiliation: Xie, Lexing, Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
local.contributor.affiliation: Chen, Changyou, Research School of Computer Science, College of Engineering and Computer Science, The Australian National University
dc.relation: http://purl.org/au-research/grants/arc/DP110102506
dc.relation: http://purl.org/au-research/grants/arc/DP110102593
local.identifier.essn: 1939-3539
local.bibliographicCitation.issue: 2
local.bibliographicCitation.startpage: 230
local.bibliographicCitation.lastpage: 242
local.identifier.doi: 10.1109/TPAMI.2014.2313127
local.identifier.absseo: 970108 - Expanding Knowledge in the Information and Computing Sciences
dc.date.updated: 2015-12-11T09:25:41Z
local.identifier.scopusID: 2-s2.0-84920936716
Collections: ANU Research Publications

Download

File: Chen et al Differential topic models 2014.pdf
Size: 1.53 MB
Format: Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
