Modeling Topic Hierarchies with the Recursive Chinese Restaurant Process

dc.contributor.author: Kim, Joon Hee
dc.contributor.author: Kim, Dongwoo
dc.contributor.author: Kim, Suin
dc.contributor.author: Oh, Alice
dc.coverage.spatial: Maui, Hawaii, USA
dc.date.accessioned: 2022-08-11T06:04:58Z
dc.date.created: 29 October - 2 November 2012
dc.date.issued: 2012
dc.date.updated: 2021-08-01T08:31:34Z
dc.description.abstract: Topic models such as latent Dirichlet allocation (LDA) and hierarchical Dirichlet processes (HDP) are simple solutions for discovering topics in a set of unannotated documents. While they are simple and popular, a major shortcoming of LDA and HDP is that they do not organize the topics into a hierarchical structure, which is naturally found in many datasets. We introduce the recursive Chinese restaurant process (rCRP) and a nonparametric topic model with rCRP as a prior for discovering a hierarchical topic structure with unbounded depth and width. Unlike previous models for discovering topic hierarchies, rCRP allows the documents to be generated from a mixture over the entire set of topics in the hierarchy. We apply rCRP to a corpus of New York Times articles, a dataset of MovieLens ratings, and a set of Wikipedia articles and show the discovered topic hierarchies. We compare the predictive power of rCRP with that of LDA, HDP, and the nested Chinese restaurant process (nCRP) using held-out likelihood and show that rCRP outperforms the others. We suggest two metrics that quantify the characteristics of a topic hierarchy to compare the discovered topic hierarchies of rCRP and nCRP. The results show that rCRP discovers a hierarchy in which the topics become more specialized toward the leaves, and topics in the immediate family exhibit more affinity than topics beyond the immediate family. [en_AU]
dc.description.sponsorship: This research was supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (2011-0026507). [en_AU]
dc.format.mimetype: application/pdf [en_AU]
dc.identifier.uri: http://hdl.handle.net/1885/270395
dc.language.iso: en_AU [en_AU]
dc.publisher: ACM [en_AU]
dc.relation.ispartofseries: 21st ACM International Conference on Information and Knowledge Management [en_AU]
dc.rights: © 2012 ACM [en_AU]
dc.source: Proceedings of the 21st ACM International Conference on Information and Knowledge Management [en_AU]
dc.subject: Hierarchical Topic Modeling [en_AU]
dc.subject: Bayesian Nonparametric models [en_AU]
dc.title: Modeling Topic Hierarchies with the Recursive Chinese Restaurant Process [en_AU]
dc.type: Conference paper [en_AU]
local.bibliographicCitation.lastpage: 792 [en_AU]
local.bibliographicCitation.startpage: 783 [en_AU]
local.contributor.affiliation: Kim, Joon Hee, KAIST [en_AU]
local.contributor.affiliation: Kim, Dongwoo, College of Engineering and Computer Science, ANU [en_AU]
local.contributor.affiliation: Kim, Suin, KAIST [en_AU]
local.contributor.affiliation: Oh, Alice, KAIST [en_AU]
local.contributor.authoruid: Kim, Dongwoo, u1009226 [en_AU]
local.description.embargo: 2099-12-31
local.description.notes: Imported from ARIES [en_AU]
local.identifier.absfor: 460500 - Data management and data science [en_AU]
local.identifier.ariespublication: u4056230xPUB685 [en_AU]
local.identifier.doi: 10.1145/2396761.2396861 [en_AU]
local.type.status: Published Version [en_AU]

Downloads

Original bundle

Name: Modeling Topic Hierarchies.pdf
Size: 847.77 KB
Format: Adobe Portable Document Format
Description: abcd