
Automatic extraction of topic hierarchies based on WordNet

dc.contributor.author: Brey, Gerhard
dc.contributor.author: Vieira, Miguel
dc.date.accessioned: 2012-04-17T23:39:36Z
dc.date.available: 2012-04-17T23:39:36Z
dc.date.created: 2012-03
dc.identifier.citation: Brey, G. & Vieira, M. (March 2012). Automatic extraction of topic hierarchies based on WordNet. Presentation at the Digital Humanities Australasia 2012: Building, Mapping, Connecting [Conference][aaDH2012]. Canberra, Australia: ANU
dc.identifier.uri: http://hdl.handle.net/1885/8990
dc.description.abstract: The aim of the research described here is the automatic generation of a topic hierarchy from WordNet, to serve as the basis for a faceted browsing interface, with a collection of 19th-century periodical texts as the test corpus. Our research was motivated by the Castanet algorithm, which was developed for and successfully applied to short descriptions of documents. In our research we adapt the algorithm so that it can be applied to the full text of documents. The algorithm for the automatic generation of the topic hierarchy has three main processes: data preparation, wherein the data is prepared so that the information contained within the texts is more easily accessible; target term extraction, wherein the terms considered relevant for classifying each text are selected; and topic tree generation, wherein the tree is built using the target terms. We evaluated samples of the resulting topic tree and found that over 90% of the topics are relevant, i.e. they clearly illustrate what the articles are about and the topic hierarchy adequately relates to the content of the articles. Future work will address problems resulting from mis-OCRed words, erroneous disambiguation, and language anachronisms. Faceted browsing interfaces based on topic hierarchies are easy and intuitive to navigate and, as our results demonstrate, topic hierarchies form an appropriate basis for this type of data navigation. We are confident that our approach can be applied successfully to other corpora and should yield even better results where there are no OCR issues to contend with. Since WordNet is available in several languages, it should also be possible to apply our approach to corpora in other languages.
dc.description.sponsorship: Australian Academy of the Humanities; the ANU College of Arts and Social Sciences
dc.format.extent: 20 slides
dc.format.mimetype: application/pdf
dc.language.iso: en_AU
dc.publisher: Australasian Association for Digital Humanities
dc.relation.ispartof: Australasian Association for Digital Humanities Conference (1st : 2012 : The Australian National University, Canberra, ACT)
dc.rights: Author/s retain copyright
dc.title: Automatic extraction of topic hierarchies based on WordNet
dc.type: Conference presentation
local.description.notes: Inaugural Conference of the Australasian Association for Digital Humanities held 27-30 March, 2012. Presentation given by Jamie Norrish
local.publisher.url: http://aa-dh.org/conference/
local.type.status: Published Version
local.contributor.affiliation: Brey, Gerhard, King's College London, Department of Digital Humanities
local.contributor.affiliation: Vieira, Miguel, King's College London, Department of Digital Humanities
dcterms.accessRights: Open Access
dc.provenance: The copyright is owned by the authors. The conference organisers make no claim over copyright
Collections: Digital Humanities Australasia: Building, Mapping, Connecting (2012)
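
The three processes named in the abstract (data preparation, target term extraction, topic tree generation) can be illustrated with a small sketch. This is not the authors' implementation: the toy hypernym table stands in for WordNet, the stopword list is arbitrary, and selecting target terms by document frequency is an assumption made only to keep the example short. All function names are hypothetical.

```python
import re
from collections import Counter

# Hypothetical stand-in for WordNet hypernym paths (root first).
# A real system would look these up in WordNet after disambiguation.
TOY_HYPERNYM_PATHS = {
    "locomotive": ["entity", "artifact", "vehicle", "locomotive"],
    "carriage": ["entity", "artifact", "vehicle", "carriage"],
    "sermon": ["entity", "communication", "speech", "sermon"],
}

# Arbitrary illustrative stopword list (an assumption, not from the source).
STOPWORDS = {"the", "a", "of", "and", "was", "on", "in", "by"}


def prepare(text):
    """Data preparation: lowercase, tokenize, drop stopwords."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def extract_target_terms(docs, min_docs=1):
    """Target term extraction (assumed criterion): keep terms that have a
    known hypernym path and occur in at least min_docs documents."""
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    return sorted(
        term for term, n in doc_freq.items()
        if n >= min_docs and term in TOY_HYPERNYM_PATHS
    )


def build_topic_tree(terms):
    """Topic tree generation: merge the hypernym paths of the target
    terms into one nested dictionary, sharing common ancestors."""
    tree = {}
    for term in terms:
        node = tree
        for label in TOY_HYPERNYM_PATHS[term]:
            node = node.setdefault(label, {})
    return tree


def topic_tree_from_texts(texts, min_docs=1):
    """Run the three stages in order on a list of raw texts."""
    docs = [prepare(text) for text in texts]
    return build_topic_tree(extract_target_terms(docs, min_docs))
```

Calling `topic_tree_from_texts` on a list of article texts returns a nested dictionary in which terms that share hypernyms (here "locomotive" and "carriage" under "vehicle") end up under a common branch, which is the property a faceted browser would rely on.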

Download

File | Description | Size | Format
Brey_Automatic2012.pdf | PowerPoint presentation slides | 1.63 MB | Adobe PDF


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
