
HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages

Zhao, Xuejiao; Xing, Zhenchang; Kabir, Muhammad Ashad; Sawada, Naoya; Li, Jing; Lin, Shang-Wei


dc.contributor.author: Zhao, Xuejiao
dc.contributor.author: Xing, Zhenchang
dc.contributor.author: Kabir, Muhammad Ashad
dc.contributor.author: Sawada, Naoya
dc.contributor.author: Li, Jing
dc.contributor.author: Lin, Shang-Wei
dc.contributor.editor: Bavota, G.
dc.contributor.editor: Pinzger, M.
dc.contributor.editor: Marcus, A.
dc.coverage.spatial: Klagenfurt, Austria
dc.date.accessioned: 2021-07-01T01:46:13Z
dc.date.created: February 20-24, 2017
dc.identifier.isbn: 9781509055012
dc.identifier.uri: http://hdl.handle.net/1885/238483
dc.description.abstract: Knowledge graphs are useful in many applications, such as search result ranking, recommendation, and exploratory search. A knowledge graph integrates structural information about concepts from multiple information sources and links these concepts together. Extracting domain-specific relation triples (subject, verb phrase, object) is one of the key techniques for constructing a domain-specific knowledge graph. In this research, we propose HDSKG, an automatic method that discovers domain-specific concepts and their relation triples from the content of webpages. We combine a dependency parser with a rule-based method to chunk relation triple candidates, then extract advanced features of these candidates and use a machine learning algorithm to estimate their domain relevance. To evaluate the method, we apply HDSKG to Stack Overflow (a Q&A website about computer programming) and construct a software engineering knowledge graph with 35,279 relation triples, 44,800 concepts, and 9,660 unique verb phrases. The experimental results show that both the precision and the recall of HDSKG (0.78 and 0.7, respectively) are much higher than those of openIE (0.11 and 0.6, respectively). HDSKG is particularly effective on complex sentences. Furthermore, because the classifier uses self-training, HDSKG can easily be applied to other domains with less training data.
dc.format.mimetype: application/pdf
dc.language.iso: en_AU
dc.publisher: IEEE
dc.relation.ispartofseries: 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering, SANER 2017
dc.rights: © 2017 IEEE
dc.source: SANER 2017 - 24th IEEE International Conference on Software Analysis, Evolution, and Reengineering
dc.source.uri: https://ieeexplore.ieee.org/document/7884609
dc.subject: Knowledge Graph
dc.subject: Structural Information Extraction
dc.subject: openIE
dc.subject: Stack Overflow
dc.subject: Dependency Parse
dc.title: HDSKG: Harvesting Domain Specific Knowledge Graph from Content of Webpages
dc.type: Conference paper
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2017
local.identifier.absfor: 080199 - Artificial Intelligence and Image Processing not elsewhere classified
local.identifier.ariespublication: a383154xPUB5894
local.publisher.url: https://ieeexplore.ieee.org
local.type.status: Published Version
local.contributor.affiliation: Zhao, Xuejiao, Nanyang Technological University
local.contributor.affiliation: Xing, Zhenchang, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Kabir, Muhammad Ashad, Charles Sturt University
local.contributor.affiliation: Sawada, Naoya, NTT Communications Corporation
local.contributor.affiliation: Li, Jing, Nanyang Technological University
local.contributor.affiliation: Lin, Shang-Wei, Nanyang Technological University
local.description.embargo: 2099-12-31
local.bibliographicCitation.startpage: 56
local.bibliographicCitation.lastpage: 67
local.identifier.doi: 10.1109/SANER.2017.7884609
dc.date.updated: 2020-11-23T10:36:51Z
local.identifier.scopusID: 2-s2.0-85018390307
Collections: ANU Research Publications
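The abstract (dc.description.abstract) describes a two-stage pipeline: rule-based chunking of (subject, verb phrase, object) candidates over a dependency parse, followed by a classifier that estimates each candidate's domain relevance. The Python sketch below illustrates that flow on three toy sentences. It is not the authors' implementation. The parses are written by hand with Universal Dependencies labels rather than produced by a parser. The chunking rule (a head word with both an nsubj child and an obj child) and the modifier sets are assumptions. The learned, self-trained classifier is replaced by a hypothetical keyword check against a small DOMAIN_TERMS vocabulary.

```python
from dataclasses import dataclass

# One token of a dependency parse: (position, word, head position, UD label).
# Head position 0 marks the root. These parses are hand-written for illustration.
Token = tuple[int, str, int, str]

SUBJ_LABELS = {"nsubj"}
OBJ_LABELS = {"obj", "dobj"}
NOUN_MODIFIERS = {"compound", "amod"}            # kept inside a concept phrase (assumption)
VERB_MODIFIERS = {"aux", "neg", "compound:prt"}  # kept inside a verb phrase (assumption)


@dataclass(frozen=True)
class Triple:
    subject: str
    verb_phrase: str
    obj: str


def children(parse: list[Token], head: int) -> list[Token]:
    return [t for t in parse if t[2] == head]


def phrase(parse: list[Token], head: Token, keep: set[str]) -> str:
    """Join the head word with its direct children whose label is in `keep`, in word order."""
    parts = [head] + [c for c in children(parse, head[0]) if c[3] in keep]
    return " ".join(t[1] for t in sorted(parts))


def chunk_triples(parse: list[Token]) -> list[Triple]:
    """Rule: any token with a subject child and an object child yields a candidate triple."""
    triples = []
    for tok in parse:
        kids = children(parse, tok[0])
        subjects = [k for k in kids if k[3] in SUBJ_LABELS]
        objects = [k for k in kids if k[3] in OBJ_LABELS]
        for s in subjects:
            for o in objects:
                triples.append(Triple(
                    phrase(parse, s, NOUN_MODIFIERS),
                    phrase(parse, tok, VERB_MODIFIERS),
                    phrase(parse, o, NOUN_MODIFIERS),
                ))
    return triples


# Hypothetical stand-in for the learned domain-relevance classifier:
# keep a triple if its subject or object mentions a hand-picked domain term.
DOMAIN_TERMS = {"python", "git", "java", "typing", "files", "api"}


def is_domain_relevant(t: Triple) -> bool:
    words = (t.subject + " " + t.obj).lower().split()
    return any(w in DOMAIN_TERMS for w in words)


def build_graph(triples: list[Triple]) -> dict[str, list[tuple[str, str]]]:
    """Adjacency list: subject concept -> [(verb phrase, object concept), ...]."""
    graph: dict[str, list[tuple[str, str]]] = {}
    for t in triples:
        graph.setdefault(t.subject, []).append((t.verb_phrase, t.obj))
    return graph


sentences = [
    [(1, "Python", 2, "nsubj"), (2, "supports", 0, "root"),
     (3, "dynamic", 4, "amod"), (4, "typing", 2, "obj")],
    [(1, "The", 2, "det"), (2, "chef", 3, "nsubj"), (3, "cooked", 0, "root"),
     (4, "a", 6, "det"), (5, "delicious", 6, "amod"), (6, "meal", 3, "obj")],
    [(1, "Git", 3, "nsubj"), (2, "can", 3, "aux"), (3, "track", 0, "root"),
     (4, "source", 5, "compound"), (5, "files", 3, "obj")],
]

candidates = [t for parse in sentences for t in chunk_triples(parse)]
kept = [t for t in candidates if is_domain_relevant(t)]
graph = build_graph(kept)
concepts = {t.subject for t in kept} | {t.obj for t in kept}
verb_phrases = {t.verb_phrase for t in kept}
print(graph)
# {'Python': [('supports', 'dynamic typing')], 'Git': [('can track', 'source files')]}
```

All three sentences yield a candidate, but only the two programming-related triples survive the relevance filter; the cooking sentence is discarded. The resulting graph has 4 concepts and 2 unique verb phrases, the same kinds of counts the abstract reports for the Stack Overflow graph. In HDSKG the keyword rule's place would be taken by the feature-based machine learning classifier the abstract mentions.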

Download

File: 01_Zhao_HDSKG%3A_Harvesting_Domain_2017.pdf (343.94 kB, Adobe PDF)


Items in Open Research are protected by copyright, with all rights reserved, unless otherwise indicated.
