Unsupervised software-specific morphological forms inference from informal discussions

Chen, Chunyang; Xing, Zhenchang; Wang, Ximing

Description

Informal discussions on social platforms (e.g., Stack Overflow) accumulate a large body of programming knowledge in natural language text. Natural language processing (NLP) techniques can be exploited to harvest this knowledge base for software engineering tasks. To make effective use of NLP techniques, a consistent vocabulary is essential. Unfortunately, the same concepts are often intentionally or accidentally mentioned in many different morphological forms in informal discussions, such as...

dc.contributor.author: Chen, Chunyang
dc.contributor.author: Xing, Zhenchang
dc.contributor.author: Wang, Ximing
dc.coverage.spatial: Buenos Aires, Argentina
dc.date.accessioned: 2020-12-20T20:58:26Z
dc.date.available: 2020-12-20T20:58:26Z
dc.date.created: May 20-28 2017
dc.identifier.isbn: 9781538638682
dc.identifier.uri: http://hdl.handle.net/1885/218588
dc.description.abstract: Informal discussions on social platforms (e.g., Stack Overflow) accumulate a large body of programming knowledge in natural language text. Natural language processing (NLP) techniques can be exploited to harvest this knowledge base for software engineering tasks. To make effective use of NLP techniques, a consistent vocabulary is essential. Unfortunately, the same concepts are often intentionally or accidentally mentioned in many different morphological forms in informal discussions, such as abbreviations, synonyms and misspellings. Existing techniques to deal with such morphological forms are either designed for general English or predominantly rely on domain-specific lexical rules. A thesaurus of software-specific terms and commonly used morphological forms is desirable for normalizing software engineering text, but very difficult to build manually. In this work, we propose an automatic approach to build such a thesaurus. Our approach identifies software-specific terms by contrasting software-specific and general corpora, and infers morphological forms of software-specific terms by combining distributed word semantics, domain-specific lexical rules and transformations, and graph analysis of morphological relations. We evaluate the coverage and accuracy of the resulting thesaurus against community-curated lists of software-specific terms, abbreviations and synonyms. We also manually examine the correctness of the identified abbreviations and synonyms in our thesaurus. We demonstrate the usefulness of our thesaurus in a case study of normalizing questions from Stack Overflow and CodeProject.
dc.format.mimetype: application/pdf
dc.language.iso: en_AU
dc.publisher: Institute of Electrical and Electronics Engineers (IEEE Inc)
dc.relation.ispartofseries: 39th IEEE/ACM International Conference on Software Engineering, ICSE 2017
dc.source: Proceedings - 2017 IEEE/ACM 39th International Conference on Software Engineering, ICSE 2017
dc.source.uri: http://ieeexplore.ieee.org/xpl/mostRecentIssue.jsp?punumber=7976701
dc.title: Unsupervised software-specific morphological forms inference from informal discussions
dc.type: Conference paper
local.description.notes: Imported from ARIES
local.description.refereed: Yes
dc.date.issued: 2017
local.identifier.absfor: 080399 - Computer Software not elsewhere classified
local.identifier.ariespublication: a383154xPUB8153
local.type.status: Published Version
local.contributor.affiliation: Chen, Chunyang, Nanyang Technological University
local.contributor.affiliation: Xing, Zhenchang, College of Engineering and Computer Science, ANU
local.contributor.affiliation: Wang, Ximing, Nanyang Technological University
local.bibliographicCitation.startpage: 450
local.bibliographicCitation.lastpage: 461
local.identifier.doi: 10.1109/ICSE.2017.48
dc.date.updated: 2020-11-23T11:25:15Z
local.identifier.scopusID: 2-s2.0-85027696498
Collections: ANU Research Publications

Download

There are no files associated with this item.


