Context-Aware Detection of Sneaky Vandalism on Wikipedia Across Multiple Languages

Tran, Khoi-Nguyen; Christen, Peter; Sanner, Scott; Xie, Lexing


Date

2015

Authors

Tran, Khoi-Nguyen

Christen, Peter

Sanner, Scott

Xie, Lexing

Publisher

Springer International Publishing AG

Abstract

The malicious modification of articles, termed vandalism, is a serious problem for open access encyclopedias such as Wikipedia. Wikipedia’s counter-vandalism bots and past vandalism detection research have greatly reduced the exposure and damage of common and obvious types of vandalism. However, increasingly sneaky types of vandalism remain that are clearly out of context within the sentence or article. In this paper, we propose a novel context-aware and cross-language vandalism detection technique that scales to the size of the full Wikipedia and extends the types of vandalism detectable beyond past feature-based approaches. Our technique uses word dependencies to identify vandal words in sentences by combining part-of-speech tagging with a conditional random fields classifier. We evaluate our technique on two Wikipedia data sets: the PAN data sets, with over 62,000 edits, which are commonly used in related research; and our own vandalism repairs data sets, with over 500 million edits of over 9 million articles in five languages. For comparison, we implement a feature-based classifier to analyse the quality of each classification technique and the trade-offs of each type of classifier. Our results show how context-aware detection techniques can become a new counter-vandalism tool for Wikipedia that complements current feature-based techniques.
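To illustrate how a linear-chain conditional random field can label individual words using part-of-speech context, the sketch below decodes the most likely label sequence with the Viterbi algorithm. It is not the authors' implementation. The POS tags are supplied by hand rather than by a tagger, the labels `OK` and `VANDAL` are hypothetical, and the weights in the hypothetical `EMISSION` and `TRANSITION` tables are invented for illustration, not learned from the PAN or vandalism repairs data sets.

```python
from typing import Dict, List, Tuple

LABELS = ("OK", "VANDAL")

# Hypothetical emission weights: (feature, label) -> score.
# A trained CRF would learn these from labelled edits; here they are hand-picked.
EMISSION: Dict[Tuple[str, str], float] = {
    ("pos=UH", "VANDAL"): 2.0,      # interjections such as "lol"
    ("lower=lol", "VANDAL"): 1.5,
    ("bias", "OK"): 1.0,            # most words are not vandalism
}

# Hypothetical transition weights between labels of consecutive words.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("OK", "OK"): 0.5,
    ("VANDAL", "VANDAL"): 0.5,
    ("OK", "VANDAL"): -0.5,
    ("VANDAL", "OK"): -0.5,
}


def features(tokens: List[Tuple[str, str]], i: int) -> List[str]:
    """Features for token i: the word, its POS tag and its neighbours' POS tags."""
    word, pos = tokens[i]
    feats = ["bias", f"lower={word.lower()}", f"pos={pos}"]
    feats.append(f"prev_pos={tokens[i - 1][1]}" if i > 0 else "prev_pos=<s>")
    feats.append(f"next_pos={tokens[i + 1][1]}" if i + 1 < len(tokens) else "next_pos=</s>")
    return feats


def emission_score(feats: List[str], label: str) -> float:
    """Sum of the weights of all active features for the given label."""
    return sum(EMISSION.get((f, label), 0.0) for f in feats)


def viterbi(tokens: List[Tuple[str, str]]) -> List[str]:
    """Return the highest-scoring label sequence under the linear-chain model."""
    if not tokens:
        return []
    # score[y] = best score of any label path ending in label y at the current word
    score = {y: emission_score(features(tokens, 0), y) for y in LABELS}
    backpointers: List[Dict[str, str]] = []
    for i in range(1, len(tokens)):
        feats = features(tokens, i)
        new_score: Dict[str, float] = {}
        bp: Dict[str, str] = {}
        for y in LABELS:
            # Best previous label to transition from into label y.
            prev = max(LABELS, key=lambda p: score[p] + TRANSITION[(p, y)])
            new_score[y] = score[prev] + TRANSITION[(prev, y)] + emission_score(feats, y)
            bp[y] = prev
        score = new_score
        backpointers.append(bp)
    # Trace back from the best label at the last word.
    path = [max(LABELS, key=lambda y: score[y])]
    for bp in reversed(backpointers):
        path.append(bp[path[-1]])
    return list(reversed(path))
```

For example, `viterbi([("Canberra", "NNP"), ("is", "VBZ"), ("the", "DT"), ("capital", "NN"), ("lol", "UH")])` returns `["OK", "OK", "OK", "OK", "VANDAL"]` under these illustrative weights. The neighbouring-tag features are only a simple stand-in for the word dependencies the abstract describes.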

URI

http://hdl.handle.net/1885/103768

Collections

ANU Research Publications

Source

Efficient Interactive Training Selection for Large-Scale Entity Resolution

Type

Conference paper

DOI

10.1007/978-3-319-18038-0_30

Restricted until

2037-12-31

Downloads

File

01_Tran_Context-Aware_Detection_of_2015.pdf (524.49 KB)
