Harnessing Context for Vandalism Detection in Wikipedia
Journal Title: EAI Endorsed Transactions on Collaborative Computing - Year 2015, Vol 1, Issue 1
Abstract
The importance of collaborative social media (CSM) applications such as Wikipedia to modern free societies can hardly be overemphasized. By allowing end users to freely create and edit content, Wikipedia has greatly facilitated democratization of information. However, over the past several years, Wikipedia has also become susceptible to vandalism, which has adversely affected its information quality. Traditional vandalism detection techniques that rely upon simple textual features such as spammy or abusive words have not been very effective in combating sophisticated vandal attacks that do not contain common vandalism markers. In this paper, we propose a context-based vandalism detection framework for Wikipedia. We first propose a context-enhanced finite state model for representing the context evolution of Wikipedia articles. This paper identifies two distinct types of context that are potentially valuable for vandalism detection, namely content-context and contributor-context. The distinguishing powers of these contexts are discussed by providing empirical results. We design two novel metrics for measuring how well the content-context of an incoming edit fits into the topic and the existing content of a Wikipedia article. We outline machine learning-based vandalism identification schemes that utilize these metrics. Our experiments indicate that utilizing context can substantially improve vandalism detection accuracy.
Authors and Affiliations
Lakshmish Ramaswamy, Raga Sowmya Tummalapenta, Deepika Sethi, Kang Li, Calton Pu