Document Similarity Detection using K-Means and Cosine Distance

Abstract

A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the soft copies of various doctoral dissertations, which were similar to texts available on the internet. Suspected plagiarism has a negative effect on both students and faculty members. The main reason behind this behavior is the lack of standardized awareness of plagiarism among faculty members. Therefore, this study proposes a computerized system that detects plagiarism by using the K-means and cosine distance algorithms. The process consists of preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, vector space model design, and the combined calculation of K-means and cosine distance, with 17 documents as test data. The results show that the system achieves a detection accuracy of 93.33% on these documents.
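The pipeline named in the abstract (vector space model, K-means, cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the two calculations are combined, so the sketch assumes that documents are first clustered and that only documents in the same cluster are then compared pairwise. The function names, the term-count weighting, the farthest-point initialization, and the `threshold` parameter are all hypothetical, and the Indonesian dictionary check is not reproduced.

```python
import math
import re
from collections import Counter


def build_vectors(docs):
    """Vector space model: one raw term-count vector per document (assumed weighting)."""
    tokens = [re.findall(r"\w+", d.lower()) for d in docs]
    vocab = sorted({t for doc in tokens for t in doc})
    return [[float(Counter(doc)[t]) for t in vocab] for doc in tokens]


def cosine_distance(a, b):
    """1 - cosine similarity; all-zero vectors are treated as maximally distant."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 1.0
    return 1.0 - dot / (na * nb)


def kmeans_cosine(vectors, k, iterations=20):
    """K-means variant that assigns points by cosine distance.

    Centroids start from a deterministic farthest-point choice
    (an assumption made here so the sketch is reproducible).
    """
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(
            vectors,
            key=lambda v: min(cosine_distance(v, c) for c in centroids),
        )
        centroids.append(list(farthest))

    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine distance.
        labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def similar_pairs(vectors, labels, threshold):
    """Flag same-cluster document pairs whose cosine distance is at most threshold."""
    pairs = []
    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if labels[i] == labels[j]:
                dist = cosine_distance(vectors[i], vectors[j])
                if dist <= threshold:
                    pairs.append((i, j, dist))
    return pairs
```

A caller would pass a list of document strings to `build_vectors`, cluster the result with `kmeans_cosine`, and then inspect the pairs returned by `similar_pairs`. Restricting the pairwise comparison to one cluster reduces the number of comparisons, at the cost of missing similar documents that land in different clusters.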

Authors and Affiliations

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi


  • EP ID EP468311
  • DOI 10.14569/IJACSA.2019.0100222

How To Cite

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi (2019). Document Similarity Detection using K-Means and Cosine Distance. International Journal of Advanced Computer Science & Applications, 10(2), 165-170. https://europub.co.uk/articles/-A-468311