Document Similarity Detection using K-Means and Cosine Distance

Abstract

A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found that the soft copies of many doctoral dissertations contained passages similar to texts available on the internet. Such suspected plagiarism harms both students and faculty members, and its main cause is a lack of standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects potential plagiarism using the K-means algorithm and cosine distance. The process begins with preprocessing, which includes a novel step of checking words against the Indonesian Big Dictionary. It continues with the construction of a vector space model and ends with a combined K-means and cosine distance calculation on 17 test documents. Overall, the system achieves a detection accuracy of 93.33%.
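The pipeline summarized in the abstract (preprocessing with a dictionary check, a vector space model, then K-means combined with cosine distance) can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. The stopword list and the small `dictionary` set are hypothetical stand-ins for the Indonesian Big Dictionary, and the dictionary check is assumed to simply drop tokens not found in it. TF-IDF weighting is assumed for the vector space model, and the farthest-first centroid initialization is an illustrative choice.

```python
import math
import re
from collections import Counter

# HYPOTHETICAL stand-ins: a tiny stopword list and a tiny dictionary set.
# The paper checks tokens against the Indonesian Big Dictionary; here we
# ASSUME that check just keeps tokens present in the dictionary.
STOPWORDS = {"dan", "yang", "di", "ini", "itu", "untuk"}

def preprocess(text, dictionary):
    """Lowercase, tokenize, drop stopwords, keep only dictionary words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and t in dictionary]

def tfidf_vectors(docs_tokens):
    """Build TF-IDF vectors (an ASSUMED vector space model) over a shared vocabulary."""
    vocab = sorted({t for doc in docs_tokens for t in doc})
    n = len(docs_tokens)
    df = Counter(t for doc in docs_tokens for t in set(doc))
    # +1 keeps terms that appear in every document from getting zero weight
    idf = {t: math.log(n / df[t]) + 1.0 for t in vocab}
    vectors = []
    for doc in docs_tokens:
        tf = Counter(doc)
        vectors.append([tf[t] * idf[t] for t in vocab])
    return vocab, vectors

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)

def init_centroids(vectors, k):
    """Deterministic farthest-first init: start from the first vector, then
    repeatedly add the vector farthest (by cosine distance) from those chosen."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        far = max(vectors, key=lambda v: min(cosine_distance(v, c) for c in centroids))
        centroids.append(list(far))
    return centroids

def kmeans_cosine(vectors, k, iterations=20):
    """K-means that assigns each vector to the centroid with the smallest cosine distance."""
    centroids = init_centroids(vectors, k)
    labels = [0] * len(vectors)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Usage with made-up example sentences (not the paper's 17 test documents).
dictionary = {"mahasiswa", "menulis", "disertasi", "tentang", "plagiarisme",
              "jaringan", "saraf", "tiruan", "sistem", "deteksi"}
docs = [
    "Mahasiswa menulis disertasi tentang plagiarisme",
    "Disertasi ini tentang plagiarisme dan mahasiswa menulis",
    "Sistem jaringan saraf tiruan untuk deteksi",
]
tokens = [preprocess(d, dictionary) for d in docs]
vocab, vecs = tfidf_vectors(tokens)
print(round(cosine_similarity(vecs[0], vecs[1]), 3))  # → 1.0
print(kmeans_cosine(vecs, k=2))                       # → [0, 0, 1]
```

The first two sentences contain the same words in a different order, so they score a cosine similarity of 1.0 and fall into the same cluster, while the unrelated third sentence forms its own cluster. How the paper turns cluster membership and similarity scores into a plagiarism decision is not described in this abstract, so the sketch assumes no threshold.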

Authors and Affiliations

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi


Related Articles

Flood Analysis in Peru using Satellite Image: The Summer 2017 Case

At the beginning of the year 2017, different regions of Peru suffered from heavy rains mainly due to the 'El Niño' and 'La Niña' phenomena. As a result of these massive storms, several cities were affected by overflows a...

Estimation of Dynamic Background and Object Detection in Noisy Visual Surveillance

Dynamic background subtraction in noisy environment for detecting object is a challenging process in computer vision. The proposed algorithm has been used to identify moving objects from the sequence of video frames whic...

Object Conveyance Algorithm for Multiple Mobile Robots based on Object Shape and Size

This paper describes a determination method of a number of a team for multiple mobile robot object conveyance. The number of robot on multiple mobile robot systems is the factor of complexity on robots formation and moti...

Ontology-based Change Propagation in Shareable Health Information Applications

One of the most important challenges to be addressed when establishing an integrated smart health environment is the availability of shareable health data and knowledge which standardize the interoperability of compone...

Secure Steganography for Digital Images

The degree of imperceptibility of hidden image in the ‘Digital Image Steganography’ is mostly defined in relation to the limitation of Human Visual System (HVS), its chances of detection using statistical methods and its...

  • EP ID EP468311
  • DOI 10.14569/IJACSA.2019.0100222

How To Cite

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi (2019). Document Similarity Detection using K-Means and Cosine Distance. International Journal of Advanced Computer Science & Applications, 10(2), 165-170. https://europub.co.uk/articles/-A-468311