Document Similarity Detection using K-Means and Cosine Distance

Abstract

A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the soft copies of various doctoral dissertations, which closely resembled texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind it is the lack of a standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using the K-means and cosine distance algorithms. The process begins with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and the combined calculation of K-means and cosine distance on 17 documents used as test data. Overall, the system achieved a detection accuracy of 93.33% on these documents.
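
The pipeline outlined in the abstract (preprocessing, a vector space model, then K-means combined with cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how the two algorithms are combined, so the sketch assumes that cosine distance is used as the assignment metric inside K-means. The function names, the plain term-frequency weighting, the regex tokenizer (standing in for the dictionary-based preprocessing), and the choice of the first k documents as initial centroids are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (a stand-in for the paper's preprocessing)."""
    return re.findall(r"[a-z]+", text.lower())


def build_vectors(docs):
    """Map each document to a term-frequency vector over a shared, sorted vocabulary."""
    token_lists = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in token_lists for tok in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([float(counts[w]) for w in vocab])
    return vectors


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def cosine_distance(a, b):
    return 1.0 - cosine_similarity(a, b)


def kmeans_cosine(vectors, k, iterations=20):
    """K-means with cosine distance for assignment (assumed combination).

    Initial centroids are simply the first k vectors, which keeps the
    sketch deterministic but is a poor choice for real data.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


docs = [
    "the cat sat on the mat",
    "stock prices fell sharply today",
    "the cat sat on a mat",
    "stock prices rose sharply today",
]
vectors = build_vectors(docs)
print(kmeans_cosine(vectors, k=2))  # [0, 1, 0, 1]
```

In a sketch like this, documents that land in the same cluster and whose pairwise cosine similarity exceeds some threshold would be the candidates flagged for closer inspection; the threshold itself is another assumption, since the abstract does not state one.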

Authors and Affiliations

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi

  • EP ID EP468311
  • DOI 10.14569/IJACSA.2019.0100222

How To Cite

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi (2019). Document Similarity Detection using K-Means and Cosine Distance. International Journal of Advanced Computer Science & Applications, 10(2), 165-170. https://europub.co.uk/articles/-A-468311