Document Similarity Detection using K-Means and Cosine Distance

Abstract

A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in Indonesia. Among its findings was that the softcopies of various doctoral dissertations contained passages similar to texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members, and its main cause is a lack of standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using the K-means and cosine distance algorithms. The process begins with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary. It then continues with vector space model design and the combined calculation of K-means and cosine distance, applied to 17 documents as test data. Overall, the system achieves a detection accuracy of 93.33%.
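The abstract outlines a pipeline of preprocessing with a dictionary check, a vector space model, and K-means combined with cosine distance, but gives no implementation details. The sketch below is a minimal, standard-library-only illustration under stated assumptions, not the authors' method. The STOPWORDS list and the tiny DICTIONARY set are hypothetical stand-ins for real Indonesian resources such as the big dictionary. Documents are represented as raw term-frequency vectors (an assumption; the paper may weight terms differently). K-means is initialized with the first k documents and assigns each document by cosine distance. A pair is flagged when both documents share a cluster and their cosine similarity reaches a hypothetical threshold of 0.8.

```python
import math
import re
from collections import Counter

# Hypothetical stand-ins (NOT the paper's resources): a tiny stopword list and a
# tiny word list playing the role of the Indonesian big dictionary check.
STOPWORDS = {"yang", "dan", "di", "ke", "dari", "ini", "itu"}
DICTIONARY = {"sistem", "deteksi", "plagiarisme", "dokumen", "mahasiswa",
              "kemiripan", "teks", "algoritma", "jaringan", "komputer", "data"}


def preprocess(text):
    """Lowercase, tokenize, drop stopwords, and keep only dictionary words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS and t in DICTIONARY]


def build_vectors(docs):
    """Build term-frequency vectors over a shared, sorted vocabulary."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for tokens in token_lists for t in tokens})
    vectors = []
    for tokens in token_lists:
        counts = Counter(tokens)
        vectors.append([float(counts[term]) for term in vocab])
    return vocab, vectors


def cosine_distance(a, b):
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 if norm == 0 else 1.0 - dot / norm


def kmeans_cosine(vectors, k, iterations=20):
    """K-means that assigns points by cosine distance; centroids are means."""
    centroids = [list(v) for v in vectors[:k]]  # assumed init: first k documents
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assign each vector to its nearest centroid (ties go to the lower index).
        labels = [min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster empties
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def suspected_pairs(docs, k=2, threshold=0.8):
    """List (i, j, similarity) for same-cluster pairs at or above threshold."""
    _, vectors = build_vectors(docs)
    labels = kmeans_cosine(vectors, k)
    pairs = []
    for i in range(len(docs)):
        for j in range(i + 1, len(docs)):
            sim = 1.0 - cosine_distance(vectors[i], vectors[j])
            if labels[i] == labels[j] and sim >= threshold:
                pairs.append((i, j, round(sim, 3)))
    return pairs


docs = [
    "Sistem deteksi plagiarisme dokumen mahasiswa",
    "Algoritma jaringan komputer dan data",
    "Deteksi plagiarisme dokumen mahasiswa dengan sistem",
]
print(suspected_pairs(docs))  # → [(0, 2, 1.0)]
```

Here the first and third documents reduce to the same dictionary words ("dengan" is not in the stand-in dictionary and is dropped), so they land in one cluster with a similarity of 1.0. Clustering first restricts the pairwise cosine comparison to documents in the same group. The values of k and the threshold are illustrative and would need tuning against labeled data such as the 17 test documents described in the abstract.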

Authors and Affiliations

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi

Related Articles

Opinion Mining and thought Pattern Classification with Natural Language Processing (NLP) Tools

Opinion mining from digital media is becoming the easiest way to obtain trivial aspects of the thinking trends. Currently, there exists no hard and fast modeling or classification over this for any society or global comm...

A Novel Modeling based Agent Cellular Automata for Advanced Residential Mobility Applications

Nowadays, residential mobility (RM) is usually interconnected with other urban phenomena to give more realistic and effective to the simulation models in order to support urban planners and decision makers. Recent RM res...

A Study on Usability Awareness in Local IT Industry

Usability awareness receives more consideration by industry professionals and researchers throughout the world, but it is limited in Pakistan. This study reports survey results of the current state of usability awareness...

A Comprehensive IoT Attacks Survey based on a Building-blocked Reference Model

Internet of Things (IoT) has not yet reached a distinctive definition. A generic understanding of IoT is that it offers numerous services in many domains, utilizing conventional internet infrastructure by enabling differ...

Effective Methods to Improve the Educational Process of Medicine in Bulgaria

The introduction of modern technologies into the educational process of medical students is a challenge of the new era in education, which can increase the success of students and give them confidence in their capabiliti...

  • EP ID EP468311
  • DOI 10.14569/IJACSA.2019.0100222

How To Cite

Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi (2019). Document Similarity Detection using K-Means and Cosine Distance. International Journal of Advanced Computer Science & Applications, 10(2), 165-170. https://europub.co.uk/articles/-A-468311