Document Similarity Detection using K-Means and Cosine Distance
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 2
Abstract
A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found irregularities in the soft copies of many doctoral dissertations: their content closely resembles texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind the behavior is the lack of a standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism using the K-means and cosine distance algorithms. The process consists of preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and a combined K-means and cosine distance calculation, with 17 documents used as test data. Overall, the results show a detection accuracy of 93.33% on these documents.
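The pipeline named in the abstract (dictionary-checked preprocessing, a vector space model, then K-means with cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how K-means and cosine distance are combined, so the sketch assumes cosine distance is used to assign documents to centroids. The tiny English `DICTIONARY`, the sample `docs`, the raw term-frequency weighting, and the seeding of centroids with the first `k` documents are all hypothetical choices made only for illustration.

```python
import math
from collections import Counter

# Hypothetical stand-in for the dictionary check; the paper's word list is not given.
DICTIONARY = {"plagiarism", "detection", "document", "similarity",
              "rice", "harvest", "farm", "season"}

def preprocess(text, dictionary=DICTIONARY):
    """Lowercase, strip surrounding punctuation, keep only dictionary words."""
    tokens = (t.strip(".,;:!?()\"'").lower() for t in text.split())
    return [t for t in tokens if t in dictionary]

def vectorize(docs):
    """Vector space model: raw term-frequency vectors over a shared vocabulary."""
    token_lists = [preprocess(d) for d in docs]
    vocab = sorted({t for toks in token_lists for t in toks})
    vectors = []
    for toks in token_lists:
        counts = Counter(toks)
        vectors.append([float(counts[t]) for t in vocab])
    return vocab, vectors

def cosine_distance(a, b):
    """1 - cosine similarity; defined as 1.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)

def kmeans(vectors, k, iterations=20):
    """K-means with cosine-distance assignment (an assumption, see lead-in).

    Centroids are seeded with the first k vectors so the result is deterministic.
    """
    centroids = [list(v) for v in vectors[:k]]
    labels = None
    for _ in range(iterations):
        new_labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

# Hypothetical sample documents: two topics, two documents each.
docs = [
    "Plagiarism detection compares document similarity.",
    "The rice harvest season on the farm.",
    "Document similarity supports plagiarism detection.",
    "Farm harvest: rice season.",
]
vocab, vectors = vectorize(docs)
labels, centroids = kmeans(vectors, k=2)
```

In this sketch, documents that land in the same cluster and have a small pairwise `cosine_distance` would be the candidates flagged for closer plagiarism review; any threshold for "small" would need to be tuned on real data.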
Authors and Affiliations
Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi