Document Similarity Detection using K-Means and Cosine Distance
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2019, Vol 10, Issue 2
Abstract
A two-year study by the Ministry of Research, Technology and Education in Indonesia evaluated most universities in the country. The evaluation found peculiarities in the softcopies of various doctoral dissertations, which were similar to texts available on the internet. This suspected plagiarism has a negative effect on both students and faculty members. The main reason behind the behavior is the lack of a standardized awareness of plagiarism among faculty members. This study therefore proposes a computerized system that detects plagiarism by using the K-means and cosine distance algorithms. The process starts with preprocessing, which includes a novel step of checking words against the Indonesian big dictionary, followed by vector space model design and the combined calculation of K-means and cosine distance, with 17 documents as test data. Overall, the results show a detection accuracy of 93.33%.
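The pipeline described in the abstract (preprocessing, a vector space model, then K-means combined with cosine distance) can be illustrated with a minimal sketch. This is not the authors' implementation: the tokenizer, the raw term-frequency weighting, the deterministic centroid initialization, and the four toy English documents are all assumptions made for illustration, and the Indonesian dictionary check is omitted.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumed preprocessing: lowercase and keep alphabetic tokens only.
    # The paper's dictionary check is NOT reproduced here.
    return [w for w in text.lower().split() if w.isalpha()]


def tf_vectors(docs):
    # Vector space model: one raw term-frequency vector per document.
    vocab = sorted({w for d in docs for w in tokenize(d)})
    vectors = []
    for d in docs:
        counts = Counter(tokenize(d))
        vectors.append([float(counts[w]) for w in vocab])
    return vocab, vectors


def cosine_distance(a, b):
    # 1 - cos(a, b); 0 means same direction, 1 means no shared terms.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 1.0
    return 1.0 - dot / (norm_a * norm_b)


def kmeans(vectors, k, iterations=10):
    # Assumed initialization: the first k vectors serve as centroids,
    # which keeps the sketch deterministic.
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid by cosine distance.
        labels = [
            min(range(k), key=lambda c: cosine_distance(v, centroids[c]))
            for v in vectors
        ]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy corpus; the study itself used 17 documents.
docs = [
    "plagiarism detection compares document text",
    "wireless network planning uses antennas",
    "document text plagiarism detection compares",
    "antennas improve wireless network coverage",
]
vocab, vectors = tf_vectors(docs)
labels, centroids = kmeans(vectors, k=2)
print(labels)
print(cosine_distance(vectors[0], vectors[2]))
```

In this sketch, documents that land in the same cluster become candidates for a pairwise check, and a cosine distance close to 0 between two of them marks a likely copy. A real system would need a decision threshold, which the abstract does not state.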
Authors and Affiliations
Wendi Usino, Anton Satria Prabuwono, Khalid Hamed S. Allehaibi, Arif Bramantoro, Hasniaty A, Wahyu Amaldi
Design and Implementation of a Communication System and Device Aimed at the Inclusion of People with Oral Communication Disabilities
Disability is part of the human condition; it discriminates people who have this complication. The present work was carried out due to this and an experience in our research center. A prototype was designed and built that al...
Anthropomorphic User Interface Feedback in a Sewing Context and Affordances
The aim of the authors' research is to gain better insights into the effectiveness and user satisfaction of anthropomorphism at the user interface. Therefore, this paper presents a between-users experiment and the...
Fast Vertical Mining Using Boolean Algebra
The vertical association rules mining algorithm is an efficient mining method, which makes use of support sets of frequent itemsets to calculate the support of candidate itemsets. It overcomes the disadvantage of scannin...
Integrated Framework to Study Efficient Spectral Estimation Techniques for Assessing Spectral Efficiency Analysis
The advanced network applications enable software driven spectral analysis of non-stationary signal or processes which precisely involves domain analysis with the purpose of decomposing a complex signal coefficients into...
Analysis of MIMO Systems used in planning a 4G-WiMAX Network in Ghana
With the increasing demand for mobile data services, Broadband Wireless Access (BWA) is emerging as one of the fastest growing areas within mobile communications. Innovative wireless communication systems, such as WiMAX,...