An Efficient Document Categorization Approach for Turkish Based Texts
Journal Title: International Journal of Intelligent Systems and Applications in Engineering - Year 2015, Vol 3, Issue 1
Abstract
Because the rapid, uncontrolled growth of textual data makes it infeasible to classify all documents by hand, automatic methods have been developed to organize the data. In this study, a support vector machine (SVM) classifier is used for text categorization. In text categorization applications, the text representation step can consume substantial computation time because a very large number of terms must be weighted. The solution commonly adopted in the literature is to use lexicons that contain fewer terms; however, such solutions have been observed to reduce classification accuracy. In this paper, the term-document matrix is instead constructed in a user-dependent way, according to the purpose of the classification. Since the number of terms remains relatively large, a hash table is used to search for terms efficiently. On this basis, an efficient and fast TF-IDF method is introduced to construct a weight matrix that represents the term-document relations, and a study is conducted on classifying Turkish news texts and Turkish columnists. The proposed approach substantially reduces the computation time required for the term-weighting process, while achieving 99% accuracy in determining news categories and 98% accuracy in identifying columnists.
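The idea of pairing a hash table with TF-IDF weighting can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact weighting formula, so the code assumes the common form tf × log(N / df), and the function name `build_tfidf`, the whitespace tokenizer, and the sample documents are all hypothetical.

```python
import math
from collections import Counter


def build_tfidf(documents):
    """Return (vocab, weights).

    vocab maps each term to a column index; weights holds one sparse row
    per document as a {column index: tf-idf weight} dict.
    Assumed weighting (not taken from the paper): tf * log(N / df).
    """
    vocab = {}        # hash table: term -> column index, O(1) average lookup
    doc_freq = []     # doc_freq[j] = number of documents containing term j
    term_counts = []  # per-document raw counts, keyed by column index

    for text in documents:
        counts = Counter()
        # Assumption: naive lowercase + whitespace tokenization. Real Turkish
        # text would need locale-aware casing and proper tokenization.
        for token in text.lower().split():
            col = vocab.get(token)
            if col is None:           # unseen term: assign the next column
                col = len(vocab)
                vocab[token] = col
                doc_freq.append(0)
            counts[col] += 1
        for col in counts:            # each document counts once per term
            doc_freq[col] += 1
        term_counts.append(counts)

    n_docs = len(documents)
    weights = []
    for counts in term_counts:
        total = sum(counts.values())
        row = {
            col: (c / total) * math.log(n_docs / doc_freq[col])
            for col, c in counts.items()
        }
        weights.append(row)
    return vocab, weights


# Hypothetical sample documents, for illustration only.
docs = ["faiz kararı piyasa", "maç sonucu gol", "piyasa faiz dolar"]
vocab, weights = build_tfidf(docs)
```

Because the term-to-column lookup goes through a dictionary, finding a term costs constant time on average regardless of vocabulary size, whereas scanning a term list would cost time proportional to its length for every token. The resulting sparse rows could then be passed to an SVM classifier.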
Authors and Affiliations
Sevinç İlhan Omurca* | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey
Semih Baş | Tubitak Marmara Research Center Technology Free Zone, IBTECH, Kocaeli – 41470, Turkey
Ekin Ekinci | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey
Epileptic State Detection: Pre-ictal, Inter-ictal, Ictal
Epileptic seizure detection and prediction from electroencephalography (EEG) is a vital area of research. In this study, Second-Order Difference Plot (SODP) is used to extract features based on consecutive difference of...
The Control of A Non-Linear Chaotic System Using Genetic and Particle Swarm Based On Optimization Algorithms
In this study, the control of a non-linear system was realized by using a linear system control strategy. According to the strategy and by using the controller coefficients, system outputs were controlled for all referen...
Design and Implementation of High Speed Artificial Neural Network Based Sprott 94 S System on FPGA
FPGA-based embedding system designs have been preferred for industrial applications and prototyping because of the advantages of parallel processing, reconfigurability and low cost. Due to having characteristic structure...
Process modelling and simulation of a Simple Water Treatment Plant
Water treatment plants are likely to experience problems such as the water level both in the filter cells and in the tanks tend to fluctuate widely. These create the potential for partial drainage, overflow, and potentia...
An Efficient Approach for Ground Echoes Suppression Based on Textural Features and SVM
The use of the Support Vector Machine (SVM) technique for the clutter identification in the context of meteorological data is presented. The clutter is due to ground echoes and anomalous propagation. The SVM is combined...