An Efficient Document Categorization Approach for Turkish Based Texts

Abstract

Because the rapid and uncontrolled growth of textual data makes it infeasible to classify all documents manually, automatic methods have been developed to organize such data. In this study, a support vector machine (SVM) classifier is used for text categorization. In text categorization applications, the text representation step can require substantial computation time, because a very large number of terms must be weighted. The solution used so far in the literature has been to rely on lexicons that contain fewer terms; however, such solutions have been observed to reduce classification accuracy. In this paper, the term-document matrix is constructed in a user-dependent manner, according to the purpose of the classification. Since the number of terms is still relatively large, a hash table is used to search for terms efficiently. In this way, an efficient and fast TF-IDF method is introduced for constructing a weight matrix that represents the term-document relations, and a study on classifying Turkish news documents and Turkish columnists is conducted. With the proposed approach, the computation time required for the term-weighting process is reduced substantially; in addition, 99% accuracy is achieved in determining news categories and 98% accuracy in identifying columnists.
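
The idea of combining a hash table with TF-IDF weighting can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenizer, the toy lexicon and the particular TF-IDF variant (relative term frequency multiplied by the natural logarithm of the inverse document frequency) are all assumptions made for illustration. A Python `dict` plays the role of the hash table, so each token is located by an average constant-time lookup rather than a scan over the lexicon.

```python
import math
from collections import Counter


def build_vocabulary(lexicon):
    """Map each term to a column index (hypothetical helper).

    A dict is a hash table, so lookups take O(1) time on average.
    """
    return {term: idx for idx, term in enumerate(lexicon)}


def tfidf_matrix(documents, vocabulary):
    """Return a dense documents-by-terms weight matrix as a list of lists.

    Assumed weighting: tf = count / (in-lexicon tokens in the document),
    idf = ln(N / df). The paper's exact formula may differ.
    """
    n_docs = len(documents)
    n_terms = len(vocabulary)
    counts = []                  # per-document counts, keyed by column index
    doc_freq = [0] * n_terms     # number of documents containing each term

    for text in documents:
        row = Counter()
        for token in text.split():          # naive whitespace tokenizer
            col = vocabulary.get(token)     # hash lookup, no list scan
            if col is not None:
                row[col] += 1
        counts.append(row)
        for col in row:
            doc_freq[col] += 1

    matrix = []
    for row in counts:
        total = sum(row.values())
        weights = [0.0] * n_terms
        for col, count in row.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[col])
            weights[col] = tf * idf
        matrix.append(weights)
    return matrix


# Toy data, invented for this example only.
lexicon = ["ekonomi", "faiz", "spor", "gol", "borsa"]
docs = ["ekonomi faiz dolar", "spor gol gol", "ekonomi borsa"]

vocab = build_vocabulary(lexicon)
weights = tfidf_matrix(docs, vocab)
```

Each row of `weights` could then serve as the feature vector of one document for a classifier such as an SVM. Tokens that are absent from the lexicon ("dolar" above) are skipped, which is one way to read the abstract's purpose-dependent term-document matrix.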

Authors and Affiliations

Sevinç İlhan Omurca* | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey
Semih Baş | Tubitak Marmara Research Center Technology Free Zone, IBTECH, Kocaeli – 41470, Turkey
Ekin Ekinci | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey

How To Cite

Sevinç İlhan Omurca*, Semih Baş, Ekin Ekinci (2015). An Efficient Document Categorization Approach for Turkish Based Texts. International Journal of Intelligent Systems and Applications in Engineering, 3(1), 7-13. https://europub.co.uk/articles/-A-761