An Efficient Document Categorization Approach for Turkish Based Texts

Abstract

Because the rapid, uncontrolled growth of textual data makes it infeasible to classify all documents by hand, automatic methods are used to organize the data. This study uses a support vector machine (SVM) classifier for text categorization. In text categorization applications, the text representation step can take a great deal of computation time because a very large number of terms must be weighted. The solutions proposed in the literature so far rely on lexicons that contain fewer terms. However, such solutions have been observed to reduce classification accuracy. In this paper, the term-document matrix is constructed in a user-dependent way, according to the purpose of the classification. Because the number of terms is still relatively large, a hash table is used for efficient term lookup. On this basis, an efficient and fast TF-IDF method is introduced to construct a weight matrix that represents the term-document relations, and a study is conducted on classifying Turkish news texts and Turkish columnists. The proposed approach substantially reduces the computation time required for term weighting; it also achieves 99% accuracy in determining news categories and 98% accuracy in identifying columnists.
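The idea of combining a hash table with TF-IDF weighting can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_term_index`, `tfidf_matrix`), the sample lexicon, and the particular TF-IDF variant (term frequency normalized by document length, natural-log IDF) are assumptions chosen for illustration. A Python `dict` stands in for the hash table, so each term lookup takes constant time on average instead of a scan over the lexicon.

```python
import math
from collections import Counter


def build_term_index(lexicon):
    """Map each lexicon term to a column index (dict = hash table)."""
    return {term: col for col, term in enumerate(lexicon)}


def tfidf_matrix(documents, lexicon):
    """Build a dense document-by-term TF-IDF weight matrix.

    documents: list of token lists (already tokenized).
    lexicon:   list of distinct terms chosen for the classification task.
    Assumed variant: tf = count / document length, idf = ln(N / df).
    """
    index = build_term_index(lexicon)
    n_terms = len(lexicon)
    n_docs = len(documents)

    # First pass: per-document term counts and document frequencies.
    counts = []
    doc_freq = [0] * n_terms
    for tokens in documents:
        row = Counter()
        for tok in tokens:
            col = index.get(tok)  # average O(1) hash lookup
            if col is not None:   # ignore terms outside the lexicon
                row[col] += 1
        counts.append(row)
        for col in row:
            doc_freq[col] += 1

    # Second pass: turn counts into TF-IDF weights.
    matrix = []
    for tokens, row in zip(documents, counts):
        weights = [0.0] * n_terms
        total = len(tokens) or 1  # avoid division by zero for empty documents
        for col, count in row.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[col])
            weights[col] = tf * idf
        matrix.append(weights)
    return matrix


# Hypothetical toy data: three tokenized documents and a four-term lexicon.
docs = [["ekonomi", "borsa", "ekonomi"], ["spor", "futbol"], ["ekonomi", "spor"]]
lexicon = ["ekonomi", "spor", "borsa", "futbol"]
weights = tfidf_matrix(docs, lexicon)
```

Each row of `weights` could then be passed as a feature vector to an SVM classifier. For a large lexicon, a sparse representation would be a more practical choice than the dense lists used here.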

Authors and Affiliations

Sevinç İlhan Omurca* | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey
Semih Baş | Tubitak Marmara Research Center Technology Free Zone, IBTECH, Kocaeli – 41470, Turkey
Ekin Ekinci | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey

How To Cite

Sevinç İlhan Omurca*, Semih Baş, Ekin Ekinci (2015). An Efficient Document Categorization Approach for Turkish Based Texts. International Journal of Intelligent Systems and Applications in Engineering, 3(1), 7-13. https://europub.co.uk/articles/-A-761