An Efficient Document Categorization Approach for Turkish Based Texts

Abstract

Because the rapid and uncontrolled growth of textual data makes it infeasible to classify all documents by hand, automatic methods have been developed to organize such data. In this study, a support vector machine (SVM) classifier is used for text categorization. In text categorization applications, the text representation step can take considerable computation time because a very large number of terms must be weighted. The solution commonly used in the literature so far is a lexicon containing fewer terms; however, such solutions have been observed to reduce classification accuracy. In this paper, the term-document matrix is constructed in a user-dependent way, according to the purpose of the classification. Since the number of terms is still relatively large, a hash table is used to search for terms efficiently. An efficient and rapid TF-IDF method is thereby introduced to construct a weight matrix representing the term-document relations, and a study is conducted on classifying Turkish news documents and the texts of Turkish columnists. With the proposed approach, the computation time required for term weighting is reduced substantially; in addition, 99% accuracy is achieved in determining news categories and 98% accuracy in identifying columnists.
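The idea of weighting terms through a hash-table lookup can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy documents, the small lexicon, and the specific TF-IDF variant (term frequency normalized by the number of in-lexicon tokens, unsmoothed logarithmic IDF) are all assumptions made for illustration. A Python `dict` plays the role of the hash table, so each token is mapped to its matrix column in constant time on average instead of by scanning the term list.

```python
import math
from collections import Counter


def build_term_index(lexicon):
    """Map each distinct lexicon term to a column index (hash-table lookup)."""
    term_index = {}
    for term in lexicon:
        # setdefault keeps the first index if a term is listed twice
        term_index.setdefault(term, len(term_index))
    return term_index


def tf_idf_matrix(documents, lexicon):
    """Return one row of TF-IDF weights per document (illustrative variant)."""
    term_index = build_term_index(lexicon)
    n_docs = len(documents)
    n_terms = len(term_index)

    counts = []                  # per-document term counts, keyed by column
    doc_freq = [0] * n_terms     # number of documents containing each term
    for tokens in documents:
        row = Counter()
        for token in tokens:
            col = term_index.get(token)   # average O(1) lookup
            if col is not None:           # tokens outside the lexicon are skipped
                row[col] += 1
        counts.append(row)
        for col in row:
            doc_freq[col] += 1

    matrix = []
    for row in counts:
        weights = [0.0] * n_terms
        total = sum(row.values())
        for col, count in row.items():
            tf = count / total
            idf = math.log(n_docs / doc_freq[col])
            weights[col] = tf * idf
        matrix.append(weights)
    return matrix


# Hypothetical, already tokenized toy documents and lexicon
docs = [
    ["ekonomi", "borsa", "dolar", "borsa"],
    ["spor", "futbol", "gol"],
    ["ekonomi", "futbol", "kulüp"],
]
lexicon = ["ekonomi", "borsa", "spor", "futbol"]
weights = tf_idf_matrix(docs, lexicon)
```

Each row of `weights` could then serve as the feature vector of one document for a classifier such as an SVM. The choice of lexicon is the user-dependent part: a different classification purpose would supply a different term list.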

Authors and Affiliations

Sevinç İlhan Omurca* | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey
Semih Baş | Tubitak Marmara Research Center Technology Free Zone, IBTECH, Kocaeli – 41470, Turkey
Ekin Ekinci | Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey

How To Cite

Sevinç İlhan Omurca*, Semih Baş, Ekin Ekinci (2015). An Efficient Document Categorization Approach for Turkish Based Texts. International Journal of Intelligent Systems and Applications in Engineering, 3(1), 7-13. https://europub.co.uk/articles/-A-761