An Efficient Document Categorization Approach for Turkish Based Texts

Abstract

Because the rapid and uncontrollable growth of textual data makes it infeasible to classify all documents by hand, automatic methods have been developed to organize the data. In this study, a support vector machine (SVM) classifier is used for text categorization. In text categorization applications, the text representation step can require a large amount of computation time, because a very large number of terms must be weighted. In the literature, this problem has so far been addressed with lexicons that contain fewer terms. However, such solutions have been observed to reduce classification accuracy. In this paper, the term-document matrix is constructed in a user-dependent manner, according to the purpose of the classification. Because the number of terms is still relatively large, a hash table is used to search for terms efficiently. In this way, an efficient and fast TF-IDF method is introduced for constructing a weight matrix that represents the term-document relations, and a study on classifying Turkish news documents and Turkish columnists is conducted. With the proposed approach, the computation time required for the term-weighting process is reduced substantially; in addition, 99% accuracy is achieved in determining news categories and 98% accuracy in identifying columnists.
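
The idea of combining TF-IDF weighting with a hash table for term lookup can be illustrated with a minimal Python sketch. This is not the authors' implementation: the abstract does not state which TF-IDF variant is used, so the sketch assumes term frequency normalized by document length and idf = ln(N / df). The function names build_vocabulary and tfidf_matrix and the sample tokens are hypothetical. Python's dict serves as the hash table, so finding a term's column takes constant time on average instead of a scan over the whole term list.

```python
import math
from collections import Counter


def build_vocabulary(tokenized_docs):
    """Map each distinct term to a column index using a dict (a hash table)."""
    vocab = {}
    for tokens in tokenized_docs:
        for term in tokens:
            if term not in vocab:  # average O(1) membership test
                vocab[term] = len(vocab)
    return vocab


def tfidf_matrix(tokenized_docs, vocab):
    """Build a dense document-by-term weight matrix.

    Assumed variant (not confirmed by the source):
    tf = count / document length, idf = ln(N / df).
    """
    n_docs = len(tokenized_docs)
    counts = [Counter(tokens) for tokens in tokenized_docs]

    # Document frequency: number of documents that contain each term.
    df = Counter()
    for doc_counts in counts:
        df.update(doc_counts.keys())

    matrix = []
    for tokens, doc_counts in zip(tokenized_docs, counts):
        row = [0.0] * len(vocab)
        for term, freq in doc_counts.items():
            col = vocab[term]  # hash lookup instead of a linear search
            tf = freq / len(tokens)
            idf = math.log(n_docs / df[term])
            row[col] = tf * idf
        matrix.append(row)
    return matrix


# Made-up, already tokenized sample documents (for illustration only).
docs = [
    "ekonomi faiz dolar".split(),
    "spor futbol gol".split(),
    "ekonomi borsa dolar dolar".split(),
]
vocab = build_vocabulary(docs)
weights = tfidf_matrix(docs, vocab)
```

Each row of weights could then be passed as a feature vector to an SVM classifier. At realistic vocabulary sizes a sparse representation would be preferable to the dense rows used here.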

Authors and Affiliations

Sevinç İlhan Omurca*, Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey
Semih Baş, Tubitak Marmara Research Center Technology Free Zone, IBTECH, Kocaeli – 41470, Turkey
Ekin Ekinci, Kocaeli University, Faculty of Engineering, Computer Engineering Department, Umuttepe Campus, Kocaeli – 41380, Turkey

How To Cite

Sevinç İlhan Omurca*, Semih Baş, Ekin Ekinci (2015). An Efficient Document Categorization Approach for Turkish Based Texts. International Journal of Intelligent Systems and Applications in Engineering, 3(1), 7-13. https://europub.co.uk/articles/-A-761