Urdu Text Classification using Majority Voting

Abstract

Text classification assigns predefined categories to text documents using supervised machine learning algorithms. It has practical applications such as spam detection, sentiment detection, and natural-language detection. Building on this idea, we applied five well-known classification techniques to an Urdu-language corpus and assigned a class to each document by majority voting. The corpus contains 21,769 news documents in seven categories (including Business, Entertainment, Culture, Health, Sports, and Weird). Because the algorithms could not work directly on the raw data, we applied preprocessing techniques: tokenization, stop-word removal, and a rule-based stemmer. After preprocessing, 93,400 features were extracted from the data for use by the machine learning algorithms. Using majority voting, we achieved up to 94% precision and recall.
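The majority-voting step described in the abstract can be illustrated with a minimal sketch. It assumes each of the five classifiers has already produced one label per document. The function names `majority_vote` and `vote_documents` are hypothetical, and the tie-breaking rule (the earliest-listed classifier wins) is an assumption, since the abstract does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label chosen by the most classifiers for one document.

    Assumption: ties go to the label that appears first in `predictions`;
    the abstract does not specify a tie-breaking rule.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    counts = Counter(predictions)
    best = max(counts.values())
    # Scan in classifier order so that ties are resolved deterministically.
    for label in predictions:
        if counts[label] == best:
            return label


def vote_documents(per_classifier_labels):
    """Combine label lists (one list per classifier) into one label per document."""
    # zip(*...) regroups the labels so each tuple holds all votes for one document.
    return [majority_vote(list(votes)) for votes in zip(*per_classifier_labels)]


if __name__ == "__main__":
    # Hypothetical outputs of five classifiers for two documents.
    labels = [
        ["Sports", "Health"],
        ["Sports", "Business"],
        ["Health", "Health"],
        ["Sports", "Health"],
        ["Business", "Weird"],
    ]
    print(vote_documents(labels))
```

With an odd number of voters and more than two classes, ties are still possible (for example a 2-2-1 split), which is why the sketch fixes an explicit tie-breaking order.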

Authors and Affiliations

Muhammad Usman, Zunaira Shafique, Saba Ayub, Kamran Malik

  • EP ID EP133714
  • DOI 10.14569/IJACSA.2016.070836

How To Cite

Muhammad Usman, Zunaira Shafique, Saba Ayub, Kamran Malik (2016). Urdu Text Classification using Majority Voting. International Journal of Advanced Computer Science & Applications, 7(8), 265-273. https://europub.co.uk/articles/-A-133714