NADA: New Arabic Dataset for Text Classification

Abstract

In recent years, Arabic Natural Language Processing, including text summarization, text simplification, text categorization, and other natural-language disciplines, has attracted a growing number of researchers. Appropriate resources for Arabic text categorization have become a pressing need for the development of this research. The few existing corpora are not ready for use: they require preprocessing and filtering. In addition, most of them are not organized according to a standard classification scheme, which leads to unbalanced classes and thus reduces classification accuracy. This paper proposes a New Arabic Dataset (NADA) for text categorization. The corpus is composed of two existing corpora, OSAC and DAA. It is preprocessed and filtered using recent state-of-the-art methods, and it is organized using the Dewey decimal classification scheme and the Synthetic Minority Over-Sampling Technique (SMOTE). The experimental results show that NADA is an efficient dataset, ready for use in Arabic text categorization.
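To illustrate the class-balancing idea mentioned in the abstract, the following is a minimal sketch of SMOTE-style oversampling. It is not the authors' implementation: it assumes the documents have already been converted to numeric feature vectors, and the function name `smote_oversample`, its parameters, and the toy data are hypothetical. Each synthetic sample is placed at a random point on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote_oversample(minority, n_new, k=3, seed=0):
    """Return n_new synthetic vectors interpolated between minority-class neighbours.

    Illustrative sketch only; the name and parameters are assumptions.
    """
    rng = random.Random(seed)
    n = len(minority)
    if n < 2:
        raise ValueError("need at least two minority samples to interpolate")
    k = min(k, n - 1)  # cannot have more neighbours than other samples

    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(n)
        base = minority[i]
        # Indices of the k nearest minority neighbours of the base sample.
        nearest = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(base, minority[j]),
        )[:k]
        neighbour = minority[rng.choice(nearest)]
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (q - b) for b, q in zip(base, neighbour)])
    return synthetic


# Toy minority class: four two-dimensional document vectors (made-up values).
minority_docs = [[0.0, 1.0], [0.2, 0.8], [0.1, 0.9], [0.3, 1.0]]
new_docs = smote_oversample(minority_docs, n_new=5)
```

Because every synthetic vector lies between two existing minority samples, the new points stay inside the region the minority class already occupies instead of duplicating existing documents.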

Authors and Affiliations

Nada Alalyani, Souad Larabi Marie-Sainte

  • EP ID EP393876
  • DOI 10.14569/IJACSA.2018.090928

How To Cite

Nada Alalyani, Souad Larabi Marie-Sainte (2018). NADA: New Arabic Dataset for Text Classification. International Journal of Advanced Computer Science & Applications, 9(9), 206-212. https://europub.co.uk/articles/-A-393876