NADA: New Arabic Dataset for Text Classification
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2018, Vol 9, Issue 9
Abstract
In recent years, Arabic Natural Language Processing, including Text Summarization, Text Simplification, Text Categorization and other language-related disciplines, has attracted a growing number of researchers. Appropriate resources for Arabic Text Categorization have become a pressing need for the development of this research area. The few existing corpora are not ready for use; they require preprocessing and filtering operations. In addition, most of them are not organized according to standard classification methods, which leads to unbalanced classes and thus reduces classification accuracy. This paper proposes a New Arabic Dataset (NADA) for Text Categorization. This corpus is composed of two existing corpora, OSAC and DAA. The new corpus is preprocessed and filtered using recent state-of-the-art methods. It is also organized based on the Dewey Decimal Classification scheme and the Synthetic Minority Over-sampling Technique (SMOTE). The experimental results show that NADA is an efficient dataset, ready for use in Arabic Text Categorization.
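To illustrate how SMOTE rebalances classes, here is a minimal pure-Python sketch of its generic interpolation step, assuming each document is already represented as a dense numeric feature vector (for example a TF-IDF vector). It picks a random minority-class sample, chooses one of its k nearest minority-class neighbours, and creates a synthetic sample at a random point on the segment between them. The function name `smote` and its parameters are hypothetical; this is not the authors' NADA pipeline, and picking the base sample at random is one common variant of the procedure.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def _euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    # Plain Euclidean distance between two equal-length vectors.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def smote(minority: Sequence[Sequence[float]], n_synthetic: int, k: int = 5,
          seed: Optional[int] = None) -> List[Vector]:
    """Hypothetical helper: generate n_synthetic minority-class vectors.

    Each synthetic vector lies on the segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    if k < 1:
        raise ValueError("k must be at least 1")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)

    # Brute-force k-nearest-neighbour lists (each sample excludes itself).
    neighbours = []
    for i, x in enumerate(minority):
        others = sorted((j for j in range(len(minority)) if j != i),
                        key=lambda j: _euclidean(x, minority[j]))
        neighbours.append(others[:k])

    synthetic: List[Vector] = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))   # base minority sample
        j = rng.choice(neighbours[i])      # one of its k nearest neighbours
        gap = rng.random()                 # interpolation factor in [0, 1)
        base, nn = minority[i], minority[j]
        synthetic.append([a + gap * (b - a) for a, b in zip(base, nn)])
    return synthetic


if __name__ == "__main__":
    minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
    new_points = smote(minority, n_synthetic=4, k=2, seed=0)
    print(len(new_points))  # 4
```

The brute-force neighbour search is quadratic in the number of minority samples, which keeps the sketch short but would be slow on a large corpus; in practice a library implementation such as `SMOTE` from the imbalanced-learn package (with its `k_neighbors` parameter) is the usual choice.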
Authors and Affiliations
Nada Alalyani, Souad Larabi Marie-Sainte