Convolutional Neural Networks in Predicting Missing Text in Arabic

Abstract

Predicting missing text is a major concern of the Natural Language Processing and deep learning community. However, most research on text prediction targets languages other than Arabic. In this paper, we take a first step toward training a deep learning language model on Arabic. Our contribution is the prediction of missing text in documents by applying Convolutional Neural Networks (CNNs) to Arabic language models. We built CNN-based language models with settings specific to the Arabic language. We prepared our dataset from a large quantity of text documents freely downloaded from the Arab World Books, Hindawi Foundation, and Shamela datasets. To calculate prediction accuracy, we compared documents with complete text against the same documents with missing text. We carried out training, validation, and testing in three stages, aiming to increase prediction performance. In the first stage, the model was trained on documents by the same author; in the second, on documents from the same dataset; and in the third, on all documents combined. The training, validation, and test steps were repeated many times, changing the author, the dataset, and the author-dataset combination, respectively. We also enlarged the training data by feeding the CNN model a larger quantity of text each time. The model achieved high performance in Arabic text prediction, with accuracy reaching 97.8% in the best case.
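The evaluation protocol described in the abstract can be illustrated with a minimal sketch. The abstract does not say how accuracy is scored or how documents are stored, so the sketch assumes token-level scoring at the missing positions and documents represented as dictionaries with "author" and "dataset" fields. The function names `prediction_accuracy` and `stage_groups`, and the sample records, are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def prediction_accuracy(original_tokens, predicted_tokens, missing_positions):
    """Fraction of missing positions where the prediction matches the original.

    Assumption: accuracy is scored per token, only at positions that were
    removed from the complete document.
    """
    if not missing_positions:
        raise ValueError("no missing positions to score")
    correct = sum(
        1 for i in missing_positions
        if predicted_tokens[i] == original_tokens[i]
    )
    return correct / len(missing_positions)


def stage_groups(documents, stage):
    """Group documents for one training stage.

    Stage 1 groups by author, stage 2 by dataset, stage 3 puts every
    document in a single group (assumed reading of the three stages).
    """
    groups = defaultdict(list)
    for doc in documents:
        if stage == 1:
            key = ("author", doc["author"])
        elif stage == 2:
            key = ("dataset", doc["dataset"])
        elif stage == 3:
            key = ("all",)
        else:
            raise ValueError("stage must be 1, 2, or 3")
        groups[key].append(doc)
    return dict(groups)


# Hypothetical records; the dataset names come from the abstract,
# the author labels are placeholders.
documents = [
    {"author": "author_a", "dataset": "Hindawi", "text": "..."},
    {"author": "author_a", "dataset": "Shamela", "text": "..."},
    {"author": "author_b", "dataset": "Hindawi", "text": "..."},
]

for stage in (1, 2, 3):
    groups = stage_groups(documents, stage)
    print(stage, {key: len(docs) for key, docs in groups.items()})
```

Scoring only the removed positions keeps tokens that were never missing from inflating the result; a different scoring unit (characters or whole sentences) would only change the comparison inside `prediction_accuracy`.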

Authors and Affiliations

Adnan Souri, Mohamed Alachhab, Badr Eddine Elmohajir, Abdelali Zbakh

  • EP ID EP597478
  • DOI 10.14569/IJACSA.2019.0100668

How To Cite

Adnan Souri, Mohamed Alachhab, Badr Eddine Elmohajir, Abdelali Zbakh (2019). Convolutional Neural Networks in Predicting Missing Text in Arabic. International Journal of Advanced Computer Science & Applications, 10(6), 520-527. https://europub.co.uk/articles/-A-597478