Investigating the Use of Machine Learning Algorithms in Detecting Gender of the Arabic Tweet Author
Journal Title: International Journal of Advanced Computer Science & Applications - Year 2016, Vol 7, Issue 7
Abstract
Twitter is one of the most popular social network sites on the Internet for sharing opinions and knowledge. Many advertisers use Tweets to collect features and attributes of Tweeters in order to target specific groups of highly engaged people. Gender detection is a sub-field of sentiment analysis concerned with extracting and predicting the gender of a Tweet author. In this paper, we investigate the gender of Tweet authors by applying different classification techniques to Arabic-language Tweets: Naïve Bayes (NB), Support Vector Machine (SVM), Naïve Bayes Multinomial (NBM), the J48 decision tree, and k-nearest neighbors (KNN). The results show that the NBM, SVM, and J48 classifiers can achieve accuracy above 98% when the name of the Tweet author is added as a feature. The results also show that the preprocessing approach has a negative effect on the accuracy of gender detection. In a nutshell, this study shows that machine learning classifiers are able to detect the gender of an Arabic Tweet author.
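To illustrate the idea of adding the author name as a feature, the following is a minimal pure-Python sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is not the authors' implementation: the abstract does not state which toolkit or feature encoding was used. The function names (`featurize`, `train`, `predict`), the `name=` token prefix, and the four toy training rows are all hypothetical and carry no experimental meaning.

```python
import math
from collections import Counter, defaultdict


def featurize(name, text):
    # Assumption: the author name is added as one extra token, prefixed
    # so it cannot collide with ordinary words in the tweet text.
    return ["name=" + name] + text.split()


def train(samples):
    """samples: iterable of (tokens, label) pairs."""
    class_counts = Counter()
    token_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        class_counts[label] += 1
        token_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, token_counts, vocab


def predict(model, tokens):
    class_counts, token_counts, vocab = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        # Log prior plus add-one smoothed log likelihood of each token.
        score = math.log(class_counts[label] / total)
        denom = sum(token_counts[label].values()) + len(vocab)
        for tok in tokens:
            if tok in vocab:  # tokens unseen in training are skipped
                score += math.log((token_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (author name, tweet text, label).
TOY_ROWS = [
    ("ahmad", "match tonight great goal", "male"),
    ("omar", "new phone review", "male"),
    ("fatima", "lovely dinner with family", "female"),
    ("layla", "great book review", "female"),
]

model = train((featurize(n, t), y) for n, t, y in TOY_ROWS)
```

In this toy setup the text "great review" is equally likely under both classes, so the name token alone decides the prediction: `predict(model, featurize("layla", "great review"))` returns `"female"`, while the same text with the name `"omar"` returns `"male"`.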
Authors and Affiliations
Emad AlSukhni, Qasem Alequr