Clustering Students’ Arabic Tweets using Different Schemes

Abstract

This paper uses Twitter as a platform for clustering the topics mentioned by King Abdulaziz University students, in order to understand students’ behaviours and answer their inquiries. The aim of the study is to propose a model for cluster analysis of Saudi Arabian tweets (in Standard Arabic and the Arabian Gulf dialect) that segments the topics in the students’ posts. Natural language processing (NLP) and machine learning (ML) methods are combined to build models that cluster tweets according to their text similarity. The k-means algorithm is applied with different vector representation schemes, such as TF-IDF (term frequency-inverse document frequency) and BTO (binary term occurrence). Different preprocessing settings are explored to obtain N-gram terms from the tokens. A cluster distance performance measure is applied to determine the average centroid distance of the clusters. In addition, the clusters are evaluated by humans against the source data to make sure that they make sense in an educational domain. Each cluster has been identified, and students’ Twitter accounts have been characterised by their facilities or their educational system, such as e-learning. The results show that BTO was the best vector representation, and that it is more useful than the TF-IDF scheme for clustering students’ text.
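The pipeline described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the toy English tokens stand in for preprocessed Arabic tweets, the TF-IDF formula is one common variant, k-means is initialised from the first k documents for reproducibility, and the score is assumed to be the average distance from each document to its assigned centroid. All function and variable names are hypothetical.

```python
import math
from collections import Counter

def with_ngrams(tokens, max_n=2):
    """Return the tokens plus their word n-grams up to max_n, joined by '_'."""
    terms = []
    for n in range(1, max_n + 1):
        terms += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return terms

def bto_vectors(docs, vocab):
    """Binary term occurrence: 1.0 if the term occurs in the document, else 0.0."""
    vectors = []
    for doc in docs:
        present = set(doc)
        vectors.append([1.0 if t in present else 0.0 for t in vocab])
    return vectors

def tfidf_vectors(docs, vocab):
    """TF-IDF (assumed variant): relative term frequency times log(N / df)."""
    n_docs = len(docs)
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append([(tf[t] / len(doc)) * math.log(n_docs / df[t]) for t in vocab])
    return vectors

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(vectors, k, iterations=20):
    """Plain k-means; centroids start at the first k vectors (an assumption)."""
    centroids = [list(v) for v in vectors[:k]]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def avg_centroid_distance(vectors, labels, centroids):
    """Average Euclidean distance from each vector to its assigned centroid."""
    total = sum(math.sqrt(sq_dist(v, centroids[lab])) for v, lab in zip(vectors, labels))
    return total / len(vectors)

# Made-up, already tokenised "tweets" (placeholders, not data from the paper).
tweets = [
    ["exam", "schedule", "final"],
    ["elearning", "portal", "login"],
    ["final", "exam", "date"],
    ["login", "elearning", "password"],
]
docs = [with_ngrams(t) for t in tweets]
vocab = sorted({term for doc in docs for term in doc})

bto = bto_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)

bto_labels, bto_centroids = kmeans(bto, k=2)
tfidf_labels, tfidf_centroids = kmeans(tfidf, k=2)

bto_score = avg_centroid_distance(bto, bto_labels, bto_centroids)
tfidf_score = avg_centroid_distance(tfidf, tfidf_labels, tfidf_centroids)
print("BTO:", bto_labels, bto_score)
print("TF-IDF:", tfidf_labels, tfidf_score)
```

Note that the two scores are computed in differently scaled vector spaces, so in this sketch they are not directly comparable without normalisation.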

Authors and Affiliations

Hamed Al-Rubaiee, Khalid Alomar

  • EP ID EP258343
  • DOI 10.14569/IJACSA.2017.080438

How To Cite

Hamed Al-Rubaiee, Khalid Alomar (2017). Clustering Students’ Arabic Tweets using Different Schemes. International Journal of Advanced Computer Science & Applications, 8(4), 276-280. https://europub.co.uk/articles/-A-258343