Clustering Students’ Arabic Tweets using Different Schemes

Abstract

This paper uses Twitter as a platform for clustering the topics mentioned by King Abdulaziz University students, in order to understand students’ behaviours and answer their inquiries. The aim of the study is to propose a model for cluster analysis of Saudi Arabian tweets (in Standard Arabic and the Arabian Gulf dialect) that segments the topics in students’ posts. Natural language processing (NLP) and machine learning (ML) methods are combined to build models that cluster tweets according to their text similarity. The K-means algorithm is used with different vector representation schemes, such as TF-IDF (term frequency-inverse document frequency) and BTO (binary term occurrence). Different preprocessing steps are explored to obtain N-gram terms from the tokens. The cluster distance performance task is applied to compute the average centroid distance of the clusters. In addition, the clustering is evaluated by humans, who inspect the source data to make sure that the clusters make sense in an educational domain. At this stage, each cluster has been identified, and students’ Twitter accounts have been characterised by their facilities or their educational system, such as e-learning. The results show that BTO gave the best vector representation, and that it is useful to apply it, rather than the TF-IDF scheme, when clustering students’ text.
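
The pipeline described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It uses only the Python standard library, a whitespace tokenizer, one common TF-IDF variant (relative term frequency times log(N/df)), and a plain K-means loop. The "average centroid distance" is interpreted here as the mean Euclidean distance from each tweet to its assigned centroid, which is an assumption because the abstract does not define it. All function names and the English placeholder tweets are hypothetical.

```python
import math
import random
from collections import Counter

def ngrams(text, n_max=2):
    """Whitespace tokens plus word n-grams up to n_max, joined with '_'."""
    tokens = text.split()
    terms = []
    for n in range(1, n_max + 1):
        terms += ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return terms

def vectorize(docs, scheme="bto", n_max=2):
    """Return (vocabulary, vectors) under the 'bto' or 'tfidf' scheme."""
    term_lists = [ngrams(d, n_max) for d in docs]
    vocab = sorted({t for terms in term_lists for t in terms})
    # Document frequency: in how many documents each term occurs.
    df = Counter(t for terms in term_lists for t in set(terms))
    vectors = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1  # avoid division by zero for empty documents
        if scheme == "bto":
            # Binary term occurrence: 1 if the term is present, else 0.
            vec = [1.0 if counts[t] > 0 else 0.0 for t in vocab]
        else:
            # Assumed TF-IDF variant: (count / length) * log(N / df).
            vec = [counts[t] / total * math.log(len(docs) / df[t]) for t in vocab]
        vectors.append(vec)
    return vocab, vectors

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(vectors, k, iterations=20, seed=0):
    """Plain K-means with randomly sampled initial centroids."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda c: euclidean(v, centroids[c]))
                  for v in vectors]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids

def avg_centroid_distance(vectors, labels, centroids):
    """Mean distance from each vector to its assigned centroid (assumed metric)."""
    return sum(euclidean(v, centroids[lab])
               for v, lab in zip(vectors, labels)) / len(vectors)

# Hypothetical English stand-ins for student tweets.
tweets = [
    "exam schedule posted",
    "exam schedule changed",
    "elearning portal down",
    "elearning portal login",
]

for scheme in ("bto", "tfidf"):
    _, vectors = vectorize(tweets, scheme)
    labels, centroids = kmeans(vectors, k=2)
    score = avg_centroid_distance(vectors, labels, centroids)
    print(scheme, labels, round(score, 3))
```

Running the loop prints the cluster labels and the distance score for each scheme, so the two representations can be compared on the same data. Scores computed in differently scaled vector spaces are not directly comparable without normalisation, which is one reason a human check of the clusters remains useful.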

Authors and Affiliations

Hamed Al-Rubaiee, Khalid Alomar

  • DOI 10.14569/IJACSA.2017.080438

How To Cite

Hamed Al-Rubaiee, Khalid Alomar (2017). Clustering Students’ Arabic Tweets using Different Schemes. International Journal of Advanced Computer Science & Applications, 8(4), 276-280. https://europub.co.uk/articles/-A-258343