Twitter Data Classification by Applying and Comparing Multiple Machine Learning Techniques

Abstract

With an average of five hundred million tweets sent per day, Twitter has become one of the largest sources of data for researchers. Much previous research on Twitter data has focused on tasks such as sentiment analysis. However, little research has been done on classifying tweets into categories so that they can be distributed according to user preferences. In this research, we first defined four broad categories: politics, sports, crime and natural. We then applied several machine learning techniques (Random Forest, K-Nearest Neighbors, Naïve Bayes, Logistic Regression, Decision Tree and Support Vector Machine) to classify the Twitter data. Finally, we compared the results in terms of sensitivity, specificity, precision, false positive rate and accuracy. Support Vector Machine (SVM) produced the best results on all five of these metrics. We therefore conclude that a machine learning approach, specifically SVM, can be used to classify Twitter data. The constructed dataset and all programs, figures and snippets are available at https://github.com/ananyasarkertonu/Twitter-Dataset
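The five comparison metrics named above are all derived from confusion counts (true/false positives and negatives). The abstract does not say how they were computed for a four-category problem, so the minimal sketch below assumes a one-vs-rest treatment in which each category in turn is the positive class. The function names `one_vs_rest_metrics` and `overall_accuracy` are hypothetical and are not taken from the authors' repository.

```python
from typing import Dict, Sequence

# The four categories defined in the abstract.
CATEGORIES = ["politics", "sports", "crime", "natural"]


def _ratio(num: int, den: int) -> float:
    """Safe division: returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def one_vs_rest_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], labels: Sequence[str]
) -> Dict[str, Dict[str, float]]:
    """Per-class metrics, treating each label as the positive class in turn
    (a one-vs-rest assumption, not necessarily the paper's method)."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    n = len(y_true)
    results: Dict[str, Dict[str, float]] = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        tn = n - tp - fn - fp  # everything else is a correctly rejected negative
        results[label] = {
            "sensitivity": _ratio(tp, tp + fn),          # TP / (TP + FN)
            "specificity": _ratio(tn, tn + fp),          # TN / (TN + FP)
            "precision": _ratio(tp, tp + fp),            # TP / (TP + FP)
            "false_positive_rate": _ratio(fp, fp + tn),  # FP / (FP + TN)
            "accuracy": _ratio(tp + tn, n),              # (TP + TN) / N
        }
    return results


def overall_accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of tweets whose predicted category matches the true one."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return _ratio(correct, len(y_true))


if __name__ == "__main__":
    # Hypothetical labels for six tweets, for illustration only.
    y_true = ["politics", "sports", "crime", "natural", "politics", "sports"]
    y_pred = ["politics", "sports", "politics", "natural", "politics", "crime"]
    for cat, scores in one_vs_rest_metrics(y_true, y_pred, CATEGORIES).items():
        print(cat, {k: round(v, 3) for k, v in scores.items()})
    print("overall accuracy:", round(overall_accuracy(y_true, y_pred), 3))
```

Under this one-vs-rest view, specificity and false positive rate always sum to 1 for a class, and per-class accuracy is usually higher than overall accuracy because correctly rejected tweets from the other three categories count toward it.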

Authors and Affiliations

Ananya Sarker, Md. Shahid Uz Zaman, Md. Azmain Yakin Srizon



  • EP ID EP748074
  • DOI 10.21276/ijircst.2019.7.6.2

How To Cite

Ananya Sarker, Md. Shahid Uz Zaman, Md. Azmain Yakin Srizon (2019). Twitter Data Classification by Applying and Comparing Multiple Machine Learning Techniques. International Journal of Innovative Research in Computer Science and Technology, 7(6), -. https://europub.co.uk/articles/-A-748074