Twitter Data Classification by Applying and Comparing Multiple Machine Learning Techniques

Abstract

With an average of five hundred million tweets sent per day, Twitter has become one of the largest sources of data for researchers. Previous studies of Twitter data have focused on tasks such as sentiment analysis. However, little research has been done on classifying tweets into categories so that they can be distributed according to user preferences. In this research we started by creating four broad categories: politics, sports, crime and natural. We then applied different machine learning techniques (Random Forest, K-Nearest Neighbors, Naïve Bayes, Logistic Regression, Decision Tree and Support Vector Machine) to classify the Twitter data. Finally, we compared the results in terms of sensitivity, specificity, precision, false positive rate and accuracy. We found that Support Vector Machine (SVM) produced the best results on all five of these measures. We therefore concluded that a machine learning approach, SVM in particular, can be used to classify Twitter data. The constructed dataset, all programs, figures and snippets can be found at https://github.com/ananyasarkertonu/Twitter-Dataset
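
The five comparison measures named in the abstract can all be derived from the counts of true and false positives and negatives. The following is a minimal sketch, not the authors' code: it assumes each category is scored one-vs-rest (the abstract does not say how the multi-class results were aggregated), and the function name `one_vs_rest_metrics` and the example labels are made up for illustration.

```python
# Hypothetical illustration: the labels below are invented, not taken from the dataset.
CATEGORIES = ["politics", "sports", "crime", "natural"]


def one_vs_rest_metrics(y_true, y_pred, positive):
    """Compute the five measures with `positive` as the positive class
    and every other category treated as negative (an assumption)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a class never occurs.
        return numerator / denominator if denominator else 0.0

    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),          # true negative rate
        "precision": ratio(tp, tp + fp),
        "false_positive_rate": ratio(fp, fp + tn),  # equals 1 - specificity
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Invented true and predicted categories for eight tweets.
y_true = ["politics", "sports", "crime", "natural",
          "sports", "politics", "crime", "natural"]
y_pred = ["politics", "sports", "crime", "sports",
          "sports", "crime", "crime", "natural"]

for category in CATEGORIES:
    print(category, one_vs_rest_metrics(y_true, y_pred, category))
```

Note that false positive rate and specificity always sum to one when negatives exist, so a classifier that ranks best on one necessarily ranks best on the other.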

Authors and Affiliations

Ananya Sarker, Md. Shahid Uz Zaman, Md. Azmain Yakin Srizon

  • EP ID EP748074
  • DOI 10.21276/ijircst.2019.7.6.2

How To Cite

Ananya Sarker, Md. Shahid Uz Zaman, Md. Azmain Yakin Srizon (2019). Twitter Data Classification by Applying and Comparing Multiple Machine Learning Techniques. International Journal of Innovative Research in Computer Science and Technology, 7(6), -. https://europub.co.uk/articles/-A-748074