Twitter Data Classification by Applying and Comparing Multiple Machine Learning Techniques
Journal Title: International Journal of Innovative Research in Computer Science and Technology - Year 2019, Vol 7, Issue 6
Abstract
With an average of five hundred million tweets sent per day, Twitter has become one of the largest sources of data for researchers. Previous studies of Twitter data have addressed tasks such as sentiment analysis. However, little research has been done on classifying tweets into categories so that they can be distributed according to user preferences. In this research, we started by creating four broad categories: politics, sports, crime and natural. We then applied different machine learning techniques (Random Forest, K-Nearest Neighbors, Naïve Bayes, Logistic Regression, Decision Tree and Support Vector Machine) to classify the Twitter data. Finally, we compared the results in terms of sensitivity, specificity, precision, false positive rate and accuracy. We found that Support Vector Machine (SVM) produced the best results on all five measures. Hence, we concluded that a machine learning approach (Support Vector Machine) can be used to classify Twitter data. The constructed dataset, all programs, figures and snippets can be found at https://github.com/ananyasarkertonu/Twitter-Dataset
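The five comparison measures named in the abstract can all be derived from a confusion matrix. The following is a minimal sketch, not the authors' code: it assumes a one-vs-rest treatment of each category, and the function name `one_vs_rest_metrics` and the toy labels are hypothetical, made up purely for illustration.

```python
# Illustrative sketch only: one-vs-rest evaluation measures for a single category.
# The labels below are invented toy data, not results from the paper.

def one_vs_rest_metrics(y_true, y_pred, positive):
    """Treat `positive` as the positive class and all other categories as negative."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1

    def ratio(num, den):
        # Return NaN rather than raising when a denominator is zero.
        return num / den if den else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),          # true positive rate (recall)
        "specificity": ratio(tn, tn + fp),          # true negative rate
        "precision": ratio(tp, tp + fp),
        "false_positive_rate": ratio(fp, fp + tn),  # equals 1 - specificity
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
    }


# Hypothetical true and predicted categories for six tweets.
y_true = ["politics", "sports", "crime", "natural", "sports", "politics"]
y_pred = ["politics", "sports", "crime", "sports", "sports", "crime"]

sports_metrics = one_vs_rest_metrics(y_true, y_pred, "sports")
for name, value in sports_metrics.items():
    print(f"{name}: {value:.3f}")
```

Under this one-vs-rest assumption, the same function would be called once per category and per classifier, and the resulting values compared across models.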
Authors and Affiliations
Ananya Sarker, Md. Shahid Uz Zaman, Md. Azmain Yakin Srizon