Normalization of Unstructured and Informal Text in Sentiment Analysis

Abstract

Sentiment analysis is a natural language processing problem that deals with extracting and analyzing public sentiment about target entities shared on microblogging websites. The field has gained considerable attention because of the large amount of available textual content that can support decision making. Sentiment analysis has many application areas, such as market analysis, service analysis, showbiz analysis, movies, and sports; even the popularity and acceptance rate of political policies can be predicted with sentiment analysis systems. Although a tremendous volume of opinionated text is available, it is unstructured and noisy, so sentiment classifiers cannot achieve good results on it. Normalization is the process used to clean noise from unstructured text for sentiment analysis. This study proposes a mechanism for normalizing informal and unstructured text. The proposed mechanism comprises four essential phases: noise reduction, part-of-speech tagging, stop word removal, and stemming and lemmatization. Numerous experiments are performed on a Twitter data set with unsupervised lexicons and dictionaries. Python and the Natural Language Toolkit (NLTK) are used to perform all four essential steps. The study demonstrates that using and normalizing informal tokens in tweets improved the overall classification accuracy from 75.42 to 82.357.
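As a rough illustration of the noise reduction and stop word removal phases named in the abstract, the following minimal sketch uses only the Python standard library. It is not the authors' implementation: the `SLANG` and `STOP_WORDS` tables, the regular expressions, and the function names `reduce_noise` and `normalize` are all hypothetical, chosen for illustration, since the abstract does not describe the actual lexicons or rules. Part-of-speech tagging and lemmatization are left out here; in NLTK they would typically be handled by `nltk.pos_tag` and `nltk.stem.WordNetLemmatizer`, which require separately downloaded corpora.

```python
import re

# Hypothetical, tiny lookup tables for illustration only; the lexicons and
# dictionaries used in the study are not given in the abstract.
SLANG = {"u": "you", "gr8": "great", "luv": "love", "2day": "today"}
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and",
              "it", "this", "i", "you"}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
ELONGATED_RE = re.compile(r"(.)\1{2,}")  # a character repeated 3+ times
TOKEN_RE = re.compile(r"[a-z0-9']+")


def reduce_noise(tweet):
    """Lowercase the tweet and strip URLs, @mentions, and '#' symbols."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")  # keep the hashtag word, drop the symbol
    # Collapse runs of 3+ identical characters to 2 ("soooo" -> "soo").
    return ELONGATED_RE.sub(r"\1\1", text)


def normalize(tweet):
    """Return cleaned tokens: noise removed, slang expanded, stop words dropped."""
    tokens = TOKEN_RE.findall(reduce_noise(tweet))
    tokens = [SLANG.get(t, t) for t in tokens]  # expand informal tokens
    return [t for t in tokens if t not in STOP_WORDS]


print(normalize("@bob This movie is gr8!!! luv it soooo much http://t.co/xyz #cinema"))
```

Collapsing elongated characters to two rather than one keeps words such as "cool" intact; a dictionary lookup or spell-correction step, not shown here, would be needed to map residual forms like "soo" back to standard words.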

Authors and Affiliations

Muhammad Javed, Shahid Kamal

  • EP ID EP407312
  • DOI 10.14569/IJACSA.2018.091011

How To Cite

Muhammad Javed, Shahid Kamal (2018). Normalization of Unstructured and Informal Text in Sentiment Analysis. International Journal of Advanced Computer Science & Applications, 9(10), 78-85. https://europub.co.uk/articles/-A-407312