Normalization of Unstructured and Informal Text in Sentiment Analysis

Abstract

Sentiment analysis is a natural language processing problem that deals with extracting and analyzing the public sentiment expressed about target entities on microblogging websites. The field has gained considerable attention because of the large volume of textual content available to inform decision making. Sentiment analysis has many application areas, such as market analysis, service analysis, showbiz analysis, movies and sports; even the popularity and acceptance rate of political policies can be predicted with sentiment analysis systems. Although a tremendous volume of opinionated text is available, it is unstructured and noisy, which prevents sentiment classifiers from achieving good results. Normalization is the process of cleaning noise from unstructured text before sentiment analysis. In this study we propose a mechanism for normalizing informal and unstructured text. The proposed mechanism comprises four essential phases: noise reduction, part-of-speech tagging, stop word removal, and stemming and lemmatization. Numerous experiments were performed on a Twitter data set with unsupervised lexicons and dictionaries. Python and the Natural Language Toolkit (NLTK) were used to perform all four phases. This study demonstrates that utilizing and normalizing the informal tokens in tweets improves the overall classification accuracy from 75.42% to 82.357%.
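The abstract names the normalization phases but not how each one is implemented. The following is a minimal, standard-library-only Python sketch of three of them (noise reduction, stop word removal and stemming) plus replacement of informal tokens, followed by a simple lexicon score showing why normalization can help a lexicon-based classifier. It is not the authors' method: the slang dictionary, stop list, sentiment lexicon and the toy suffix stemmer (which is not the Porter algorithm) are hypothetical stand-ins. Part-of-speech tagging and lemmatization are omitted to keep the sketch dependency-free; in NLTK these would typically use `nltk.pos_tag` and `nltk.stem.WordNetLemmatizer`, both of which need downloaded model data.

```python
import re

# Hypothetical informal-token dictionary (illustrative, not the paper's resource).
SLANG = {"u": "you", "gr8": "great", "luv": "love", "pls": "please", "b4": "before"}

# Hypothetical, deliberately tiny stop-word list.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and", "i", "you", "this", "it"}

# Hypothetical polarity lexicon keyed to normalized word forms.
LEXICON = {"love": 1, "great": 1, "happy": 1, "bad": -1, "hate": -1, "awful": -1}

URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
REPEAT_RE = re.compile(r"([a-z])\1{2,}")  # a letter repeated three or more times
TOKEN_RE = re.compile(r"[a-z0-9']+")


def reduce_noise(tweet: str) -> str:
    """Noise reduction: lowercase, drop URLs and @mentions, strip '#', shorten elongations."""
    text = tweet.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = text.replace("#", " ")
    return REPEAT_RE.sub(r"\1\1", text)  # e.g. "soooo" -> "soo"


def normalize_tokens(text: str) -> list[str]:
    """Tokenize and replace informal tokens with their formal equivalents."""
    return [SLANG.get(tok, tok) for tok in TOKEN_RE.findall(text)]


def remove_stop_words(tokens: list[str]) -> list[str]:
    """Stop word removal against the hypothetical STOP_WORDS list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def crude_stem(token: str) -> str:
    """Toy suffix stripper; keeps at least three characters of the stem."""
    for suffix in ("ing", "ed", "ly", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(tweet: str) -> list[str]:
    """Run the sketched phases in order and return normalized tokens."""
    tokens = remove_stop_words(normalize_tokens(reduce_noise(tweet)))
    return [crude_stem(tok) for tok in tokens]


def polarity(tokens: list[str]) -> int:
    """Sum lexicon scores; tokens missing from the lexicon count as 0."""
    return sum(LEXICON.get(tok, 0) for tok in tokens)


if __name__ == "__main__":
    tweet = "@bob I luv this phone soooo much!!! gr8 cameras http://t.co/xyz #happy"
    raw_tokens = TOKEN_RE.findall(tweet.lower())
    print(normalize(tweet))  # ['love', 'phone', 'soo', 'much', 'great', 'camera', 'happy']
    print(polarity(raw_tokens), polarity(normalize(tweet)))  # 1 3
```

In this toy example the raw tokens match only "happy" in the lexicon, while the normalized tokens also match "love" and "great" (recovered from "luv" and "gr8"), which is the kind of effect the study attributes to normalizing informal tokens.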

Authors and Affiliations

Muhammad Javed, Shahid Kamal

Related Articles

Gesture Recognition based on Human Grasping Activities using PCA-BMU

This research study presents the recognition of fingers grasps for various grasping styles of daily living. In general, the posture of the human hand determines the fingers that are used to create contact between an obje...

Awareness Survey of Anonymisation of Protected Health Information in Pakistan

With the growing advancement of science and technology, research has become the vital step in every educational field. This research survey sheds light on the methods of de-identification and anonymisation for protecting...

Forks impacts and motivations in free and open source projects

Forking is a mechanism of splitting in a community and is typically found in the free and open source software field. As a failure of cooperation in a context of open innovation, forking is a practical and informat...

Telugu Bigram Splitting using Consonant-based and Phrase-based Splitting

Splitting is a conventional process in most of Indian languages according to their grammar rules. It is called ‘pada vicchEdanam’ (a Sanskrit term for word splitting) and is widely used by most of the Indian languages. S...

Effect of Fusion of Statistical and Texture Features on HSI based Leaf Images with Both Dorsal and Ventral Sides

The present work involves statistically analyzing and studying the overall classification accuracy results using Hue channel images of different plant species using their dorsal and ventral sides, and then subjecting the...

  • EP ID EP407312
  • DOI 10.14569/IJACSA.2018.091011

How To Cite

Muhammad Javed, Shahid Kamal (2018). Normalization of Unstructured and Informal Text in Sentiment Analysis. International Journal of Advanced Computer Science & Applications, 9(10), 78-85. https://europub.co.uk/articles/-A-407312