Normalization of Unstructured and Informal Text in Sentiment Analysis

Abstract

Sentiment analysis is a natural language processing problem concerned with extracting and analyzing public sentiment about target entities expressed on microblogging websites. The field has gained considerable attention because of the large volume of opinion-bearing text that can inform decision making. Its application areas include market analysis, service analysis, show business, movies, and sports; even the popularity and acceptance rate of political policies can be predicted by sentiment analysis systems. Although a tremendous volume of opinionated text is available, it is unstructured and noisy, which prevents sentiment classifiers from achieving good results. Normalization is the process of cleaning noise from unstructured text before sentiment analysis. In this study we propose a mechanism for normalizing informal and unstructured text. The proposed mechanism comprises four essential phases: noise reduction, part-of-speech tagging, stop word removal, and stemming and lemmatization. Numerous experiments are performed on a Twitter data set with unsupervised lexicons and dictionaries. Python and the Natural Language Toolkit (NLTK) are used to perform all four steps. The study demonstrates that utilizing and normalizing informal tokens in tweets improved overall classification accuracy from 75.42 to 82.357.
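As a rough illustration of what such a normalization pipeline can look like, the sketch below strings together noise reduction, stop word removal, and a crude suffix-stripping step using only the Python standard library. It is not the authors' implementation: the slang table, stop word list, and suffix list are hypothetical placeholders, the suffix stripper stands in for a real stemmer or lemmatizer (such as those NLTK provides), and part-of-speech tagging is omitted because it needs a trained tagger.

```python
import re

# Hypothetical, tiny lookup tables for illustration only; the lexicons and
# dictionaries actually used in the study are not given in the abstract.
SLANG = {"u": "you", "gr8": "great", "luv": "love", "thx": "thanks"}
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "to", "of", "and",
              "i", "you", "this", "it"}
SUFFIXES = ("ing", "ed", "ly", "es", "s")


def reduce_noise(tweet):
    """Drop URLs and mentions, squeeze repeated characters, expand slang."""
    text = tweet.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)                    # remove user mentions
    text = text.replace("#", " ")                        # keep hashtag word, drop '#'
    text = re.sub(r"(.)\1{2,}", r"\1\1", text)           # "sooooo" -> "soo"
    tokens = re.findall(r"[a-z0-9']+", text)             # simple tokenization
    return [SLANG.get(tok, tok) for tok in tokens]


def remove_stop_words(tokens):
    """Filter out tokens found in the (placeholder) stop word list."""
    return [tok for tok in tokens if tok not in STOP_WORDS]


def crude_stem(token):
    """Naive suffix stripping; a stand-in for a real stemmer or lemmatizer."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def normalize(tweet):
    """Run the three sketched phases in order and return the cleaned tokens."""
    return [crude_stem(tok) for tok in remove_stop_words(reduce_noise(tweet))]


if __name__ == "__main__":
    print(normalize("@bob I luv this movie sooooo much!!! http://t.co/xyz #gr8"))
```

In a fuller pipeline, the placeholder tables would be replaced by proper resources (for example an NLTK stop word corpus and a slang dictionary), and a part-of-speech tagging step would sit between noise reduction and stop word removal, as the abstract's phase order suggests.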

Authors and Affiliations

Muhammad Javed, Shahid Kamal


  • EP ID EP407312
  • DOI 10.14569/IJACSA.2018.091011

How To Cite

Muhammad Javed, Shahid Kamal (2018). Normalization of Unstructured and Informal Text in Sentiment Analysis. International Journal of Advanced Computer Science & Applications, 9(10), 78-85. https://europub.co.uk/articles/-A-407312