Importance of Text Data Preprocessing & Implementation in RapidMiner

Journal Title: Annals of Computer Science and Information Systems - Year 2018, Vol 14, Issue

Abstract

Data preparation is an important phase before applying any machine learning algorithm, and text data is no exception: it must be prepared before a machine learning algorithm can be applied to it. This preparation is done through data preprocessing. Preprocessing text means removing noise such as stop words, punctuation, and terms that carry little weight in the context of the text. In this paper, we describe in detail how to prepare text data for machine learning algorithms using the RapidMiner tool. The preprocessing is followed by conversion of the bag of words into a term vector model, and we describe the various algorithms that can be applied in RapidMiner for data analysis and predictive modeling. We also discuss the challenges and current applications of text mining.
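As a rough illustration of the preprocessing and term-vector steps summarized in the abstract, the following Python sketch does in plain code what the paper builds graphically in RapidMiner. It is not the authors' process. The tiny `STOP_WORDS` list, the minimum token length of 3, the helper names (`preprocess`, `build_vocabulary`, `term_occurrence_vectors`, `tfidf_vectors`) and the TF-IDF variant (term count divided by document length, times log(N / document frequency)) are all assumptions chosen for illustration. The cleaning steps loosely resemble chaining tokenization, case transformation and stop-word filtering in RapidMiner's Text Processing extension before vector creation.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only;
# real pipelines use a full list such as the one shipped with RapidMiner.
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "to", "in", "on", "for", "it", "this"}
MIN_TOKEN_LENGTH = 3  # assumption: tokens shorter than this are treated as noise


def preprocess(text):
    """Lowercase, replace non-letters with spaces, tokenize, drop stop words and short tokens."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation and digits become separators
    return [t for t in text.split() if t not in STOP_WORDS and len(t) >= MIN_TOKEN_LENGTH]


def build_vocabulary(docs_tokens):
    """Sorted list of every distinct term across all documents (the bag of words)."""
    return sorted({t for tokens in docs_tokens for t in tokens})


def term_occurrence_vectors(docs_tokens, vocab):
    """One row per document: raw count of each vocabulary term."""
    rows = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        rows.append([counts[t] for t in vocab])
    return rows


def tfidf_vectors(docs_tokens, vocab):
    """One row per document: (count / doc length) * log(N / document frequency)."""
    n_docs = len(docs_tokens)
    doc_sets = [set(tokens) for tokens in docs_tokens]
    df = {t: sum(1 for s in doc_sets if t in s) for t in vocab}
    rows = []
    for tokens in docs_tokens:
        counts = Counter(tokens)
        length = len(tokens) or 1  # guard against empty documents
        rows.append([counts[t] / length * math.log(n_docs / df[t]) for t in vocab])
    return rows


if __name__ == "__main__":
    docs = [
        "Text mining turns unstructured text into structured data.",
        "The preprocessing of text removes stop words and punctuation!",
    ]
    tokens = [preprocess(d) for d in docs]
    vocab = build_vocabulary(tokens)
    print(vocab)
    print(term_occurrence_vectors(tokens, vocab))
    print(tfidf_vectors(tokens, vocab))
```

Because "text" occurs in both example documents, its TF-IDF weight is zero in each (log(2/2) = 0), while terms found in only one document keep a positive weight. This is one reason weighted term vectors are often preferred over raw counts: terms that appear everywhere carry little discriminating information.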

Authors and Affiliations

Vaishali Kalra, Rashmi Aggarwal

Keywords

Related Articles

The Role of Computer Science and Software Technology in Organizing Universities for Industry 4.0 and Beyond

This paper analyzes the recent developments around Industry 4.0 and beyond, identifies the necessary organizational structures of universities to assist companies in their transition processes, defines the relevant sub-d...

A new task scheduling approach based on Spacing Multi-Objective Genetic algorithm in cloud

The dazzling progress in information and communication technologies, contributed significantly to the emergence of cloud computing paradigm, where it promotes prosperity in all fields of human activity, especially in busin...

Importance of Search Engine Marketing in the Digital World

Object tracking is one of the vital fields of computer vision that detects the moving object from a video sequence. Internet has changed the world to global village. Due to improved connectivity and increase in data usag...

Dataset Enhancement in Hair Follicle Detection: ESENSEI Challenge

In this paper, a solution to ESENSEI data mining challenge concerning the analysis of microscopic hair images is described. The task of the challenge was to detect locations of hair follicles in closeup images of a human...

Accelerating Minimum Cost Polygon Triangulation Code with the TRACO Compiler

In this paper, we present automatic loop tiling and parallelization for the minimum cost polygon triangulation (MCPT) task. For this purpose, we use the authorial source-to-source TRACO compiler. MCPT is a recursive algo...

  • EP ID EP569711
  • DOI 10.15439/2017KM46

How To Cite

Vaishali Kalra, Rashmi Aggarwal (2018). Importance of Text Data Preprocessing & Implementation in RapidMiner. Annals of Computer Science and Information Systems, 14(), 71-75. https://europub.co.uk/articles/-A-569711