Importance of Text Data Preprocessing & Implementation in RapidMiner

Journal Title: Annals of Computer Science and Information Systems - Year 2018, Vol 14, Issue

Abstract

Data preparation is an important phase before applying any machine learning algorithm, and text data is no exception: it must be prepared before a learning algorithm can be applied to it. This preparation is carried out through preprocessing. Preprocessing text means cleaning out noise, such as stop words, punctuation, and terms that carry little weight in the context of the text. In this paper, we describe in detail how to prepare data for machine learning algorithms using the RapidMiner tool. We then cover the conversion of the bag of words into a term vector model and describe the various algorithms that can be applied in RapidMiner for data analysis and predictive modeling. We also discuss the challenges and recent applications of text mining.
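
The preprocessing steps named in the abstract (removing punctuation and stop words, then turning each document's bag of words into a term vector) can be illustrated with a minimal sketch. This is plain Python, not the paper's RapidMiner process; the stop-word list, the function names `preprocess` and `term_vectors`, and the two sample sentences are assumptions made only for illustration, and raw term counts are used as the simplest possible weighting.

```python
import string
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "and", "the", "is", "of", "on", "to", "in", "it"}


def preprocess(text):
    """Lowercase, strip ASCII punctuation, split on whitespace, drop stop words."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]


def term_vectors(documents):
    """Return (vocabulary, vectors), where each vector holds raw term counts."""
    # One bag of words (term -> count) per document.
    bags = [Counter(preprocess(doc)) for doc in documents]
    # Shared, sorted vocabulary so every vector has the same dimensions.
    vocabulary = sorted(set().union(*bags))
    vectors = [[bag[term] for term in vocabulary] for bag in bags]
    return vocabulary, vectors


# Hypothetical sample documents.
docs = ["The cat sat on the mat.", "The dog chased the cat!"]
vocab, vecs = term_vectors(docs)
print(vocab)
print(vecs)
```

Each position in a vector corresponds to one vocabulary term, so documents of different lengths become fixed-length numeric rows that a learning algorithm can consume. Other weightings, such as TF-IDF, would replace the raw counts without changing the overall shape.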

Authors and Affiliations

Vaishali Kalra, Rashmi Aggarwal

  • EP ID EP569711
  • DOI 10.15439/2017KM46

How To Cite

Vaishali Kalra, Rashmi Aggarwal (2018). Importance of Text Data Preprocessing & Implementation in RapidMiner. Annals of Computer Science and Information Systems, 14(), 71-75. https://europub.co.uk/articles/-A-569711