Feature Selection And Vectorization In Legal Case Documents Using Chi-Square Statistical Analysis And Naïve Bayes Approaches

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 2

Abstract

Most machine learning techniques used for text classification require that document features be selected effectively, owing to the large volume of data encountered in the classification process, and that term weights be built from document vectors before they are fed into the respective classifier algorithms. In this work, the most important features are selected from the raw documents by applying more extensive pre-processing techniques, and the resulting features are ranked using the chi-square statistical approach so that irrelevant features are eliminated and the more relevant features across the entire corpus are retained. The most relevant ranked features are then converted to word vectors based on the number of occurrences of words in the documents or categories concerned, using the probabilistic characteristics of Naïve Bayes as a vectorizer for machine learning classifiers. This hybrid vector space model was tested on legal text categories. The study found that the pre-processing and ranking technique discovered better features, and that better term weights were built from the documents for the machine learning classifiers used in the text classification process.
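The two steps named in the abstract, chi-square ranking followed by Naïve Bayes style term weighting, can be illustrated with a minimal sketch. This is not the authors' implementation. The toy corpus, the function names (`chi_square`, `rank_features`, `nb_term_weights`), the use of the maximum score across categories for ranking, and the add-one (Laplace) smoothing are all assumptions made for illustration. The chi-square score uses the standard 2x2 term/category contingency formula.

```python
from collections import Counter

# Hypothetical toy corpus: (tokens, category) pairs, invented for illustration.
docs = [
    (["court", "appeal", "contract"], "civil"),
    (["contract", "breach", "damages"], "civil"),
    (["court", "murder", "sentence"], "criminal"),
    (["murder", "theft", "sentence"], "criminal"),
]

def chi_square(term, category, docs):
    """Chi-square statistic of a term against a category (2x2 contingency table)."""
    a = b = c = d = 0
    for tokens, label in docs:
        has_term = term in tokens
        in_cat = label == category
        if has_term and in_cat:
            a += 1  # in category, contains term
        elif has_term:
            b += 1  # other category, contains term
        elif in_cat:
            c += 1  # in category, lacks term
        else:
            d += 1  # other category, lacks term
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - c * b) ** 2 / denom

def rank_features(docs, top_k):
    """Keep the top_k terms by their best chi-square score over all categories."""
    vocab = sorted({t for tokens, _ in docs for t in tokens})
    labels = sorted({label for _, label in docs})
    scores = {t: max(chi_square(t, c, docs) for c in labels) for t in vocab}
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(vocab, key=lambda t: (-scores[t], t))[:top_k]

def nb_term_weights(docs, features, category):
    """Smoothed estimate of P(term | category) over the selected features only."""
    counts = Counter(
        t
        for tokens, label in docs
        if label == category
        for t in tokens
        if t in features
    )
    total = sum(counts.values())
    # Add-one smoothing (an assumption) keeps unseen terms from getting weight 0.
    return {t: (counts[t] + 1) / (total + len(features)) for t in features}

selected = rank_features(docs, top_k=3)
civil_weights = nb_term_weights(docs, selected, "civil")
```

In this sketch, a term such as "court" that appears equally often in both categories scores zero and is dropped, while the surviving terms receive per-category weights that sum to one.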

Authors and Affiliations

Obasi, Chinedu Kingsley , Ugwu, Chidiebere

How To Cite

Obasi, Chinedu Kingsley, Ugwu, Chidiebere (2015). Feature Selection And Vectorization In Legal Case Documents Using Chi-Square Statistical Analysis And Naïve Bayes Approaches. IOSR Journals (IOSR Journal of Computer Engineering), 17(2), 42-50. https://europub.co.uk/articles/-A-137568