Feature Selection And Vectorization In Legal Case Documents Using Chi-Square Statistical Analysis And Naïve Bayes Approaches

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 2

Abstract

Most machine learning techniques used in text classification require effective selection of document features, owing to the large volume of data encountered in the classification process, and require term weights built from document vectors before the data can be fed into the respective classifier algorithms. Effective selection of the most important features from the raw documents is achieved by implementing more extensive pre-processing techniques, and the features obtained are ranked using the chi-square statistical approach to eliminate irrelevant features and select the more relevant features in the entire corpus. The most relevant ranked features are then converted to word vectors, based on the number of occurrences of words in the documents or categories concerned, using the probabilistic characteristics of Naïve Bayes as a vectorizer for machine learning classifiers. This hybrid vector space model was tested on legal text categories. The study revealed better discovered features using the pre-processing and ranking technique, and better term weights were successfully built from the documents for the machine learning classifiers used in the text classification process.
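The pipeline the abstract describes (rank features by chi-square, keep the top-ranked ones, then derive term weights from Naïve Bayes probabilities) can be illustrated with a minimal sketch. The abstract does not give the paper's exact formulas, so the following assumes the standard 2x2 contingency form of the chi-square statistic and Laplace-smoothed multinomial Naïve Bayes estimates. The toy corpus, the function names, and the choice of taking the maximum score over categories are all hypothetical and are not taken from the paper.

```python
from collections import Counter

# Hypothetical toy corpus: (category, tokens left after pre-processing).
DOCS = [
    ("contract", ["breach", "contract", "damages"]),
    ("contract", ["contract", "offer", "acceptance"]),
    ("criminal", ["theft", "sentence", "appeal"]),
    ("criminal", ["murder", "sentence", "court"]),
]

def chi_square(term, category, docs):
    """Chi-square statistic of the 2x2 term/category document table."""
    a = sum(1 for c, t in docs if c == category and term in t)      # in category, has term
    b = sum(1 for c, t in docs if c != category and term in t)      # other category, has term
    c_ = sum(1 for c, t in docs if c == category and term not in t) # in category, no term
    d = sum(1 for c, t in docs if c != category and term not in t)  # other category, no term
    n = a + b + c_ + d
    denom = (a + c_) * (b + d) * (a + b) * (c_ + d)
    return n * (a * d - c_ * b) ** 2 / denom if denom else 0.0

def select_features(docs, k):
    """Rank each term by its best chi-square score over all categories; keep the top k."""
    categories = sorted({c for c, _ in docs})
    vocab = sorted({w for _, tokens in docs for w in tokens})
    scores = {w: max(chi_square(w, c, docs) for c in categories) for w in vocab}
    return sorted(vocab, key=lambda w: (-scores[w], w))[:k]

def nb_term_weights(docs, features):
    """Laplace-smoothed P(term | category), computed over the selected features only."""
    counts = {}
    for c, tokens in docs:
        counts.setdefault(c, Counter()).update(t for t in tokens if t in features)
    v = len(features)
    return {
        c: {w: (cnt[w] + 1) / (sum(cnt.values()) + v) for w in features}
        for c, cnt in counts.items()
    }

features = select_features(DOCS, 2)
weights = nb_term_weights(DOCS, features)
print(features)
print(weights)
```

In this sketch the per-category probabilities act as the term weights handed on to a downstream classifier. Other aggregation choices, such as averaging chi-square scores across categories instead of taking the maximum, would fit the abstract's description equally well.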

Authors and Affiliations

Obasi, Chinedu Kingsley, Ugwu, Chidiebere

How To Cite

Obasi, Chinedu Kingsley, Ugwu, Chidiebere (2015). Feature Selection And Vectorization In Legal Case Documents Using Chi-Square Statistical Analysis And Naïve Bayes Approaches. IOSR Journals (IOSR Journal of Computer Engineering), 17(2), 42-50. https://europub.co.uk/articles/-A-137568