Reconsideration of Potential Problems of Applying EMIM for Text Analysis

Abstract

Term dependence methods based on the expected mutual information measure (EMIM) appear not to have achieved their potential in many areas of science involving statistical text analysis or document processing. This study examines the reasons for this failure and highlights potential problems in applying EMIM. Several interesting questions arise: Does a term provide any information if it occurs in all the sample documents? How does the mutual information of two terms, under their status values, contribute to EMIM? Are two terms highly dependent in their co-occurrence if they receive a high positive EMIM value? What does it imply about the dependence of two term pairs when they receive the same EMIM value? How can one properly verify that two terms are highly dependent in their co-occurrence? How can EMIM be applied properly? Does the size of the sample set matter? This study attempts to answer these questions in order to clarify the confusion caused by these problems and/or to suggest solutions to them. Some interesting examples are provided to clarify our viewpoints.
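
Two of the questions above can be illustrated with a minimal sketch. It assumes the standard definition of EMIM over binary term-status values (1 = term present in a document, 0 = absent), with probabilities estimated as relative frequencies over the sample documents. The function name `emim` and the toy occurrence vectors are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def emim(x, y):
    """EMIM of two binary term-occurrence vectors (assumed definition).

    x[i] and y[i] are 1 if the respective term occurs in document i, else 0.
    Probabilities are relative frequencies; status pairs that never occur
    contribute nothing (the usual 0 * log 0 = 0 convention).
    """
    if len(x) != len(y) or not x:
        raise ValueError("x and y must be non-empty and of equal length")
    n = len(x)
    joint = Counter(zip(x, y))  # counts of each observed status pair (a, b)
    px = Counter(x)             # marginal counts for the first term
    py = Counter(y)             # marginal counts for the second term
    total = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        total += p_ab * math.log(p_ab / ((px[a] / n) * (py[b] / n)))
    return total


# Hypothetical toy sample of four documents.
always = [1, 1, 1, 1]   # a term that occurs in every document
term_a = [1, 0, 1, 0]
term_b = [0, 1, 0, 1]   # never co-occurs with term_a

print(emim(always, term_a))  # a term present everywhere carries no information
print(emim(term_a, term_a))  # perfect co-occurrence
print(emim(term_a, term_b))  # perfect avoidance gives the same value
```

In this toy sample, the term that occurs in every document yields an EMIM of zero with any other term, and the perfectly co-occurring pair receives the same value (log 2) as the pair that never co-occurs. Under these assumptions, a high EMIM value alone does not show that two terms are dependent in their co-occurrence.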

Authors and Affiliations

D. Cai

  • EP ID EP152693
  • DOI 10.14569/IJACSA.2014.050826

How To Cite

D. Cai (2014). Reconsideration of Potential Problems of Applying EMIM for Text Analysis. International Journal of Advanced Computer Science & Applications, 5(8), 173-181. https://europub.co.uk/articles/-A-152693