Review of issues in automatic labelling of formatted document 

Abstract

The labelling framework proposed for topic models consists essentially of a multinomial word distribution, a set of candidate labels, and a context collection. It can therefore be applied to any text mining problem that involves a multinomial distribution over words. To generate labels that are understandable, semantically relevant, discriminative across topics, and of high coverage for each topic, the framework first extracts a set of understandable candidate labels in a pre-processing step, then uses a relevance scoring function to measure the semantic similarity between a label and a topic, and finally applies label selection methods. This paper reviews the issues that arise in this problem of knowledge discovery through text mining, as described by various researchers in the area.
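
The three steps named in the abstract (candidate labels, relevance scoring, label selection) can be illustrated with a minimal Python sketch. The abstract does not specify a scoring function, so the one used here, an expected pointwise mutual information between the label and the topic's words estimated from document-level co-occurrence, is an assumption chosen for illustration. The function names (`label_in_doc`, `relevance`, `select_labels`), the crude additive smoothing, and the toy data are all hypothetical and are not taken from the paper.

```python
import math

def label_in_doc(label, doc_tokens):
    """Return True if the label's words occur as a contiguous phrase in the document."""
    words = label.split()
    n = len(words)
    return any(doc_tokens[i:i + n] == words for i in range(len(doc_tokens) - n + 1))

def relevance(label, topic, docs, smoothing=1.0):
    """Assumed scoring function: sum over topic words of p(w | topic) * PMI(w, label),
    with probabilities estimated from document counts in the context collection."""
    n_docs = len(docs)
    has_label = [label_in_doc(label, d) for d in docs]
    p_label = (sum(has_label) + smoothing) / (n_docs + smoothing)
    score = 0.0
    for word, p_word_given_topic in topic.items():
        has_word = [word in d for d in docs]
        p_word = (sum(has_word) + smoothing) / (n_docs + smoothing)
        both = sum(1 for a, b in zip(has_label, has_word) if a and b)
        p_joint = (both + smoothing) / (n_docs + smoothing)
        score += p_word_given_topic * math.log(p_joint / (p_word * p_label))
    return score

def select_labels(candidates, topic, docs, k=1):
    """Label selection: keep the k candidates with the highest relevance score."""
    ranked = sorted(candidates, key=lambda lab: relevance(lab, topic, docs), reverse=True)
    return ranked[:k]

# Toy context collection (tokenised documents), purely illustrative.
docs = [
    ["topic", "model", "inference", "with", "gibbs", "sampling"],
    ["topic", "model", "for", "text", "mining"],
    ["image", "segmentation", "with", "deep", "network"],
    ["deep", "network", "training"],
]
# A topic represented as a multinomial distribution over words.
topic = {"topic": 0.5, "model": 0.3, "inference": 0.2}
# Candidate labels, assumed to come from a pre-processing step.
candidates = ["topic model", "deep network"]

best = select_labels(candidates, topic, docs, k=1)
```

This sketch ranks by relevance only. Making labels discriminative across topics or improving coverage within a topic would need additional selection criteria that are not shown here.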

Authors and Affiliations

Pallavi Galgale, Priyanka Ahire, Snehal Ingavale, Dr. R. S. Prasad

How To Cite

Pallavi Galgale, Priyanka Ahire, Snehal Ingavale, Dr. R. S. Prasad (2012). Review of issues in automatic labelling of formatted document. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(10), 301-304. https://europub.co.uk/articles/-A-125830