Review of issues in automatic labelling of formatted document 

Abstract

The labelling framework proposed for topic models consists essentially of a multinomial word distribution, a set of candidate labels, and a context collection. It can therefore be applied to any text mining problem that involves a multinomial distribution over words. To generate labels that are understandable, semantically relevant, discriminative across topics, and of high coverage for each topic, the framework first extracts a set of understandable candidate labels in a pre-processing step, then applies a relevance scoring function that measures the semantic similarity between a label and a topic, and finally applies label selection methods. This paper reviews the issues that various researchers have described in this area, within the broader problem of knowledge discovery using text mining.
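The three steps named in the abstract (candidate extraction, relevance scoring, label selection) can be sketched in a few lines. This is only an illustrative sketch under stated assumptions: the abstract does not specify how candidates are extracted or how relevance is scored, so frequent bigrams stand in for candidate labels and an expected pointwise mutual information (PMI) over document-level co-occurrence stands in for the relevance score. The function names `candidate_labels`, `relevance`, and `select_labels` are hypothetical, and selection here is plain top-k ranking with no discrimination or coverage criterion.

```python
import math
from collections import Counter


def candidate_labels(docs, min_count=2):
    """Step 1 (assumed): use bigrams seen at least min_count times as candidates."""
    counts = Counter()
    for doc in docs:
        tokens = doc.lower().split()
        counts.update(zip(tokens, tokens[1:]))
    return [" ".join(bigram) for bigram, c in counts.items() if c >= min_count]


def relevance(label, topic, docs):
    """Step 2 (assumed): sum over topic words of p(w | topic) * PMI(w, label).

    Probabilities are document-level frequencies in the context collection.
    Word/label pairs that never co-occur are skipped, a simplification that
    treats their PMI as zero instead of negative infinity.
    """
    normalized = [" ".join(doc.lower().split()) for doc in docs]
    token_sets = [set(doc.split()) for doc in normalized]
    has_label = [f" {label} " in f" {doc} " for doc in normalized]
    n = len(docs)
    p_label = sum(has_label) / n
    score = 0.0
    for word, p_word_given_topic in topic.items():
        p_word = sum(word in tokens for tokens in token_sets) / n
        p_joint = sum(
            present and word in tokens
            for present, tokens in zip(has_label, token_sets)
        ) / n
        if p_joint > 0:  # implies p_word > 0 and p_label > 0
            score += p_word_given_topic * math.log(p_joint / (p_word * p_label))
    return score


def select_labels(topic, docs, top_k=3, min_count=2):
    """Step 3 (assumed): rank candidates by relevance and keep the top_k."""
    scored = [
        (relevance(label, topic, docs), label)
        for label in candidate_labels(docs, min_count)
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [label for _, label in scored[:top_k]]


if __name__ == "__main__":
    # Toy context collection and topic distribution, invented for illustration.
    context = [
        "topic model inference with gibbs sampling",
        "topic model labels in text mining",
        "support vector machine for text classification",
        "support vector machine kernel methods",
    ]
    topic = {"topic": 0.4, "model": 0.3, "labels": 0.2, "kernel": 0.1}
    print(select_labels(topic, context, top_k=1))
```

Because the only inputs are a word distribution, a candidate set, and a context collection, the same sketch would apply to any multinomial word distribution, not only to topics produced by a topic model.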

Authors and Affiliations

Pallavi Galgale, Priyanka Ahire, Snehal Ingavale, Dr. R. S. Prasad

How To Cite

Pallavi Galgale, Priyanka Ahire, Snehal Ingavale, Dr. R. S. Prasad (2012). Review of issues in automatic labelling of formatted document. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(10), 301-304. https://europub.co.uk/articles/-A-125830