Automatic Template Extraction using Hyper Graph Technique from Heterogeneous Web Pages
Journal Title: International Journal of Advanced Research in Computer Engineering & Technology (IJARCET) - Year 2013, Vol 2, Issue 4
Abstract
The World Wide Web is the most useful source of information. To achieve high publishing productivity, many websites populate their web pages automatically by filling common templates with content. For human readers, templates make content easy to reach because every page follows a consistent structure. For machines, however, templates are considered harmful: the irrelevant terms they contain degrade the accuracy and performance of web applications. Template detection techniques have therefore received a lot of attention recently as a way to improve the performance of search engines and of the clustering and classification of web documents. In this paper, we present novel algorithms for extracting templates from a large number of web documents generated from heterogeneous templates. We cluster the web documents based on the similarity of their underlying template structures, so that the template for each cluster is extracted simultaneously. We develop a novel goodness measure, together with a fast approximation of it, for clustering, and provide a comprehensive analysis of our algorithm. Our experimental results on real-life data sets confirm the effectiveness and robustness of our algorithm compared to state-of-the-art template detection algorithms.
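The general idea of grouping pages by structural similarity and extracting one template per group can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract does not specify the structure representation or the goodness measure, so this sketch assumes that a page's structure is its set of root-to-element tag paths, uses plain Jaccard similarity with a hypothetical threshold as a stand-in for the goodness measure, and treats the paths shared by every page in a cluster as that cluster's template. All names (`PathCollector`, `tag_paths`, `cluster_by_structure`) are invented for illustration.

```python
from html.parser import HTMLParser


class PathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths seen in one page."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.paths = set()

    def handle_starttag(self, tag, attrs):
        self.stack.append(tag)
        self.paths.add("/".join(self.stack))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed elements.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def tag_paths(html):
    """Return the structural signature of a page (assumed representation)."""
    collector = PathCollector()
    collector.feed(html)
    return frozenset(collector.paths)


def jaccard(a, b):
    """Jaccard similarity of two sets; two empty sets count as identical."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def cluster_by_structure(pages, threshold=0.6):
    """Greedy one-pass clustering (assumed stand-in for the goodness measure).

    A page joins the first cluster whose current template is similar enough;
    the template then shrinks to the paths shared by all members, so each
    cluster's template is available as soon as clustering finishes.
    """
    clusters = []
    for index, html in enumerate(pages):
        paths = tag_paths(html)
        for cluster in clusters:
            if jaccard(paths, cluster["template"]) >= threshold:
                cluster["members"].append(index)
                cluster["template"] &= paths
                break
        else:
            clusters.append({"template": set(paths), "members": [index]})
    return clusters


pages = [
    "<html><body><div><h1>A</h1><p>x</p></div></body></html>",
    "<html><body><table><tr><td>1</td></tr></table></body></html>",
    "<html><body><div><h1>B</h1><p>y</p><p>z</p></div></body></html>",
    "<html><body><table><tr><td>2</td></tr><tr><td>3</td></tr></table></body></html>",
]
clusters = cluster_by_structure(pages)
for cluster in clusters:
    print(cluster["members"], sorted(cluster["template"]))
```

A greedy single pass is order-dependent and is used here only to keep the sketch short; the threshold of 0.6 is arbitrary and would need tuning on real pages.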
Authors and Affiliations
D. Kanagalatchumy, Dr. S. Pushpa