A Methodology for Template Extraction from Heterogeneous Web Pages
Journal Title: Indian Journal of Computer Science and Engineering - Year 2012, Vol 3, Issue 3
Abstract
The World Wide Web is a vast and highly useful collection of information. To achieve high productivity in publishing, web pages are automatically populated using common templates filled with content. For automated processing, however, templates are considered harmful: they compromise the relevance judgement of many web information retrieval and web mining methods, such as clustering and classification, and they degrade the performance and increase the resource usage of tools that process web pages. Template detection techniques have therefore received considerable attention as a way to improve the performance of search engines and of the clustering and classification of web documents. In this paper, we present an approach that detects and extracts templates from heterogeneous web documents and clusters the documents into groups, such that the pages belonging to each group share the same structure. This saves the time needed to find the best templates in a large number of web documents, and also saves the memory required to find the best template structure.
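As a rough illustration of what grouping pages by shared structure can mean, the sketch below reduces each page to the set of its root-to-element tag paths and groups pages whose sets are identical. This is not the method of the paper, which the abstract does not spell out; the names `TagPathCollector`, `structure_signature`, and `group_by_structure`, the exact-match grouping rule, and the sample pages are all assumptions made for this example.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TagPathCollector(HTMLParser):
    """Collects the set of root-to-element tag paths seen in one page."""

    # Void elements have no closing tag, so they are never pushed on the stack.
    VOID = {"br", "hr", "img", "input", "meta", "link"}

    def __init__(self):
        super().__init__()
        self.stack = []     # currently open tags, outermost first
        self.paths = set()  # e.g. "html/body/div/p"

    def handle_starttag(self, tag, attrs):
        self.paths.add("/".join(self.stack + [tag]))
        if tag not in self.VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; stray closing tags are ignored.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass


def structure_signature(html):
    """Return the page's tag paths as a hashable signature (text is ignored)."""
    parser = TagPathCollector()
    parser.feed(html)
    parser.close()
    return frozenset(parser.paths)


def group_by_structure(pages):
    """Group page names whose structure signatures are identical (assumed rule)."""
    groups = defaultdict(list)
    for name, html in pages.items():
        groups[structure_signature(html)].append(name)
    return list(groups.values())


# Hypothetical sample pages: a.html and b.html share a layout, c.html does not.
pages = {
    "a.html": "<html><body><div><h1>News</h1><p>Rain today.</p></div></body></html>",
    "b.html": "<html><body><div><h1>Sport</h1><p>Match won.</p></div></body></html>",
    "c.html": "<html><body><table><tr><td>Price</td></tr></table></body></html>",
}
print(group_by_structure(pages))
```

Exact signature matching is the simplest possible grouping rule. Real pages generated from one template usually differ slightly in structure, so a practical system would more plausibly compare signatures with a similarity measure and a threshold.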
Authors and Affiliations
Vidya Kadam, Prakash R. Devale