Hidden Web Data Extraction Using Dynamic Rule Generation

Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 8

Abstract

The World Wide Web is a global information medium of interlinked hypertext documents accessed via computers connected to the Internet. Most users rely on traditional search engines to find information on the Web. These search engines deal with the Surface Web, the set of Web pages directly accessible through hyperlinks, and ignore a large part of the Web called the Hidden Web, which is invisible to present-day search engines. The Hidden Web lies behind search forms and comprises a very large number of sources that provide high-quality information stored in specialized databases. A large part of the Hidden Web is structured, i.e., hidden websites present their information in the form of lists and tables. However, visiting dozens of these sites and analyzing the results is a very time-consuming task for the user. Hence, it is desirable to build a prototype that minimizes the user’s effort and delivers high-quality information in integrated form. This paper proposes a novel method that extracts data records from the lists and tables of various hidden websites of the same domain using dynamic rule generation and builds a repository that is used for later searching. By searching this repository, the user finds the desired data in one place, which reduces the effort of looking through the result pages of different hidden websites.
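The extract-then-search idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method: the abstract does not describe how the dynamic rules are generated, so the sketch assumes a single fixed rule (the first table row holds the field names) and handles tables only, not lists. The names `TableRecordParser`, `extract_records`, `search`, `PAGE_A`, and `PAGE_B`, as well as the sample pages, are hypothetical.

```python
from html.parser import HTMLParser


class TableRecordParser(HTMLParser):
    """Collects each <tr> of an HTML page as a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.rows = []     # completed rows
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            if self._cell is not None and self._row is not None:
                self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr":
            if self._row:
                self.rows.append(self._row)
            self._row = None
            self._cell = None


def extract_records(html, source):
    """Assumed fixed rule: the first row names the fields, the rest are records."""
    parser = TableRecordParser()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    fields = [name.lower() for name in header]
    return [dict(zip(fields, row), source=source) for row in body]


def search(repository, keyword):
    """Return every record in which any value contains the keyword."""
    keyword = keyword.lower()
    return [
        record for record in repository
        if any(keyword in str(value).lower() for value in record.values())
    ]


# Hypothetical result pages from two sites of the same domain.
PAGE_A = (
    "<table><tr><th>Title</th><th>Price</th></tr>"
    "<tr><td>Data Mining</td><td>30</td></tr>"
    "<tr><td>Web Crawling</td><td>25</td></tr></table>"
)
PAGE_B = (
    "<table><tr><th>Title</th><th>Price</th></tr>"
    "<tr><td>Web Data Extraction</td><td>40</td></tr></table>"
)

# The repository integrates the records of both sites in one place.
repository = extract_records(PAGE_A, "site-a") + extract_records(PAGE_B, "site-b")

for record in search(repository, "web"):
    print(record["source"], record["title"], record["price"])
```

Each record keeps a `source` field so that results found in the integrated repository can still be traced back to the site they came from.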

Authors and Affiliations

Anuradha, A. K Sharma

How To Cite

Anuradha, A. K Sharma (2011). Hidden Web Data Extraction Using Dynamic Rule Generation. International Journal on Computer Science and Engineering, 3(8), 3047-3058. https://europub.co.uk/articles/-A-92166