Hidden Web Data Extraction Using Dynamic Rule Generation
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 8
Abstract
The World Wide Web is a global information medium of interlinked hypertext documents accessed via computers connected to the Internet. Most users rely on traditional search engines to find information on the web. These search engines deal with the Surface Web, the set of web pages directly accessible through hyperlinks, and ignore a large part of the web called the Hidden Web, which is invisible to present-day search engines. The Hidden Web lies behind search forms and comprises a very large number of sources that provide high-quality information stored in specialized databases. A large part of the Hidden Web is structured, i.e., hidden websites present their information in the form of lists and tables. However, visiting dozens of these sites and analyzing the results is a very time-consuming task for the user. Hence, it is desirable to build a prototype that minimizes the user's effort and delivers high-quality information in integrated form. This paper proposes a novel method that extracts data records from the lists and tables of various hidden websites of the same domain using dynamic rule generation and builds a repository that is used for later searching. By searching this repository, the user finds the desired data in one place, which reduces the effort of looking through the result pages of different hidden websites.
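As a rough illustration of the pipeline the abstract outlines (extract records from lists and tables, pool them in a repository, search the repository), here is a minimal Python sketch. It is not the authors' dynamic rule generation, which the abstract does not specify: the extraction rule is fixed by hand (one record per `<tr>` or `<li>`), and the names `RecordExtractor`, `build_repository`, `search`, and the two example sites are hypothetical.

```python
from html.parser import HTMLParser


class RecordExtractor(HTMLParser):
    """Collects one record per <tr> (a list of cell texts) or per <li>.

    Assumption: a fixed, hand-written rule stands in for the paper's
    dynamically generated rules.
    """

    def __init__(self):
        super().__init__()
        self.records = []
        self._row = None   # cells of the <tr> currently being read
        self._text = None  # text pieces of the current cell or <li>

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th", "li"):
            self._text = []

    def handle_data(self, data):
        if self._text is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._text is not None:
            cell = " ".join("".join(self._text).split())
            if self._row is not None:
                self._row.append(cell)
            self._text = None
        elif tag == "li" and self._text is not None:
            item = " ".join("".join(self._text).split())
            if item:
                self.records.append([item])
            self._text = None
        elif tag == "tr" and self._row is not None:
            if any(self._row):
                self.records.append(self._row)
            self._row = None


def extract_records(html):
    """Return the records found in one result page."""
    parser = RecordExtractor()
    parser.feed(html)
    parser.close()
    return parser.records


def build_repository(pages):
    """pages maps a site name to the HTML of one of its result pages."""
    repository = []
    for site, html in pages.items():
        for fields in extract_records(html):
            repository.append({"site": site, "fields": fields})
    return repository


def search(repository, keyword):
    """Case-insensitive substring search over every field of every record."""
    keyword = keyword.lower()
    return [r for r in repository
            if any(keyword in f.lower() for f in r["fields"])]


# Hypothetical result pages from two sites of the same domain.
pages = {
    "site-a.example": "<table><tr><th>Title</th><th>Price</th></tr>"
                      "<tr><td>Data Mining</td><td>30</td></tr></table>",
    "site-b.example": "<ul><li>Data Mining, 28</li><li>Compilers, 45</li></ul>",
}
repository = build_repository(pages)
hits = search(repository, "data mining")
```

In this sketch the table header row is stored as an ordinary record; a fuller implementation would presumably use it to label the fields.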
Authors and Affiliations
Anuradha, A. K. Sharma