Hidden Web Data Extraction Using Dynamic Rule Generation
Journal Title: International Journal on Computer Science and Engineering - Year 2011, Vol 3, Issue 8
Abstract
The World Wide Web is a global information medium of interlinked hypertext documents accessed via computers connected to the Internet. Most users rely on traditional search engines to find information on the web. These search engines deal with the Surface Web, the set of web pages directly accessible through hyperlinks, and ignore a large part of the web called the Hidden Web, which present-day search engines cannot reach. The Hidden Web lies behind search forms and comprises an almost endless number of sources providing high-quality information stored in specialized databases. A large part of the Hidden Web is structured, i.e., hidden websites present their information in the form of lists and tables. However, visiting dozens of these sites and analyzing the results is very time-consuming for the user. Hence, it is desirable to build a prototype that minimizes the user’s effort and delivers high-quality information in integrated form. This paper proposes a novel method that extracts data records from the lists and tables of various hidden websites in the same domain using dynamic rule generation and builds a repository that is used for later searching. By searching this repository, the user finds the desired data in one place, which reduces the effort of looking through the result pages of different hidden websites.
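The pipeline outlined in the abstract (extract records from tabular result pages, pool them in a repository, search the repository) can be illustrated with a minimal Python sketch. It is not the paper's method: the abstract does not describe how rules are generated dynamically, so the sketch substitutes one fixed, hand-written rule (each `<tr>` is a record, and the first row of a page is assumed to be the header). The names `TableRecordParser`, `extract_records`, `build_repository`, and `search` are hypothetical.

```python
from html.parser import HTMLParser


class TableRecordParser(HTMLParser):
    """Collects the cell text of every <tr> on a page as one row."""

    def __init__(self):
        super().__init__()
        self.records = []  # finished rows, each a list of cell strings
        self._row = None   # cells of the row currently being read
        self._cell = None  # text fragments of the cell currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Join the fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row:
                self.records.append(self._row)
            self._row = None


def extract_records(html):
    """Fixed rule (an assumption): the first row is the header, the rest are records."""
    parser = TableRecordParser()
    parser.feed(html)
    parser.close()
    if not parser.records:
        return []
    header, *rows = parser.records
    return [dict(zip(header, row)) for row in rows]


def build_repository(pages):
    """pages maps a site name to the HTML of one of its result pages."""
    repository = []
    for site, html in pages.items():
        for record in extract_records(html):
            repository.append({"source": site, **record})
    return repository


def search(repository, keyword):
    """Return every record in which any field contains the keyword."""
    keyword = keyword.lower()
    return [r for r in repository
            if any(keyword in str(value).lower() for value in r.values())]
```

Calling `build_repository` on result pages from several sites of one domain and then `search(repository, "data")` returns the matching records from all of them in a single list, which is the "one place" lookup the abstract aims for. A page with several tables, or with list markup instead of tables, would need additional rules.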
Authors and Affiliations
Anuradha, A. K. Sharma