A Novel Architecture of Agent based Crawling for OAI Resources

Journal Title: International Journal on Computer Science and Engineering - Year 2010, Vol 2, Issue 4

Abstract

Most search engines today compete to index as much of the Surface Web as possible while neglecting OAI content (PDF documents), which holds a larger amount of information than the Surface Web. In this paper, a novel framework for an OAI-PMH based crawler is proposed that uses agents to extract metadata about OAI resources and store it in a repository, which is later queried through the OAI-PMH layer to generate XML pages containing the metadata. These pages are then added to the search engine's repository for indexing, which in turn increases the relevancy of the search engine. Agents are used to parallelize the whole process so that metadata extraction from multiple resources can be carried out simultaneously.
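
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the sample resources, the function names (`extract_metadata`, `run_agents`, `list_records_xml`), the in-memory repository, and the use of a thread pool to stand in for agents are all assumptions made for illustration. The XML is a simplified Dublin Core page in the style of an OAI-PMH ListRecords response and omits required elements such as `responseDate`, `request`, and record datestamps.

```python
from concurrent.futures import ThreadPoolExecutor
import xml.etree.ElementTree as ET

# Namespaces used by OAI-PMH 2.0 and unqualified Dublin Core.
OAI_NS = "http://www.openarchives.org/OAI/2.0/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hypothetical stand-ins for OAI resources found by the crawler.
RESOURCES = [
    {"id": "oai:example.org:1", "title": "Paper One", "creator": "A. Author"},
    {"id": "oai:example.org:2", "title": "Paper Two", "creator": "B. Author"},
]


def extract_metadata(resource):
    """Agent task (assumed): pull the metadata fields out of one resource."""
    return {
        "identifier": resource["id"],
        "title": resource["title"],
        "creator": resource["creator"],
    }


def run_agents(resources, max_agents=4):
    """Run one extraction task per resource in parallel and store the results."""
    repository = {}
    with ThreadPoolExecutor(max_workers=max_agents) as pool:
        for record in pool.map(extract_metadata, resources):
            repository[record["identifier"]] = record
    return repository


def list_records_xml(repository):
    """Query the repository and build a simplified ListRecords-style XML page."""
    root = ET.Element(f"{{{OAI_NS}}}OAI-PMH")
    list_records = ET.SubElement(root, f"{{{OAI_NS}}}ListRecords")
    for identifier in sorted(repository):
        rec = repository[identifier]
        record = ET.SubElement(list_records, f"{{{OAI_NS}}}record")
        header = ET.SubElement(record, f"{{{OAI_NS}}}header")
        ET.SubElement(header, f"{{{OAI_NS}}}identifier").text = identifier
        metadata = ET.SubElement(record, f"{{{OAI_NS}}}metadata")
        dc = ET.SubElement(metadata, f"{{{OAI_DC_NS}}}dc")
        ET.SubElement(dc, f"{{{DC_NS}}}title").text = rec["title"]
        ET.SubElement(dc, f"{{{DC_NS}}}creator").text = rec["creator"]
    return ET.tostring(root, encoding="unicode")


repository = run_agents(RESOURCES)
xml_page = list_records_xml(repository)
```

In this sketch `xml_page` plays the role of the XML page that would be handed to the search engine's repository for indexing; a real deployment would fetch and parse the PDF documents instead of reading prepared dictionaries.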

Authors and Affiliations

Shruti Sharma, J. P. Gupta, A. K. Sharma

Related Articles

Algorithm for Efficient Multilevel Association Rule Mining

Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases have been developed. The problems of finding frequent item sets are basic in multilevel association rule mining,...

Survey on Feature Selection in Document Clustering

Text mining is to research technologies to discover useful knowledge from enormous collections of documents, and to develop a system to provide knowledge and to support in decision making. Basically cluster means a group...

An Approach to Automatic Generation of Test Cases Based on Use Cases in the Requirements Phase

The main aim of this paper is to generate test cases from the use cases. In real-time scenarios we have to face several issues like inaccuracy, ambiguity, and incompleteness in requirements; this is because the require...

Segmentation Based Approach to Dynamic Page Construction from Search Engine Results

The results rendered by search engines are mostly a linear snippet list. With the prolific increase in the dynamism of web pages, there is a need for enhanced result lists from search engines in order to cope with...

  • EP ID EP91883

How To Cite

Shruti Sharma, J. P. Gupta, A. K. Sharma (2010). A Novel Architecture of Agent based Crawling for OAI Resources. International Journal on Computer Science and Engineering, 2(4), 1190-1195. https://europub.co.uk/articles/-A-91883