Crawler Using Inverted WAH Bitmap Index and Searching User Defined Document Fields
Journal Title: International Journal of P2P Network Trends and Technology (IJPTT) - Year 2012, Vol 2, Issue 3
Abstract
Crawler is a web crawler that searches for and retrieves pages from the World Wide Web that relate to a specific topic. It relies on specific algorithms to select web pages relevant to a pre-defined set of topics. Its main features are a user interest specification module, which mediates between users and search engines to identify target examples and keywords that together specify the topic of interest, and a URL ordering strategy that combines features of several previous approaches and achieves significant improvement. It also provides a graphical user interface so that users can evaluate and visualize the crawling results, which can be used as feedback to reconfigure the crawler. Such a web crawler may interact with millions of hosts over a period of weeks or months, so robustness, flexibility, and manageability are of major importance. The crawler should retrieve the web pages at the URLs in its queue, parse the HTML files, and add new URLs to its queue. The user then provides feedback, which helps the baseline classifier to be progressively induced using active learning techniques. Once the classifier is in place, the crawler can be started on its task of resource discovery.
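The retrieve, parse, and enqueue loop described in the abstract can be sketched as follows. This is a minimal illustration and not the paper's implementation: the keyword-overlap `relevance` score, the rule that links inherit their parent page's score, the `fetch` callback, and the in-memory `PAGES` table are all assumptions made so the example runs without network access.

```python
import heapq
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags and the page's text content."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        self.text.append(data)


def relevance(text, keywords):
    """Hypothetical score: fraction of topic keywords present in the text."""
    if not keywords:
        return 0.0
    words = set(text.lower().split())
    return sum(1 for k in keywords if k.lower() in words) / len(keywords)


def crawl(seeds, fetch, keywords, max_pages=10):
    """Best-first crawl; fetch(url) returns HTML or None (assumed interface)."""
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    frontier = [(-1.0, url) for url in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        _, url = heapq.heappop(frontier)
        html = fetch(url)
        if html is None:
            continue
        parser = LinkExtractor()
        parser.feed(html)
        parser.close()  # flush any buffered trailing text
        score = relevance(" ".join(parser.text), keywords)
        visited.append((url, score))
        for href in parser.links:
            link = urljoin(url, href)
            if link not in seen:
                seen.add(link)
                # Assumption: a link inherits its parent page's score.
                heapq.heappush(frontier, (-score, link))
    return visited


# In-memory stand-in for the web (hypothetical pages and URLs).
PAGES = {
    "http://example.test/": '<a href="/a">a</a> <a href="/b">b</a> crawler index',
    "http://example.test/a": "bitmap index crawler",
    "http://example.test/b": "cooking recipes",
}

result = crawl(["http://example.test/"], PAGES.get, ["crawler", "index", "bitmap"])
for url, score in result:
    print(f"{score:.2f} {url}")
```

A priority queue stands in here for the URL ordering strategy. A real deployment would replace `relevance` with the trained classifier and add politeness delays, robots.txt handling, and error recovery.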
Authors and Affiliations
Mr. Sanjay Kumar Singh
Software Code Clone Detection Using AST
Existing research suggests that a considerable portion (10-15%) of the source code of large-scale computer programs is duplicate code. Detection and removal of such clones promises decreased software maintenance...
Utility Based Routing in Mobile Networks
Self Adaptive Utility based Routing (SAURP) is characterized by the ability to identify potential opportunities for forwarding messages to their destinations through a novel utility function based mechanism in whic...
Security Issues and Sybil Attack in Wireless Sensor Networks
Due to the broadcast nature of Wireless Sensor Networks and the lack of tamper-resistant hardware, security in sensor networks is one of the major issues. Hence research is being done on many security attacks on wireless...
An Efficient Vertical Handoff Technique for TwoTier Heterogeneous Networks
Integration of cellular networks and Wireless local area networks (WLANs) will be very useful for development of fourth generation communication technologies. The integration of two or more networks would be done b...
Strategizing Power Utilization within Intelligent Tags
Power management within intelligent Tags is one of the most important issues and must be given the utmost importance, as the Tags are driven by battery power and the longevity of the battery must be increased so as to r...