Focused Crawling System based on Improved LSI

Journal Title: International Journal of Science and Research (IJSR) - Year 2013, Vol 2, Issue 9

Abstract

In this work we developed a semi-deterministic algorithm and a scoring system that build on Latent Semantic Indexing (LSI) scoring to crawl web pages belonging to a particular domain or topic. In addition to the LSI score, the proposed algorithm calculates a preference factor to determine which web page the multi-threaded crawler application should crawl next. This produced a retrieval system with high recall and precision, because it builds a queue specific to a particular domain or topic, which would not have been possible with breadth-first or LSI-only information retrieval systems.
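The abstract does not say how the preference factor is computed or how it is combined with the LSI score, so the following is only a minimal sketch of the queueing idea under stated assumptions: both values are taken as precomputed numbers in [0, 1], they are combined by a weighted sum, and the names `FocusedFrontier`, `weight`, `lsi_score` and `preference` are hypothetical rather than taken from the paper.

```python
import heapq
import itertools


class FocusedFrontier:
    """Crawl queue ordered by LSI score plus a preference factor.

    Assumption: both inputs are precomputed floats in [0, 1]; the paper's
    actual formulas are not given in the abstract.
    """

    def __init__(self, weight=0.5):
        self.weight = weight              # hypothetical mixing weight
        self._heap = []                   # entries: (-priority, tie_breaker, url)
        self._seen = set()                # URLs already queued
        self._counter = itertools.count() # keeps insertion order on ties

    def priority(self, lsi_score, preference):
        # Assumed combination rule: a simple weighted sum.
        return lsi_score + self.weight * preference

    def push(self, url, lsi_score, preference):
        """Queue a URL once; return False if it was already queued."""
        if url in self._seen:
            return False
        self._seen.add(url)
        p = self.priority(lsi_score, preference)
        # heapq is a min-heap, so store the negated priority.
        heapq.heappush(self._heap, (-p, next(self._counter), url))
        return True

    def pop(self):
        """Return the (url, priority) pair with the highest priority."""
        neg_p, _, url = heapq.heappop(self._heap)
        return url, -neg_p

    def __len__(self):
        return len(self._heap)


frontier = FocusedFrontier(weight=0.5)
frontier.push("http://example.com/a", lsi_score=0.40, preference=0.90)
frontier.push("http://example.com/b", lsi_score=0.70, preference=0.10)
frontier.push("http://example.com/c", lsi_score=0.20, preference=0.20)
order = [frontier.pop()[0] for _ in range(len(frontier))]
```

In this example page `a` is crawled before page `b` even though `b` has the higher LSI score, which is the kind of reordering a preference factor allows and an LSI-only queue does not. A breadth-first crawler would instead visit the pages in insertion order regardless of score. For a multi-threaded crawler, access to such a queue would also need to be guarded by a lock, since `heapq` itself is not thread-safe.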

How To Cite

(2013). Focused Crawling System based on Improved LSI. International Journal of Science and Research (IJSR), 2(9), -. https://europub.co.uk/articles/-A-337547