Reliable Algorithm for Extracting Web Data

Abstract

Web usage mining is the process of extracting useful information from server logs, i.e. users' browsing history, in order to find out what users are looking for on the Internet. Some users may be interested only in textual data, whereas others may be interested in multimedia data. Such data can be retrieved by copying it and pasting it into the relevant document, but this is tedious and time-consuming, and it becomes impractical when a large amount of data must be retrieved. Extracting structured data from a web page is a challenging problem because of the complicated structure of modern pages. Previous approaches depend on the programming language of the web page: their main task is to analyze the HTML source code, and they must also take into account the scripts (such as JavaScript) and Cascading Style Sheets embedded in the HTML files. This makes it difficult for existing solutions to infer the regularity of a web page's structure by analyzing its tag structure alone. To overcome this problem, we use the VIPS (Vision-based Page Segmentation) algorithm, which is independent of the page's programming language. This approach primarily uses the visual features of a web page to perform web data extraction.
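To make the vision-based idea concrete, the following is a minimal sketch, not the authors' implementation. In the general VIPS scheme a rendered page is segmented top-down: visual blocks are extracted from the DOM tree, separators between blocks are detected, and a node is kept whole once its degree of coherence (DoC) reaches a permitted degree of coherence (PDoC). The sketch assumes each node already carries rendered visual cues (position, size, background colour, font size); the `Node` fields, the scoring in `degree_of_coherence`, and the toy page in `build_example_page` are hypothetical simplifications chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Node:
    """A rendered DOM node plus the visual cues (position, size, colour, font) used for segmentation."""
    tag: str
    x: int
    y: int
    width: int
    height: int
    background: str = "#ffffff"
    font_size: int = 12
    text: str = ""
    children: List["Node"] = field(default_factory=list)


@dataclass
class Block:
    node: Node
    doc: int  # degree of coherence: 1 (visually mixed) .. 10 (visually uniform)


def degree_of_coherence(node: Node) -> int:
    """Hypothetical DoC heuristic: penalise children that differ in background or font size."""
    if not node.children:
        return 10
    score = 10
    if len({c.background for c in node.children}) > 1:
        score -= 4
    if len({c.font_size for c in node.children}) > 1:
        score -= 3
    return score


def extract_blocks(node: Node, pdoc: int) -> List[Block]:
    """Top-down block extraction: keep a node whole once its DoC reaches the permitted DoC (pdoc)."""
    doc = degree_of_coherence(node)
    if doc >= pdoc or not node.children:
        return [Block(node, doc)]
    blocks: List[Block] = []
    for child in node.children:
        blocks.extend(extract_blocks(child, pdoc))
    return blocks


def horizontal_separators(blocks: List[Block]) -> List[Tuple[int, int]]:
    """Return the vertical gaps (top, bottom) that no block covers; these act as horizontal separators."""
    spans = sorted((b.node.y, b.node.y + b.node.height) for b in blocks)
    if not spans:
        return []
    separators: List[Tuple[int, int]] = []
    covered_until = spans[0][1]
    for top, bottom in spans[1:]:
        if top > covered_until:
            separators.append((covered_until, top))
        covered_until = max(covered_until, bottom)
    return separators


def build_example_page() -> Node:
    """Hypothetical toy page: a header, a data region holding two records, and a footer."""
    record_a = Node("div", 0, 100, 800, 140, font_size=14, children=[
        Node("span", 0, 100, 800, 20, font_size=14, text="Item A"),
        Node("span", 0, 120, 800, 20, font_size=14, text="$10"),
    ])
    record_b = Node("div", 0, 260, 800, 140, font_size=14, children=[
        Node("span", 0, 260, 800, 20, font_size=14, text="Item B"),
        Node("span", 0, 280, 800, 20, font_size=14, text="$12"),
    ])
    return Node("body", 0, 0, 800, 600, children=[
        Node("div", 0, 0, 800, 80, background="#003366", font_size=24, text="Shop"),
        Node("div", 0, 100, 800, 300, children=[record_a, record_b]),
        Node("div", 0, 520, 800, 80, background="#cccccc", text="Footer"),
    ])


if __name__ == "__main__":
    blocks = extract_blocks(build_example_page(), pdoc=8)
    for b in blocks:
        print(b.node.tag, b.node.y, b.doc)
    print(horizontal_separators(blocks))
```

With `pdoc=8`, the visually mixed `body` is split into three blocks (header, data region, footer), and the uncovered gaps between them come back as separators. Raising `pdoc` forces finer segmentation down to individual records and fields. Because only rendered geometry and styling are inspected, the same code applies whatever HTML, script, or stylesheet produced the page, which is the language independence the abstract emphasises.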

Authors and Affiliations

R. V. V. Satyanarayana, Mortha Chinnarao, Sudhir Varma Raju, B. N. Jagadesh

Related Articles

Multiple Spoofing Identification For Network Level Security

It’s easy to spoof the wireless networks using the mac identity and the ipconfig. The techniques which have been used based on the Matching rules of signal prints for spoofing detection, RSS readings using a Gaussian...

Cluster Based Shifting Technique For Arrange Data Units Into Different Groups In Web Databases

The techniques of a clustering based shifting method make use of more affluent yet automatically obtainable features. This method is capable of handling a variety of relationships between HTML text nodes and data uni...

Effective Routing Plan For Quality Results On Linked Data Using Keyword Search

Keyword search is a natural world view for seeking connected information sources on the web. We propose to route keywords just to pertinent sources to reduce the high cost of handling keyword search queries over all...

Multiple View Point on Cluster Analysis

The clustering methods have to assume some cluster relationship among the data objects that they are applied to. Similarity between a pair of objects can be defined either explicitly or implicitly. In this paper we i...

A Systematic Methodology and Guidelines for Software Project Manager to Identify Key Stakeholders

Involving as many as possible of those who may be affected by or have an effect on any software project will lead to a better understanding, better process, greater community support as well as a more effective system...

How To Cite

R. V. V. Satyanarayana, Mortha Chinnarao, Sudhir Varma Raju, B. N. Jagadesh (2013). Reliable Algorithm for Extracting Web Data. International Journal of Research in Computer and Communication Technology, 2(1), -. https://europub.co.uk/articles/-A-27537