Reliable Algorithm for Extracting Web Data

Journal Title: International Journal of Research in Computer and Communication Technology - Year 2013, Vol 2, Issue 1

Abstract

Web usage mining is the process of extracting useful information from server logs, i.e., users' browsing history, in order to find out what users are looking for on the Internet. Some users might be looking only at textual data, whereas others might be interested in multimedia data. One could retrieve the data by copying it and pasting it into the relevant document, but this is tedious and time-consuming, and it becomes difficult when there is a large amount of data to retrieve. Extracting structured data from a web page is a challenging problem because pages have complicated structures. Previous approaches depend on the web page programming language: their main task is to analyze the HTML source code, and they must also take into account the scripts, such as JavaScript, and the cascading style sheets in the HTML files. This makes it difficult for existing solutions to infer the regularity of the structure of web pages by analyzing the tag structures alone. To overcome this problem, we use a new technique called the VIPS (vision-based page segmentation) algorithm, which is independent of the page's language. This approach primarily utilizes the visual features of the web page to implement web data extraction.
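The idea of segmenting a page by its visual layout rather than its tags can be illustrated with a minimal sketch. This is not the VIPS algorithm itself, only a simplified stand-in that assumes the rendered blocks and their vertical positions are already known. The `Block` class, the `segment_by_gaps` function, and the `min_gap` threshold are hypothetical names introduced for this illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Block:
    """A rendered page block with its vertical position in pixels (assumed input)."""
    text: str
    top: int
    height: int

    @property
    def bottom(self) -> int:
        return self.top + self.height


def segment_by_gaps(blocks: List[Block], min_gap: int = 20) -> List[List[Block]]:
    """Group blocks into segments, splitting wherever the vertical
    whitespace between consecutive blocks is at least min_gap pixels."""
    ordered = sorted(blocks, key=lambda b: b.top)
    segments: List[List[Block]] = []
    current_bottom: Optional[int] = None
    for block in ordered:
        if segments and current_bottom is not None and block.top - current_bottom < min_gap:
            # Small gap: the block is visually part of the current segment.
            segments[-1].append(block)
            current_bottom = max(current_bottom, block.bottom)
        else:
            # Large gap (or first block): treat it as a visual separator.
            segments.append([block])
            current_bottom = block.bottom
    return segments


# Hypothetical layout: two records separated by a wide vertical gap.
blocks = [
    Block("Title", 0, 30),
    Block("Price: 10", 35, 20),
    Block("Next item", 100, 30),
    Block("Price: 12", 135, 20),
]
segments = segment_by_gaps(blocks)
```

Because the grouping uses only positions, the same function would apply regardless of which markup or scripts produced the page, which is the property the abstract attributes to a vision-based approach.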

Authors and Affiliations

R. V. V. Satyanarayana, Mortha Chinnarao, Sudhir Varma Raju, B. N. Jagadesh

Keywords

Web mining, Web data extraction

EP ID EP27537