Reliable Algorithm for Extracting Web Data

Abstract

Web usage mining is the process of extracting useful information from server logs, i.e. users' browsing history, in order to find out what users are looking for on the Internet. Some users look only for textual data, whereas others are interested in multimedia data. One could retrieve the data by copying it and pasting it into the relevant document, but this is tedious, time-consuming, and impractical when there is a large amount of data to retrieve. Extracting structured data from a web page is a challenging problem because page structures are complicated. Previous approaches depend on the web page programming language, and their main difficulty is analyzing the HTML source code. They must also take into account the scripts, such as JavaScript, and the cascading style sheets in the HTML files. This makes it difficult for existing solutions to infer the regularity of the structure of web pages by analyzing only the tag structures. To overcome this problem, we use the VIPS (vision-based page segmentation) algorithm, which is language independent. This approach primarily uses the visual features of the web page to implement web data extraction.
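The idea of relying on visual features rather than tag structure can be illustrated with a minimal sketch. It is not the VIPS algorithm itself, nor the authors' implementation. It shows only one simplified visual cue: blocks separated by a wide enough band of blank space are treated as belonging to different segments. The `Block` type, the `segment_by_vertical_gaps` function, the `min_gap` threshold, and the sample coordinates are all assumptions made for illustration. In practice the bounding boxes would come from a rendering engine.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A rendered page element with its on-screen bounding box (hypothetical type)."""
    text: str
    x: float
    y: float
    width: float
    height: float

    @property
    def bottom(self) -> float:
        return self.y + self.height


def segment_by_vertical_gaps(blocks: List[Block], min_gap: float = 20.0) -> List[List[Block]]:
    """Group blocks into visual segments separated by blank horizontal bands.

    min_gap is an assumed threshold (in pixels) for what counts as a separator.
    """
    segments: List[List[Block]] = []
    current_bottom = 0.0
    for block in sorted(blocks, key=lambda b: b.y):
        # Start a new segment when the whitespace above this block is wide enough.
        if not segments or block.y - current_bottom >= min_gap:
            segments.append([block])
            current_bottom = block.bottom
        else:
            segments[-1].append(block)
            current_bottom = max(current_bottom, block.bottom)
    return segments


# Made-up layout: a header followed by two visually similar product records.
page = [
    Block("Site header", 0, 0, 800, 60),
    Block("Product A", 0, 100, 380, 40),
    Block("Price: 10", 0, 145, 380, 20),
    Block("Product B", 0, 220, 380, 40),
    Block("Price: 12", 0, 265, 380, 20),
]
for segment in segment_by_vertical_gaps(page):
    print([b.text for b in segment])
```

Because the grouping uses only rendered positions, it does not depend on which tags, scripts, or style sheets produced the layout. That independence is the property the abstract attributes to the vision-based approach.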

Authors and Affiliations

R. V. V Satyanarayana, Mortha Chinnarao, Sudhir Varma Raju, B. N Jagadesh

How To Cite

R. V. V Satyanarayana, Mortha Chinnarao, Sudhir Varma Raju, B. N Jagadesh (2013). Reliable Algorithm for Extracting Web Data. International Journal of Research in Computer and Communication Technology, 2(1), -. https://europub.co.uk/articles/-A-27537