Reliable Algorithm for Extracting Web Data
Journal Title: International Journal of Research in Computer and Communication Technology - Year 2013, Vol 2, Issue 1
Abstract
Web usage mining is the process of extracting useful information from server logs, i.e. users' browsing history, in order to find out what users are looking for on the Internet. Some users look only at textual data, whereas others are interested in multimedia data. One could retrieve such data by copying it and pasting it into the relevant document, but this is tedious, time-consuming, and difficult when there is a large amount of data to retrieve. Extracting structured data from a web page is a challenging problem because page structures are complicated. Previous approaches depend on the web page programming language: their main task is to analyze the HTML source code, and they must also take into account scripts such as JavaScript and cascading style sheets (CSS) in the HTML files. This makes it difficult for existing solutions to infer the regularity of the structure of web pages by analyzing the tag structures alone. To overcome this problem, we use a new technique called the VIPS (vision-based page segmentation) algorithm, which is language independent. This approach primarily uses the visual features of the web page to implement web data extraction.
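To illustrate what using visual features rather than tag structure can look like, the following is a minimal sketch and not the VIPS algorithm itself. It assumes that a rendering engine has already supplied a bounding box for each content block; the `Block` type, the `segment_by_vertical_gaps` function, the `min_gap` threshold, and the sample coordinates are all hypothetical names and values introduced here for illustration. Blocks are grouped into segments wherever a sufficiently tall horizontal strip of whitespace separates them, which is a simplified stand-in for the visual separators a vision-based method would detect.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """A rendered content block with its on-screen bounding box (pixels)."""
    text: str
    x: int
    y: int
    width: int
    height: int


def segment_by_vertical_gaps(blocks: List[Block], min_gap: int = 20) -> List[List[Block]]:
    """Group blocks into segments split by vertical whitespace of at least min_gap.

    Only layout coordinates are used; no HTML tags are inspected.
    """
    segments: List[List[Block]] = []
    bottom = None  # lowest pixel row covered by the current segment
    for block in sorted(blocks, key=lambda b: (b.y, b.x)):
        if bottom is None or block.y - bottom >= min_gap:
            # A wide enough gap acts as a visual separator: start a new segment.
            segments.append([block])
            bottom = block.y + block.height
        else:
            segments[-1].append(block)
            bottom = max(bottom, block.y + block.height)
    return segments


# Hypothetical layout: a header, a cluster of product blocks, and a footer.
blocks = [
    Block("Site header", 0, 0, 800, 60),
    Block("Product A", 0, 100, 380, 40),
    Block("Price A", 400, 100, 380, 40),
    Block("Product B", 0, 145, 380, 40),
    Block("Footer", 0, 300, 800, 30),
]
segments = segment_by_vertical_gaps(blocks)
for segment in segments:
    print([b.text for b in segment])
```

With this sample layout the header, the product cluster, and the footer fall into three separate segments, regardless of which tags or scripts produced them.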
Authors and Affiliations
R. V. V. Satyanarayana, Mortha Chinnarao, Sudhir Varma Raju, B. N. Jagadesh