Extraction of Information from Web Page Using Content Mining Approach
Journal Title: International Journal for Research in Applied Science and Engineering Technology (IJRASET) - Year 2015, Vol 3, Issue 2
Abstract
The Internet has become something people depend on in daily life, and almost anything can be searched for online. The World Wide Web has grown rapidly in recent years. With so much information available, web pages have become a key data source for information retrieval and data mining technologies such as commercial search engines and web mining applications. However, a web page, as the main source of data, consists of many parts that are not equally important. Besides its main content, a web page also contains noisy parts that can degrade the performance of information retrieval applications. Cleaning web pages before mining is therefore critical for improving the mining results. In our work, we focus on identifying and removing local noise in web pages to improve mining performance. The information contained in these non-content blocks can distract the user and also harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. We propose a system that removes various noise patterns from any web page. Two steps are needed for web informative content extraction: web page segmentation and informative content extraction. We analyze the web page and, using these methods and algorithms, extract the topic information requested by the user.
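The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method, which the abstract does not specify. It assumes a simple heuristic: the page is segmented at block-level tags, and a block is treated as noise when most of its text sits inside links or when it has very few words. The names BlockSegmenter, segment, and extract_informative, the tag list, and both thresholds are hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries; a real segmenter may use layout cues.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article",
              "nav", "header", "footer", "aside"}


class BlockSegmenter(HTMLParser):
    """Step 1 (segmentation): split a page into text blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # finished blocks: {"text": str, "link_chars": int}
        self._text = []         # text fragments of the block being built
        self._link_chars = 0    # characters of the current block that are link text
        self._in_link = 0       # nesting depth of <a> elements
        self._skip = 0          # nesting depth of <script>/<style> elements

    def flush(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join(self._text)
        if text:
            self.blocks.append({"text": text, "link_chars": self._link_chars})
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        fragment = " ".join(data.split())   # normalise whitespace
        if self._skip or not fragment:
            return
        self._text.append(fragment)
        if self._in_link:
            self._link_chars += len(fragment)


def segment(html):
    """Return the list of text blocks found in an HTML string."""
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()              # store any trailing text outside block tags
    return parser.blocks


def extract_informative(blocks, max_link_density=0.5, min_words=5):
    """Step 2 (extraction): keep blocks with low link density and enough words."""
    kept = []
    for block in blocks:
        density = block["link_chars"] / len(block["text"])
        if density <= max_link_density and len(block["text"].split()) >= min_words:
            kept.append(block["text"])
    return kept


if __name__ == "__main__":
    page = (
        '<div><a href="/">Home</a> <a href="/news">News</a></div>'
        "<p>Web pages mix the main article text with navigation bars and adverts.</p>"
        "<div>Copyright 2015</div>"
    )
    for text in extract_informative(segment(page)):
        print(text)
```

On the small page in the usage section, the navigation block is dropped because nearly all of its text is link text, and the copyright line is dropped because it is too short, leaving only the paragraph. Link density is only one possible noise signal; the thresholds would need tuning on real pages.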
Authors and Affiliations
Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil