Extraction of Information from Web Page Using Content Mining Approach

Abstract

The internet has become something people depend on in daily life, and almost anything can be searched for online. The World Wide Web has grown rapidly in recent years. With so much information available, web pages have become a major source of data for information retrieval and data mining technologies such as commercial search engines and web mining applications. However, a web page, as the main source of data, consists of many parts that are not equally important. Besides its main content, a web page also contains noisy parts that can degrade the performance of information retrieval applications. Cleaning web pages before mining is therefore critical for improving mining results. In our work, we focus on identifying and removing local noise in web pages to improve mining performance. The information in these non-content blocks can distract the user and also harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. We propose a system that removes various noise patterns from any web page. Web informative content extraction requires two steps: Web Page Segmentation and Informative Content Extraction. We analyze the web page and, using these methods and algorithms, extract the topic information requested by the user.
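The two steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm, which the abstract does not specify. It assumes that a page is segmented at common block-level HTML tags and that a block counts as noise when it is very short or consists mostly of link text. The names BlockSegmenter and extract_informative, the tag sets, and the thresholds min_chars and max_link_density are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Assumed set of tags that start or end a block (illustrative, not exhaustive).
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article", "nav",
              "header", "footer", "aside", "h1", "h2", "h3"}
# Tags whose text is never shown to the reader.
SKIP_TAGS = {"script", "style"}


class BlockSegmenter(HTMLParser):
    """Step 1 (segmentation): split a page into text blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished blocks as (text, link_chars) pairs
        self._text = []        # text fragments of the block being built
        self._link_chars = 0   # characters of the current block inside <a> tags
        self._in_link = 0      # nesting depth of <a> tags
        self._skip = 0         # nesting depth of script/style tags

    def flush(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join(" ".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len(data.strip())


def extract_informative(html, min_chars=40, max_link_density=0.3):
    """Step 2 (extraction): keep blocks that are long enough and not mostly links."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    segmenter.flush()  # store any trailing text after the last block tag
    return [text for text, link_chars in segmenter.blocks
            if len(text) >= min_chars
            and link_chars / len(text) <= max_link_density]
```

Calling extract_informative(page_html) returns the text of the blocks judged informative under these assumptions. Link density is only one possible noise signal; a navigation bar scores high because nearly all of its text sits inside links, while body paragraphs score low. Real pages would likely need tuned thresholds and additional cues.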

Authors and Affiliations

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil

EP ID: EP19491

How To Cite

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil (2015). Extraction of Information from Web Page Using Content Mining Approach. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 3(2), -. https://europub.co.uk/articles/-A-19491