Extraction of Information from Web Page Using Content Mining Approach

Abstract

The Internet has become an integral part of daily life, and almost anything can be searched for online. The World Wide Web has grown rapidly in recent years. With so much information available, web pages have become a major source for information retrieval and data mining technologies such as commercial search engines and web mining applications. However, a web page, as the main source of data, consists of many parts that are not equally important. Besides the main content, a web page also contains noisy parts that can degrade the performance of information retrieval applications. Cleaning web pages before mining is therefore critical for improving the mining results. In this work, we focus on identifying and removing local noise in web pages to improve mining performance. The information contained in these non-content blocks can distract the user and also harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. We propose a system that removes various noise patterns from any web page. Web informative content extraction requires two steps: web page segmentation and informative content extraction. We analyze the web page and, using these methods and algorithms, extract the topic information requested by the user.
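The two steps named in the abstract can be illustrated with a minimal sketch. The abstract does not say which segmentation or noise-detection algorithm the authors use, so everything below is an assumption made for illustration: blocks are cut at block-level HTML tags, and a block is treated as noise when it is very short or consists mostly of link text (a common link-density heuristic). The names `BlockSegmenter` and `extract_informative_blocks` and the thresholds `max_link_density` and `min_words` are hypothetical.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries; the paper may segment differently.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article",
              "nav", "header", "footer", "aside"}
SKIP_TAGS = {"script", "style"}


class BlockSegmenter(HTMLParser):
    """Step 1 (segmentation): split a page into text blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # each block: {"text": str, "link_chars": int}
        self._text = []       # text chunks of the block being built
        self._link_chars = 0  # non-whitespace characters seen inside <a>
        self._in_link = 0
        self._skip = 0

    def flush(self):
        """Close the current block and store it if it contains any text."""
        text = " ".join("".join(self._text).split())
        if text:
            self.blocks.append({"text": text, "link_chars": self._link_chars})
        self._text = []
        self._link_chars = 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return  # ignore script and style content
        self._text.append(data)
        if self._in_link:
            self._link_chars += len("".join(data.split()))


def extract_informative_blocks(html, max_link_density=0.5, min_words=5):
    """Step 2 (extraction): keep blocks that are long enough and not mostly links.

    Both thresholds are illustrative defaults, not values from the paper.
    """
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    segmenter.flush()  # store any trailing text after the last block tag

    kept = []
    for block in segmenter.blocks:
        words = len(block["text"].split())
        chars = len(block["text"].replace(" ", ""))  # non-empty by construction
        link_density = block["link_chars"] / chars
        if words >= min_words and link_density <= max_link_density:
            kept.append(block["text"])
    return kept
```

Calling `extract_informative_blocks(html)` on a page whose navigation bar is made up entirely of links would drop that bar and keep the longer prose paragraphs. A real system would likely combine several noise cues rather than rely on link density alone.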

Authors and Affiliations

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil

How To Cite

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil (2015). Extraction of Information from Web Page Using Content Mining Approach. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 3(2), -. https://europub.co.uk/articles/-A-19491