Extraction of Information from Web Page Using Content Mining Approach

Abstract

The Internet has become central to everyday life, and almost anything can be searched for online. The World Wide Web has grown rapidly in recent years, and with so much information available, web pages have become a major data source for information retrieval and data mining technologies such as commercial search engines and web mining applications. However, the parts of a web page are not equally important. Besides its main content, a web page also contains noisy parts that can degrade the performance of information retrieval applications, so cleaning web pages before mining is critical for improving the mining results. In our work, we focus on identifying and removing local noise in web pages to improve mining performance. The information contained in non-content blocks can distract the user and also harm web mining, so it is important to separate the informative primary content blocks from the non-informative blocks. We therefore propose a system that removes various noise patterns from any web page. Web informative content extraction requires two steps: web page segmentation and informative content extraction. We analyze the web page and, using these methods and algorithms, extract the topic information requested by the user.
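
The two steps named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' method, since the abstract does not specify their algorithms. It assumes that blocks are delimited by common block-level HTML tags, and that a block counts as noise when it is short or consists mostly of link text. The names BlockSegmenter and extract_informative, and the thresholds min_chars and max_link_density, are hypothetical and chosen only for illustration.

```python
from html.parser import HTMLParser

# Assumption: these tags mark block boundaries; the abstract names no tag set.
BLOCK_TAGS = {"div", "p", "td", "li", "section", "article",
              "nav", "footer", "header", "aside"}
SKIP_TAGS = {"script", "style"}  # never treated as visible text


class BlockSegmenter(HTMLParser):
    """Step 1 (segmentation): split a page into text blocks at block-level tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished blocks as (text, link_chars)
        self._text = []        # text pieces of the block being built
        self._link_chars = 0   # non-space characters seen inside <a> tags
        self._in_link = 0
        self._skip = 0

    def flush(self):
        # Close the current block and normalize its whitespace.
        text = " ".join(" ".join(self._text).split())
        if text:
            self.blocks.append((text, self._link_chars))
        self._text, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip += 1
        elif tag == "a":
            self._in_link += 1
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            self._skip = max(0, self._skip - 1)
        elif tag == "a":
            self._in_link = max(0, self._in_link - 1)
        elif tag in BLOCK_TAGS:
            self.flush()

    def handle_data(self, data):
        if self._skip:
            return
        self._text.append(data)
        if self._in_link:
            self._link_chars += len("".join(data.split()))


def extract_informative(html, min_chars=40, max_link_density=0.3):
    """Step 2 (extraction): keep blocks that are long enough and not mostly links."""
    segmenter = BlockSegmenter()
    segmenter.feed(html)
    segmenter.close()
    segmenter.flush()  # emit any trailing text that follows the last block tag
    kept = []
    for text, link_chars in segmenter.blocks:
        total = len(text.replace(" ", ""))  # non-space characters in the block
        if total >= min_chars and link_chars / total <= max_link_density:
            kept.append(text)
    return kept
```

Calling extract_informative(page_html) on a page's HTML string returns a list of the text blocks that pass both filters. Under these assumptions, navigation bars, footers, and scripts are typically dropped, while long paragraphs of running text are kept. The thresholds would need tuning for real pages.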

Authors and Affiliations

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil

How To Cite

Pranali Gatfane, Rani Tanpure, Anjali Masodkar, Vrushali Patil (2015). Extraction of Information from Web Page Using Content Mining Approach. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 3(2), -. https://europub.co.uk/articles/-A-19491