A Review on Identifying the Main Content From Web Pages

Journal Title: International Journal of Science and Research (IJSR) - Year 2015, Vol 4, Issue 4

Abstract

A web page is a web document that carries a large amount of information, and the rapid growth of the World Wide Web has been a great advantage to users, who can access web pages from anywhere through the internet. A web page contains the main content along with noisy information such as menus, footers, unnecessary links, and logos. Most users are interested only in the main content. Because a web page mixes noisy and main content, the extraction process has a large impact on the performance of web summarization, question answering systems, and information retrieval applications. We therefore propose an extraction process for identifying the main content of web pages. The extraction process consists of automatic extraction techniques and hand-crafted rules. In the automatic extraction process, the web page is first segmented into blocks, and the main content is then distinguished from irrelevant or noisy content. In the hand-crafted rule process, the main content is extracted using rules that have been generated in advance.
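The two-stage process described in the abstract (segment the page into blocks, then separate main content from noise) can be illustrated with a minimal sketch using only Python's standard `html.parser`. This is not the reviewed paper's method: the block tags, the `NOISE_TAGS` list (standing in for a predefined hand-crafted rule), and the word-count and link-density thresholds (standing in for the automatic differentiation step) are all hypothetical choices made for illustration.

```python
from html.parser import HTMLParser

# Hypothetical choices for this sketch: tags that start a new block,
# and tags whose contents a hand-crafted rule discards outright as noise.
BLOCK_TAGS = {"p", "div", "section", "article", "li", "td", "h1", "h2", "h3"}
NOISE_TAGS = {"nav", "header", "footer", "aside", "script", "style"}


class BlockSegmenter(HTMLParser):
    """Step 1: split a page into text blocks, tracking how much text is link text."""

    def __init__(self):
        super().__init__()
        self.blocks = []        # each block: {"text": str, "link_chars": int}
        self._parts = []        # text fragments of the block being built
        self._link_chars = 0    # characters inside <a> in the current block
        self._link_depth = 0
        self._noise_depth = 0   # > 0 while inside a NOISE_TAGS element

    def flush(self):
        # Collapse whitespace and emit the current block if it has any text.
        text = " ".join("".join(self._parts).split())
        if text:
            self.blocks.append({"text": text, "link_chars": self._link_chars})
        self._parts, self._link_chars = [], 0

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.flush()
            self._noise_depth += 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a":
            self._link_depth += 1

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self._noise_depth > 0:
            self._noise_depth -= 1
        elif tag in BLOCK_TAGS:
            self.flush()
        elif tag == "a" and self._link_depth > 0:
            self._link_depth -= 1

    def handle_data(self, data):
        if self._noise_depth:   # hand-crafted rule: drop nav/footer/etc. text
            return
        self._parts.append(data)
        if self._link_depth:
            self._link_chars += len(" ".join(data.split()))


def is_main_content(block, min_words=8, max_link_density=0.3):
    """Step 2 (heuristic stand-in): long, mostly non-link blocks count as main content."""
    words = len(block["text"].split())
    link_density = block["link_chars"] / len(block["text"])
    return words >= min_words and link_density <= max_link_density


def extract_main_content(html):
    parser = BlockSegmenter()
    parser.feed(html)
    parser.close()
    parser.flush()  # emit any trailing text not closed by a block tag
    return [b["text"] for b in parser.blocks if is_main_content(b)]


page = """
<html><body>
  <nav><a href="/">Home</a> | <a href="/about">About</a></nav>
  <div class="menu"><a href="/a">Sports</a> <a href="/b">Weather</a></div>
  <article>
    <h1>Local library extends opening hours</h1>
    <p>The city library will stay open until nine in the evening on weekdays,
       starting next month, after a survey of regular visitors.</p>
  </article>
  <footer>Copyright 2015 Example News</footer>
</body></html>
"""

print(extract_main_content(page))
```

On this made-up page, the navigation bar and footer are dropped by the tag rule, the link-heavy menu fails the link-density test, and only the article paragraph survives. Note that the short headline is also dropped by the `min_words` threshold, which shows how sensitive such heuristics are to their parameters.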

How To Cite

(2015). A Review on Identifying the Main Content From Web Pages. International Journal of Science and Research (IJSR), 4(4), -. https://europub.co.uk/articles/-A-367551