Extraction of Core Contents from Web Pages

Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 8, Issue 9

Abstract

The information available on web pages consists mostly of semi-structured text documents represented in XML, HTML, or XHTML, formats that lack a formal document structure. Such documents do not distinguish between the text and the schema that represents it, and the amount of structure used depends on the purpose and size of the document. No semantics are applied to semi-structured documents. The core content of a document must therefore be extracted before its words or sentences can be analysed to generate useful knowledge. This paper discusses several techniques and approaches for extracting core content from semi-structured text documents, along with their merits and demerits.
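
To illustrate what separating text from the markup that represents it can look like, here is a minimal sketch using Python's standard `html.parser` module. It is not one of the techniques surveyed in the paper; the names `TextExtractor` and `extract_text` are hypothetical, and the sketch only strips tags and skips `script` and `style` content, with no attempt to identify which text is "core".

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, ignoring the contents of script and style elements."""

    SKIP = {"script", "style"}  # elements whose content is not readable text

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.parts.append(text)


def extract_text(html):
    """Return the text of an HTML string with all markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


if __name__ == "__main__":
    sample = (
        "<html><head><style>p{color:red}</style></head>"
        "<body><h1>Title</h1><p>Hello <b>world</b></p>"
        "<script>var x=1;</script></body></html>"
    )
    print(extract_text(sample))
```

A real core-content extractor would go further, for example by scoring blocks of text to discard navigation and advertisements; this sketch shows only the first step of pulling text out of the schema.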

Authors and Affiliations

Sandeep Sirsat

How To Cite

Sandeep Sirsat (2014). Extraction of Core Contents from Web Pages. INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY, 8(9), 484-489. https://europub.co.uk/articles/-A-115960