Extraction of Core Contents from Web Pages

Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 8, Issue 9

Abstract

Most of the information available on web pages takes the form of semi-structured text documents, represented in XML, HTML, or XHTML, that lack a formal document structure. Such documents do not separate the text from the schema that represents it, and the amount of structure used depends on the purpose and size of the document. No semantics are attached to semi-structured documents. Analysing their words or sentences to generate useful knowledge therefore first requires extracting the core content of the text document. This paper discusses several techniques and approaches for extracting core content from semi-structured text documents, along with their merits and demerits.
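
As a rough illustration of what separating text from markup in an HTML page can look like, the following is a minimal sketch using only Python's standard-library `html.parser`. It is not one of the techniques surveyed in the paper, which the abstract does not detail. The set of skipped tags, the class name `CoreTextExtractor`, the function `extract_core_text`, and the sample page are all assumptions made for this example.

```python
from html.parser import HTMLParser

# Tags whose text is treated as non-core (an assumption for this sketch).
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}


class CoreTextExtractor(HTMLParser):
    """Collects text that is not nested inside any SKIP_TAGS element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a skipped element
        self.chunks = []      # text fragments kept so far

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_core_text(html):
    """Return the kept text fragments joined by single spaces."""
    parser = CoreTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical sample page: two content paragraphs surrounded by page chrome.
sample = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<nav><a href='/'>Home</a></nav>"
    "<p>Core sentence one.</p><p>Core sentence two.</p>"
    "<footer>Views 101</footer></body></html>"
)
print(extract_core_text(sample))  # prints: Core sentence one. Core sentence two.
```

A fixed tag blacklist like this only works on pages that mark up navigation and footers semantically; it is meant to show the shape of the problem, not a robust extractor.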

Authors and Affiliations

Sandeep Sirsat

  • EP ID EP115960

How To Cite

Sandeep Sirsat (2014). Extraction of Core Contents from Web Pages. INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY, 8(9), 484-489. https://europub.co.uk/articles/-A-115960