Extraction of Core Contents from Web Pages

Journal Title: INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY - Year 2014, Vol 8, Issue 9

Abstract

The information available on web pages consists mostly of semi-structured text documents represented in XML, HTML, or XHTML, formats that lack a formatted document structure. Such a document does not distinguish between the text and the schema that represents the text. The amount of structure used to represent the text also depends on the purpose and size of the document. No semantics are applied to semi-structured documents. The core content of a text document must therefore be extracted before its words or sentences can be analysed to generate useful knowledge. This paper discusses several techniques and approaches for extracting core content from semi-structured text documents, along with their merits and demerits.
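To make the problem concrete, the following is a minimal sketch of separating text from markup in an HTML page. It is not one of the techniques surveyed in the paper, which the abstract does not name. It assumes a simple tag-filtering heuristic: text inside elements that usually hold navigation or styling is discarded. The class name `TextExtractor`, the function `extract_core_text`, and the `SKIP_TAGS` set are all hypothetical names chosen for illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects text nodes, skipping elements that rarely hold core content."""

    # Assumption: these tags are treated as non-content. A real extractor
    # would need a more careful, page-specific heuristic.
    SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside any skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_core_text(html: str) -> str:
    """Return the retained text nodes, one per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


sample = (
    "<html><head><style>p {color: red}</style></head>"
    "<body><nav><a href='/'>Home</a></nav>"
    "<p>Core content lives here.</p>"
    "<footer>Copyright notice</footer></body></html>"
)

print(extract_core_text(sample))
```

On the sample page, only the paragraph text survives; the stylesheet, navigation link, and footer are dropped. The sketch relies only on the standard-library `html.parser` module and keeps no notion of semantics, which is the limitation the abstract points out.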

Authors and Affiliations

Sandeep Sirsat

How To Cite

Sandeep Sirsat (2014). Extraction of Core Contents from Web Pages. INTERNATIONAL JOURNAL OF ENGINEERING TRENDS AND TECHNOLOGY, 8(9), 484-489. https://europub.co.uk/articles/-A-115960