A Review of Various Techniques of Web Content Mining For HTML and XML Contents

Abstract

The World Wide Web is the largest source of information. Most of the data on the web is dynamic and unstructured, which makes it increasingly difficult to retrieve relevant data. Data mining is the field of computer science concerned with extracting knowledge from very large amounts of data. Web mining is the application of data mining techniques to web data in order to extract useful knowledge from it. This paper presents an overview of various techniques that have been used for web content mining, covering images, audio, video and semi-structured content such as HTML and XML. Because HTML has many limitations, such as a limited set of tags, case insensitivity and a design intended only for displaying data, Web developers have started to build Web pages on emerging Web technologies such as XML and Flash. XML was designed to describe data and to focus on what the data is. XML also plays the role of a meta-language and allows document authors to create customized markup languages for a limitless variety of document types, making it a standard data format for online data exchange.
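The contrast between HTML as a display format and XML as a data description format can be illustrated with a minimal sketch. It is not taken from the techniques reviewed in the paper: the two sample documents, the tag names (`catalog`, `book`, `title`, `price`) and the helper names (`books_from_xml`, `CellCollector`, `books_from_html`) are hypothetical and chosen only for illustration. The sketch uses the Python standard library to extract the same records from both formats.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample data: the same two records in XML and in HTML.
XML_DOC = """<catalog>
  <book id="b1"><title>Web Mining</title><price currency="USD">30.50</price></book>
  <book id="b2"><title>Data Mining</title><price currency="USD">45.00</price></book>
</catalog>"""

HTML_DOC = """<table>
  <tr><td><b>Web Mining</b></td><td>30.50</td></tr>
  <tr><td><b>Data Mining</b></td><td>45.00</td></tr>
</table>"""


def books_from_xml(text):
    """Read (title, price) pairs by tag name; the custom tags say what each value is."""
    root = ET.fromstring(text)
    return [(b.findtext("title"), float(b.findtext("price")))
            for b in root.iter("book")]


class CellCollector(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag == "td":
            self._in_cell = True
            self.rows[-1].append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a cell is kept; layout whitespace is ignored.
        if self._in_cell:
            self.rows[-1][-1] += data


def books_from_html(text):
    """Read (title, price) pairs by column position; the tags only describe layout."""
    parser = CellCollector()
    parser.feed(text)
    parser.close()
    return [(row[0], float(row[1])) for row in parser.rows]


if __name__ == "__main__":
    print(books_from_xml(XML_DOC))
    print(books_from_html(HTML_DOC))
```

In the XML case the extractor asks for `title` and `price` directly, whereas in the HTML case it has to assume that the first column holds the title and the second the price, an assumption that breaks as soon as the page layout changes.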

Authors and Affiliations

Rupinder Kaur, Kamaljit Kaur

How To Cite

Rupinder Kaur, Kamaljit Kaur (2014). A Review of Various Techniques of Web Content Mining For HTML and XML Contents. International Journal of Research in Computer and Communication Technology, 3(6), -. https://europub.co.uk/articles/-A-27937