A Review of Various Techniques of Web Content Mining For HTML and XML Contents

Abstract

The World Wide Web is the largest source of information. Most of the data on the web is dynamic and unstructured, which makes it increasingly difficult to retrieve relevant data. Data mining is the field of computer science concerned with extracting knowledge from very large amounts of data. Web mining is the application of data mining techniques to extract useful knowledge from web data. This paper presents an overview of various techniques that have been used for web content mining, covering images, audio, video, and semi-structured content such as HTML and XML. Because HTML has many limitations, such as a limited set of tags, case insensitivity, and a design aimed only at displaying data, Web developers have started to build Web pages with emerging Web technologies such as XML and Flash. XML was designed to describe data and to focus on what the data is. XML also plays the role of a meta-language: it allows document authors to create customized markup languages for a limitless variety of document types, making it a standard data format for online data exchange.

Authors and Affiliations

Rupinder Kaur, Kamaljit Kaur

How To Cite

Rupinder Kaur, Kamaljit Kaur (2014). A Review of Various Techniques of Web Content Mining For HTML and XML Contents. International Journal of Research in Computer and Communication Technology, 3(6), -. https://europub.co.uk/articles/-A-27937