Challenging Issues and Similarity Measures for Web Document Clustering

Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2015, Vol 17, Issue 1

Abstract

Abstract: The Web contains a large number of documents in electronic form. These documents come in various formats, and the information in them is not organized. This lack of organization on the WWW motivates automatic management of the huge amount of information. Text mining generally refers to the process of extracting interesting and non-trivial information and knowledge from unstructured text. A text mining framework comprises Information Retrieval, Information Extraction, Information Mining, and Interpretation. Information Retrieval returns many web documents, which raises the question of how to find similar documents among those retrieved. This paper deals with the challenging issues and similarity measures for web document clustering.

Authors and Affiliations

S. Mahalakshmi

How To Cite

S. Mahalakshmi (2015). Challenging Issues and Similarity Measures for Web Document Clustering. IOSR Journals (IOSR Journal of Computer Engineering), 17(1), 55-59. https://europub.co.uk/articles/-A-163353