Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT

Abstract

Data growth has accelerated exponentially with the advent of computers and networks, which have given data a digital form. Data can be classified into three categories: unstructured, semi-structured, and structured. Text mining concerns the extraction of relevant information, knowledge, or patterns from sources in unstructured or semi-structured form. This project, entitled “Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT”, demonstrates a framework for text mining that uses a learned information extraction system aided by KDT (Knowledge Discovery from Text) principles. The functionality of the project rests on the integrated results of three modules: the IE (Information Extraction) module, the KDT module, and the Standard Protocols module. Pre-processing transforms unstructured or semi-structured data, such as HTML documents, text documents, and documents with .doc, .docx, or .pdf extensions, into a workable format that is then mined for interesting relationships. Standard protocols are defined for discovering additional information from input sources. For example, if the information extraction system has extracted skills such as “HTML” and “DHTML” from a computer job posting but could not find “XML” in the document, the relationship can be mined through predefined derivations framed in the Standard Protocols module. In addition, rules mined from the database extracted from a corpus of texts are used to predict additional information that could be extracted from future input documents, thereby improving the recall of the underlying extraction system. Results are presented from applying these techniques to a corpus of computer job announcements from an Internet newsgroup.
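
The “HTML”/“DHTML”/“XML” example in the abstract can be illustrated with a small Python sketch. The abstract does not say which rule-mining algorithm or data structures the authors used, so everything here is an assumption made for illustration: the function names `mine_rules` and `predict_missing`, the support and confidence thresholds, the restriction to rules with two antecedent skills, and the toy records (which are invented, not data from the paper).

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set

# Hypothetical rule table: a pair of skills maps to the skills it implies.
Rules = Dict[FrozenSet[str], Set[str]]


def mine_rules(records: List[Set[str]],
               min_support: int = 2,
               min_confidence: float = 0.8) -> Rules:
    """Mine rules of the form {a, b} -> c from already-extracted records.

    A rule is kept when the three skills co-occur in at least
    `min_support` records and c appears in at least `min_confidence`
    of the records that contain both a and b.
    """
    pair_counts: Dict[FrozenSet[str], int] = {}
    triple_counts: Dict[tuple, int] = {}
    for skills in records:
        for pair in combinations(sorted(skills), 2):
            key = frozenset(pair)
            pair_counts[key] = pair_counts.get(key, 0) + 1
            for other in skills - key:
                triple = (key, other)
                triple_counts[triple] = triple_counts.get(triple, 0) + 1

    rules: Rules = {}
    for (pair, other), count in triple_counts.items():
        confidence = count / pair_counts[pair]
        if count >= min_support and confidence >= min_confidence:
            rules.setdefault(pair, set()).add(other)
    return rules


def predict_missing(extracted: Set[str], rules: Rules) -> Set[str]:
    """Return skills implied by the rules but absent from `extracted`."""
    predicted: Set[str] = set()
    for antecedent, consequents in rules.items():
        if antecedent <= extracted:
            predicted |= consequents - extracted
    return predicted


# Toy records standing in for slots extracted from earlier job postings
# (invented for this sketch).
records = [
    {"HTML", "DHTML", "XML"},
    {"HTML", "DHTML", "XML", "Java"},
    {"HTML", "DHTML", "XML"},
    {"Java", "C++"},
]
rules = mine_rules(records)

# A new posting where the extractor found "HTML" and "DHTML" but not "XML".
predicted = predict_missing({"HTML", "DHTML"}, rules)
print(predicted)
```

In this sketch the mined rules only suggest candidate slot values; whether a suggestion is accepted, and how the predefined derivations of the Standard Protocols module differ from mined rules, is left open because the abstract does not specify it.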

Authors and Affiliations

Prakhyath Rai, Vijaya Murari T

How To Cite

Prakhyath Rai, Vijaya Murari T (2015). Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT. International Journal of Research in Computer and Communication Technology, 4(4), -. https://europub.co.uk/articles/-A-28185