Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT
Journal Title: International Journal of Research in Computer and Communication Technology - Year 2015, Vol 4, Issue 4
Abstract
Data growth has accelerated exponentially with the advent of computers and networks, which have given data a digital form. Data can be classified into three categories: unstructured, semi-structured and structured. Text mining concerns the extraction of relevant information, knowledge or patterns from sources that are in unstructured or semi-structured form. This project, entitled “Transforming Digital Unstructured and Semi-structured Data into Structured Data with the Aid of IE and KDT”, demonstrates a framework for text mining using a learned information extraction system aided by KDT (Knowledge Discovery from Text) principles. The project's functionality rests on the integrated results of three modules: the IE (Information Extraction) module, the KDT (Knowledge Discovery from Text) module and the Standard Protocols module. Pre-processing transforms unstructured or semi-structured data, such as HTML documents, text documents and documents with .doc, .docx or .pdf extensions, into a workable format, which is then mined for interesting relationships. Standard protocols are defined for discovering additional information from input sources. For example, if the information extraction system has extracted skills such as “HTML” and “DHTML” from a computer job posting but could not find “XML” in the document, such relationships can be derived through predefined derivations framed in the Standard Protocols module. In addition, rules mined from the database extracted from a corpus of texts are used to predict additional information that could be extracted from future input documents, thereby improving the recall of the underlying extraction system. Results are presented from applying these techniques to a corpus of computer job announcements from an Internet newsgroup.
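The predefined-derivation idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the rule format, the name `PREDEFINED_RULES` and the function `apply_rules` are hypothetical, and the single rule shown is taken only from the HTML/DHTML/XML example above.

```python
from typing import FrozenSet, Iterable, Set, Tuple

# Hypothetical rule format: (items that must all be present, item they imply).
Rule = Tuple[FrozenSet[str], str]

# Illustrative rule only, modelled on the HTML/DHTML -> XML example in the abstract.
PREDEFINED_RULES = [
    (frozenset({"HTML", "DHTML"}), "XML"),
]


def apply_rules(extracted: Iterable[str], rules: Iterable[Rule]) -> Set[str]:
    """Return the extracted items plus every item the rules imply."""
    rules = list(rules)
    known = set(extracted)
    changed = True
    while changed:  # repeat until a full pass adds nothing new
        changed = False
        for antecedent, consequent in rules:
            # Fire a rule when all of its antecedents were extracted or derived.
            if antecedent <= known and consequent not in known:
                known.add(consequent)
                changed = True
    return known


skills = apply_rules({"HTML", "DHTML"}, PREDEFINED_RULES)
print(sorted(skills))  # ['DHTML', 'HTML', 'XML']
```

Rules mined from a previously extracted database could, under the same assumptions, be passed to `apply_rules` in the same format, so that predefined and mined rules both add items the extractor missed.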
Authors and Affiliations
Prakhyath Rai, Vijaya Murari T