Semi-Structured Data Structured Data Conversion Using Data Mining Methods

Abstract

Emerging technologies for semi-structured data have attracted wide attention in areas such as networks, e-commerce, information retrieval, and databases. In these applications, the data are modeled not as static collections but as transient data streams, where the data source is an unbounded stream of individual data items. It is becoming increasingly common to send heterogeneous and ill-structured data through networks. Since traditional database technologies are not directly applicable to such data streams, it is important to study efficient information extraction methods for semi-structured data. Hence there has been increasing demand for automatic methods for extracting useful information, in particular for discovering rules or patterns from large collections of semi-structured data, that is, semi-structured data mining. We introduce a class of simple combinatorial patterns over texts, such as proximity phrase association patterns and ordered and unordered tree patterns, that model unstructured texts and semi-structured data on the Web. In addition, we consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For these classes of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry, string matching, and combinatorial optimization. We implemented the developed text and semi-structured data mining algorithms and ran experiments on interactive document browsing in a large text database and on keyword and common structure discovery from the Web.
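To make the optimization problem in the abstract concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a two-word ordered proximity pattern (the second word occurs within k tokens after the first) and a simple score, the number of matching documents in a target set minus the number in a background set. The abstract does not specify the pattern class or the statistical measure at this level of detail, so both are illustrative assumptions, as are the function names `matches` and `best_pattern`. The search is brute force over all word pairs, whereas the abstract describes faster methods based on computational geometry and string matching.

```python
from itertools import product


def matches(tokens, first, second, k):
    """Return True if `second` occurs 1..k tokens after some `first`."""
    first_pos = [i for i, t in enumerate(tokens) if t == first]
    second_pos = [i for i, t in enumerate(tokens) if t == second]
    return any(0 < j - i <= k for i in first_pos for j in second_pos)


def best_pattern(positive, negative, k):
    """Brute-force search for the word pair with the highest score.

    Score (an assumed measure, chosen for simplicity): number of positive
    documents matched minus number of negative documents matched.
    """
    pos_tokens = [doc.lower().split() for doc in positive]
    neg_tokens = [doc.lower().split() for doc in negative]
    # Candidate words come from the positive documents only.
    vocab = sorted({t for doc in pos_tokens for t in doc})

    best, best_score = None, float("-inf")
    for first, second in product(vocab, repeat=2):
        score = (
            sum(matches(doc, first, second, k) for doc in pos_tokens)
            - sum(matches(doc, first, second, k) for doc in neg_tokens)
        )
        if score > best_score:  # ties keep the lexicographically first pair
            best, best_score = (first, second), score
    return best, best_score


if __name__ == "__main__":
    positive = ["data mining finds patterns", "mining large data finds rules"]
    negative = ["data streams arrive fast", "patterns in trees"]
    print(best_pattern(positive, negative, k=2))
```

The brute-force loop is quadratic in the vocabulary size, which is exactly the cost that a faster algorithm for this problem would aim to avoid on a large text collection.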

Authors and Affiliations

B. Suchitra

How To Cite

B. Suchitra (2017). Semi-Structured Data Structured Data Conversion Using Data Mining Methods. International Journal of Emerging Trends in Science and Technology, 4(10), 6272-6278. https://europub.co.uk/articles/-A-245606