Semi-Structured Data Structured Data Conversion Using Data Mining Methods

Abstract

Semi-structured data technologies have attracted wide attention in areas such as networks, e-commerce, information retrieval, and databases. In these applications, the data are modeled not as static collections but as transient data streams, where the data source is an unbounded stream of individual data items. It is becoming increasingly common to send heterogeneous and ill-structured data through networks. Since traditional database technologies are not directly applicable to such data streams, it is important to study efficient information extraction methods for semi-structured data. Hence there has been increasing demand for automatic methods for extracting useful information, in particular for discovering rules or patterns from large collections of semi-structured data, namely semi-structured data mining. We introduce a class of simple combinatorial patterns over texts, such as proximity phrase association patterns and ordered and unordered tree patterns, that model unstructured texts and semi-structured data on the Web. In addition, we consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For these classes of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry, string matching, and combinatorial optimization. We implemented the developed text and semi-structured data mining algorithms and ran experiments on interactive document browsing in a large text database and on keyword and common structure discovery from the Web.
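
To make the idea of a proximity phrase association pattern concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes a pattern is a pair of single words where the second word must appear within k tokens after the first, and it assumes the statistical measure is the difference between the match rates on a positive and a negative document set. The function names, the scoring measure, and the sample documents are all hypothetical, and the search is a naive enumeration of word pairs rather than the computational-geometry and string-matching techniques the abstract refers to.

```python
from itertools import product


def matches(tokens, first, second, k):
    """Return True if `second` occurs within k tokens after some `first`."""
    for i, tok in enumerate(tokens):
        if tok == first and second in tokens[i + 1 : i + 1 + k]:
            return True
    return False


def best_proximity_pattern(positive, negative, k):
    """Brute-force search for the word pair that best separates two sets.

    Assumed measure (illustrative only): match rate on the positive
    documents minus match rate on the negative documents.
    """
    pos = [doc.lower().split() for doc in positive]
    neg = [doc.lower().split() for doc in negative]
    # Only words seen in positive documents can raise the score.
    vocab = sorted({tok for doc in pos for tok in doc})

    best, best_score = None, -1.0
    for first, second in product(vocab, repeat=2):
        pos_rate = sum(matches(d, first, second, k) for d in pos) / len(pos)
        neg_rate = sum(matches(d, first, second, k) for d in neg) / len(neg)
        score = pos_rate - neg_rate
        if score > best_score:
            best, best_score = (first, second), score
    return best, best_score


# Hypothetical toy corpus, invented for illustration.
positive_docs = [
    "semi structured data mining on the web",
    "data stream mining is fast",
    "tree mining and data mining",
]
negative_docs = [
    "the web is large",
    "data is on the web",
    "mining for gold",
]

pattern, score = best_proximity_pattern(positive_docs, negative_docs, k=2)
print(pattern, score)
```

On this toy corpus the pair ("data", "mining") with k = 2 matches every positive document and no negative one. The enumeration is quadratic in the vocabulary size, which is exactly the cost that faster optimized-pattern algorithms are meant to avoid on large collections.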

Authors and Affiliations

B. Suchitra

How To Cite

B. Suchitra (2017). Semi-Structured Data Structured Data Conversion Using Data Mining Methods. International Journal of Emerging Trends in Science and Technology, 4(10), 6272-6278. https://europub.co.uk/articles/-A-245606