Semi-Structured Data Structured Data Conversion Using Data Mining Methods
Journal Title: International Journal of Emerging Trends in Science and Technology - Year 2017, Vol 4, Issue 10
Abstract
Emerging semi-structured data technologies have attracted wide attention in areas such as networks, e-commerce, information retrieval, and databases. In these applications, the data are modeled not as static collections but as transient data streams, where the data source is an unbounded stream of individual data items. It is becoming increasingly common to send heterogeneous and ill-structured data through networks. Since traditional database technologies are not directly applicable to such data streams, it is important to study efficient information extraction methods for semi-structured data. Hence there is increasing demand for automatic methods for extracting useful information, in particular for discovering rules or patterns in large collections of semi-structured data, that is, semi-structured data mining. We introduce a class of simple combinatorial patterns over texts, such as proximity phrase association patterns and ordered and unordered tree patterns, which model unstructured texts and semi-structured data on the Web. In addition, we consider the problem of finding the patterns that optimize a given statistical measure within the whole class of patterns in a large collection of unstructured texts. For these classes of patterns, we develop fast and robust text mining algorithms based on techniques from computational geometry, string matching, and combinatorial optimization. We successfully implemented the developed text and semi-structured data mining algorithms and ran experiments on interactive document browsing in a large text database and on keyword and common structure discovery from the Web.
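To make the idea of a proximity phrase association pattern concrete, the following is a minimal sketch and not the algorithm developed in the paper. It assumes a pattern is an ordered pair of single words in which the second word follows the first within k positions, and it assumes the statistical measure is plain classification accuracy against binary labels. The function names `matches` and `best_pattern` are hypothetical, and the search is brute force over all word pairs rather than the fast geometric method the abstract mentions.

```python
from itertools import product


def matches(words, first, second, k):
    """Return True if `second` occurs after `first` within k word positions."""
    first_pos = [i for i, w in enumerate(words) if w == first]
    second_pos = [j for j, w in enumerate(words) if w == second]
    return any(0 < j - i <= k for i in first_pos for j in second_pos)


def best_pattern(docs, labels, k):
    """Brute-force search for the ordered word pair that best separates the labels.

    The score is accuracy (an assumed measure): the fraction of documents
    where "pattern matches" agrees with the binary label.
    """
    tokenized = [d.lower().split() for d in docs]
    vocab = sorted({w for words in tokenized for w in words})
    best, best_score = None, -1.0
    for first, second in product(vocab, repeat=2):
        hits = [matches(words, first, second, k) for words in tokenized]
        correct = sum(h == bool(l) for h, l in zip(hits, labels))
        score = correct / len(docs)
        if score > best_score:  # ties keep the earliest pair in sorted order
            best, best_score = (first, second), score
    return best, best_score
```

For example, with the four toy documents "semi structured data mining", "data about mining towns", "mining semi structured text", and "plain text", labels 1, 0, 1, 0, and k = 1, the search returns the pair ("semi", "structured") with accuracy 1.0. The brute-force loop grows quadratically with the vocabulary, which is why faster algorithms are needed for large collections.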
Authors and Affiliations
B. Suchitra