Data Integration in Big Data Environment
Journal Title: Bonfring International Journal of Data Mining - Year 2015, Vol 5, Issue 1
Abstract
Data integration is the process of transferring data from a source format into a destination format. Many data warehousing and data management approaches are supported by integration tools that migrate and transport data using the Extract-Transform-Load (ETL) approach. These tools are well suited to handling large volumes of data, but they are not flexible enough to handle semi-structured or unstructured data. To overcome these challenges in the big data world, programmatically driven parallel techniques such as map-reduce models were introduced. Data integration as a process is highly cumbersome and iterative, especially when new data sources are added. Adding new data sources is time consuming, which results in delays, loss of data, data that is no longer relevant, and improper utilization of useful information. Traditionally, a waterfall approach is used in the Enterprise Data Warehouse (EDW), where one cannot move to the next phase before completing the earlier one. This approach has its merits: it ensures that the right data sources are picked and the right data integration processes are developed to sustain the usefulness of the EDW. In a big data environment the situation is completely different, so the traditional approaches to integration are inefficient and the issue needs to be addressed. In this paper, the importance of data integration in the big data world is identified, and the open problems of big data integration are outlined to guide future research in the big data environment.
Authors and Affiliations
B. Arputhamary, L. Arockiam
RST Approach for Efficient CARs Mining
In data mining, an association rule is a pattern that states the occurrence of two items (premises and consequences) together with certain probability. A class association rule set (CARs) is a subset of association rules...
Probabilistic Modelling of Hourly Rainfall Data for Development of Intensity-Duration-Frequency Relationships
The rainfall Intensity-Duration-Frequency (IDF) relationship is commonly required for planning and designing of various water resources projects. The IDF relationship is a mathematical relationship between the rainfall i...
Multiple Attribute Authority based Access Control and Anonymous Authentication in Decentralized Cloud
Cloud computing is emerging as a powerful architecture to perform large-scale and complex computing. It widens the information technology (IT) capability by giving on-demand admittance to work out resources for dedicated...
Money Laundering Identification Using Risk and Structural Framework Estimation
Money laundering refers to activities that disguise money received through illegal operations and make it legitimate. It leaves serious consequences that may lead to economy corruption. One such problem consisting large...