Data Integration in Big Data Environment 

Journal Title: Bonfring International Journal of Data Mining - Year 2015, Vol 5, Issue 1

Abstract

Data integration is the process of transferring data from a source format into a destination format. Many data warehousing and data management approaches are supported by integration tools that migrate and transport data using the Extract-Transform-Load (ETL) approach. These tools are well suited to handling large volumes of data but are not flexible enough to handle semi-structured or unstructured data. To overcome these challenges in the big data world, programmatically driven parallel techniques such as map-reduce models were introduced. Data integration as a process is highly cumbersome and iterative, especially when new data sources are added. Adding these new data sources is time consuming, which results in delays, loss of data, irrelevant data, and improper utilization of useful information. Traditionally, a waterfall approach is used in the EDW (Enterprise Data Warehouse), where one cannot move to the next phase before completing the earlier one. This approach has its merits: it ensures that the right data sources are picked and that the right data integration processes are developed to sustain the usefulness of the EDW. In a big data environment, the situation is completely different, so the traditional approaches to integration are inefficient and new approaches are needed. In this paper, the importance of data integration in the big data world is identified, and the open problems of big data integration are outlined to guide future research in big data environments.

Authors and Affiliations

B. Arputhamary, L. Arockiam

  • EP ID EP127121
  • DOI 10.9756/BIJDM.8001

How To Cite

B. Arputhamary, L. Arockiam (2015). Data Integration in Big Data Environment. Bonfring International Journal of Data Mining, 5(1), 1-5. https://europub.co.uk/articles/-A-127121