A Review: Hadoop Storage and Clustering Algorithms
Journal Title: IOSR Journals (IOSR Journal of Computer Engineering) - Year 2016, Vol 18, Issue 1
Abstract
In the last few years there has been a massive increase in the volume of data being stored and processed, which demands both high processing speed and large storage capacity. Big data is defined as large, diverse and complex data sets that pose problems of storage, analysis and visualization. Four characteristics of big data (Volume, Value, Variety and Velocity) make it difficult for traditional systems to process. Apache Hadoop is a promising software framework for developing applications that process huge amounts of data in parallel on large clusters of commodity hardware in a fault-tolerant and reliable manner. Performance metrics such as reliability, fault tolerance, accuracy, confidentiality and security are improved with the use of Hadoop. Hadoop MapReduce is an effective computation model for processing large data on distributed clusters such as clouds. We first introduce the general idea of big data and then review related technologies, such as cloud computing and Hadoop. Various clustering techniques are also analyzed based on parameters such as number of clusters, size of clusters, type of dataset and noise.
Authors and Affiliations
Latika Kakkar, Gaurav Mehta
Software Defined Networking (SDN): A Revolution in Computer Network
SDN creates a dynamic and flexible network architecture that can change as the business requirements change. The growth of the SDN market and cloud computing are very much connected. As the applications cha...
A Review on Image Mining Techniques and its application on a software BOND
Abstract: Image processing is one of the most researched areas in computer science and it finds numerous applications in various fields like medical research and diagnosis, geological research, crime investigation, and so...
Task allocation model for Balance utilization of Available resource in Multiprocessor Environment
Abstract: Distributed computing systems are of current interest due to the advancement of microprocessor technology and computer network. The prime function of effective utilization of distributed system is accurately mapp...
Clustering and Classification of Cancer Data Using Soft Computing Technique
Clustering and classification of cancer data have been used successfully in the medical field. In this paper, two algorithms, K-means and fuzzy C-means, are proposed for comparison and to find the accuracy of ...
Model of Computation-Turing Machine
In theoretical computer science and mathematics, the theory of computation is the branch that deals with how efficiently problems can be solved on a model of computation, using an algorithm. The field is ...