CluSandra: A Framework and Algorithm for Data Stream Cluster Analysis

Abstract

The clustering or partitioning of a dataset’s records into groups of similar records is an important aspect of knowledge discovery from datasets. A considerable amount of research has been applied to the identification of clusters in very large, multi-dimensional, static datasets. However, the traditional clustering and pattern recognition algorithms that have resulted from this research are inefficient for clustering data streams. A data stream is a dynamic dataset characterized by a sequence of data records that evolves over time, arrives at extremely fast rates, and is unbounded. Today, the world abounds with processes that generate high-speed evolving data streams. Examples include click streams, credit card transactions, and sensor networks. The data stream’s inherent characteristics present an interesting set of time- and space-related challenges for clustering algorithms. In particular, processing time is severely constrained, and clustering must be performed in a single pass over the incoming data. This paper presents both a clustering framework and an algorithm that, combined, address these challenges and allow end users to explore and gain knowledge from evolving data streams. Our approach includes the integration of open source products that are used to control the data stream and facilitate the harnessing of knowledge from it. Experimental results of testing the framework with various data streams are also discussed.
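To make the single-pass constraint concrete, the following is a minimal Python sketch of a generic threshold-based stream clusterer. It is not the CluSandra algorithm described in the paper; the names `MicroCluster`, `cluster_stream`, and the `radius` parameter are hypothetical and chosen only for illustration. Each record is read exactly once and is either absorbed into the nearest running summary or used to start a new one, so memory grows with the number of clusters rather than with the length of the stream.

```python
import math
from dataclasses import dataclass
from typing import Iterable, List, Sequence


@dataclass
class MicroCluster:
    """Running summary of absorbed records (hypothetical structure)."""
    n: int                    # number of records absorbed so far
    linear_sum: List[float]   # per-dimension sum of the absorbed records

    def centroid(self) -> List[float]:
        return [s / self.n for s in self.linear_sum]

    def absorb(self, record: Sequence[float]) -> None:
        # Update the summary in place; the raw record is not stored.
        self.n += 1
        self.linear_sum = [s + x for s, x in zip(self.linear_sum, record)]


def cluster_stream(stream: Iterable[Sequence[float]],
                   radius: float) -> List[MicroCluster]:
    """Cluster records in one pass, visiting each record exactly once.

    `radius` is an assumed distance threshold: a record joins the nearest
    cluster if it lies within `radius` of that cluster's centroid.
    """
    clusters: List[MicroCluster] = []
    for record in stream:
        nearest, nearest_dist = None, math.inf
        for cluster in clusters:
            d = math.dist(cluster.centroid(), record)
            if d < nearest_dist:
                nearest, nearest_dist = cluster, d
        if nearest is not None and nearest_dist <= radius:
            nearest.absorb(record)
        else:
            # No cluster is close enough: start a new summary.
            clusters.append(MicroCluster(1, list(record)))
    return clusters


if __name__ == "__main__":
    records = [(0.0, 0.0), (0.5, 0.0), (10.0, 10.0), (10.5, 10.0), (0.0, 0.5)]
    for c in cluster_stream(records, radius=2.0):
        print(c.n, c.centroid())
```

Because only a count and a per-dimension sum are kept for each cluster, the stream itself never needs to be stored or revisited, which is the property the single-pass requirement calls for.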

Authors and Affiliations

Jose R. Fernandez, Eman M. El-Sheikh

How To Cite

Jose R. Fernandez, Eman M. El-Sheikh (2011). CluSandra: A Framework and Algorithm for Data Stream Cluster Analysis. International Journal of Advanced Computer Science & Applications, 2(11), 87-99. https://europub.co.uk/articles/-A-113861