Fast and Highly Scalable Multiresolution Linear Word based Clustering in Multidimensional data

Abstract

Clustering problems are well known in the database literature for their use in numerous applications, and multidimensional data remains a challenge for clustering algorithms. Halite is a fast and scalable clustering method that looks for clusters in subspaces of multidimensional data. It organizes the data in a multiresolution tree, the Counting-tree. The tree root corresponds to a hypercube embodying the full data set. The next level divides the space into a set of 2^d hypercubes, where d is the number of dimensions. The resulting hypercubes are divided again, generating the tree structure. The bump-hunting task applies, at each level of the Counting-tree, one d-dimensional Laplacian mask over the respective grid to spot bumps at the respective resolution. Specifically, the main contributions of Halite are: Scalability: it is linear in time and space with respect to the data size and the dimensionality of the clusters' subspaces. Usability: it is deterministic, robust to noise, does not take the number of clusters as an input parameter, and detects clusters in subspaces generated by the original axes or by their linear combinations, including space rotation. Effectiveness: it is accurate, providing results of equal or better quality; this is achieved through a word-based approach. Generality: it includes a soft clustering approach.
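The grid levels and the Laplacian mask described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names are hypothetical, points are assumed to be normalized to [0, 1) in every dimension, and the mask is assumed to be the face-neighbor form (weight 2d on the center cell, -1 on each of its 2d face neighbors), which the abstract does not specify.

```python
from collections import Counter


def grid_counts(points, level):
    """Count points per cell of a grid with 2**level cells along each axis.

    Assumption: every coordinate is already normalized to [0, 1).
    """
    cells_per_axis = 2 ** level
    counts = Counter()
    for p in points:
        # Clamp so a coordinate of exactly 1.0 still falls in the last cell.
        cell = tuple(min(int(x * cells_per_axis), cells_per_axis - 1) for x in p)
        counts[cell] += 1
    return counts


def laplacian_response(counts, cell):
    """Apply an assumed face-neighbor Laplacian mask centered on `cell`.

    The center cell gets weight 2*d and each of its 2*d face neighbors
    gets weight -1; empty or out-of-grid neighbors contribute zero.
    """
    d = len(cell)
    response = 2 * d * counts.get(cell, 0)
    for axis in range(d):
        for step in (-1, 1):
            neighbor = cell[:axis] + (cell[axis] + step,) + cell[axis + 1:]
            response -= counts.get(neighbor, 0)
    return response


def find_bump(points, level):
    """Return the non-empty cell with the largest mask response at `level`."""
    counts = grid_counts(points, level)
    return max(counts, key=lambda c: laplacian_response(counts, c))


if __name__ == "__main__":
    # Four points packed near the origin plus two scattered points.
    pts = [(0.05, 0.05), (0.1, 0.1), (0.12, 0.2), (0.2, 0.15),
           (0.9, 0.9), (0.6, 0.3)]
    for level in (1, 2):
        print(level, find_bump(pts, level))
```

Because each level only needs per-cell counts and a fixed-size mask, the cost of one pass grows with the number of points and with d, not with the 2^d cells a level could hold in principle; only non-empty cells are stored.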

Authors and Affiliations

P. Rubi, M. Govindaraj

How To Cite

P. Rubi, M. Govindaraj (2014). Fast and Highly Scalable Multiresolution Linear Word based Clustering in Multidimensional data. International Journal of Innovative Research in Computer Science and Technology, 2(3), -. https://europub.co.uk/articles/-A-749392