GClustering Algorithm  

Abstract

Graph clustering poses significant challenges because of the complex structures that may be present in the underlying data. The massive size of the underlying graph makes explicit structural enumeration very difficult, so most techniques for clustering multi-dimensional data are difficult to generalize to massive graphs. Methods have recently been proposed for clustering graph data, but they are designed for static data and are not applicable to graph streams. They are also ineffective on massive graphs, since a huge number of distinct edges may need to be tracked simultaneously. This results in storage and computational challenges during the clustering process. Finding clusters, that is, well-connected components in a graph, is useful in many applications, from natural function prediction to social community detection. An important insight is that many clustering applications need only the subset of best clusters, not all clusters in the entire graph. In this paper we propose a new technique, GClustering, which probabilistically searches large, edge-weighted, directed graphs for their best clusters in linear time. The algorithm is inherently parallelizable and is able to find variable-size, overlapping clusters. To increase scalability, a parameter is introduced that controls memory use. When compared with three other state-of-the-art clustering techniques, the GClustering algorithm achieves running-time speedups of up to 70% on large-scale real-world datasets. In addition, the clusters returned by GClustering are consistently found to be better both in calculated score and when compared on real-world benchmarks.
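The abstract does not spell out the search procedure, so the following is only an illustrative sketch of the general idea of sampling for a few best clusters instead of partitioning the whole graph. It is not the GClustering algorithm itself. Everything in it is an assumption made for illustration: the function names, the density-style `score`, the greedy growth rule, and the use of `top_k` as a stand-in for the memory-controlling parameter. It also makes no attempt to meet the linear-time bound claimed in the abstract.

```python
import random


def build_graph(edges):
    """Build out- and in-adjacency maps from (src, dst, weight) triples."""
    out_adj, in_adj = {}, {}
    for u, v, w in edges:
        out_adj.setdefault(u, {})[v] = w
        in_adj.setdefault(v, {})[u] = w
    return out_adj, in_adj


def score(cluster, out_adj):
    """Hypothetical score: internal edge weight per ordered node pair."""
    n = len(cluster)
    if n < 2:
        return 0.0
    internal = sum(
        w
        for u in cluster
        for v, w in out_adj.get(u, {}).items()
        if v in cluster
    )
    return internal / (n * (n - 1))


def grow(seed, out_adj, in_adj, max_size):
    """Greedily add the neighbour that keeps the score highest; stop when it drops."""
    cluster = {seed}
    current = 0.0
    while len(cluster) < max_size:
        frontier = set()
        for u in cluster:
            frontier.update(out_adj.get(u, {}))
            frontier.update(in_adj.get(u, {}))
        frontier -= cluster
        if not frontier:
            break
        # Sorting makes tie-breaking deterministic for a given seed node.
        best = max(sorted(frontier), key=lambda v: score(cluster | {v}, out_adj))
        new_score = score(cluster | {best}, out_adj)
        if new_score < current:
            break
        cluster.add(best)
        current = new_score
    return cluster


def best_clusters(edges, num_samples, top_k, max_size=10, rng=None):
    """Sample random seed nodes and keep only the top_k distinct clusters found."""
    rng = rng or random.Random()
    out_adj, in_adj = build_graph(edges)
    nodes = sorted(set(out_adj) | set(in_adj))
    kept = {}  # never holds more than top_k + 1 entries (assumed memory cap)
    for _ in range(num_samples):
        cluster = grow(rng.choice(nodes), out_adj, in_adj, max_size)
        kept[frozenset(cluster)] = score(cluster, out_adj)
        if len(kept) > top_k:
            del kept[min(kept, key=kept.get)]
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


def triangle(a, b, c, w):
    """All six directed edges among three nodes, each with weight w."""
    return [(x, y, w) for x in (a, b, c) for y in (a, b, c) if x != y]


# Two directed triangles joined by one weak edge (made-up example data).
EXAMPLE_EDGES = triangle("a", "b", "c", 1.0) + triangle("d", "e", "f", 0.5)
EXAMPLE_EDGES.append(("c", "d", 0.1))

if __name__ == "__main__":
    for nodes, value in best_clusters(EXAMPLE_EDGES, 50, top_k=2, rng=random.Random(0)):
        print(sorted(nodes), value)
```

Because each sample is grown independently from its own seed, the samples could in principle be distributed across workers and merged afterwards, and distinct samples may return clusters of different sizes that share nodes. The sketch only hints at those properties and should not be read as evidence for the paper's performance claims.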

Authors and Affiliations

Mr. Promod Kumar Sahu, G. Ravi teja

How To Cite

Mr. Promod Kumar Sahu, G. Ravi teja (2012). GClustering Algorithm. International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), 1(7), 188-192. https://europub.co.uk/articles/-A-157013