A FAST Algorithm for High Dimensional Data using Clustering-Based Feature Subset Selection

Abstract

Feature subset selection is an effective technique for reducing the dimensionality of feature vectors in text classification. It involves identifying a subset of the most useful features that produces results comparable to those obtained with the original, complete feature set. A novel approach, a supervised attribute clustering algorithm, is proposed to improve accuracy and to check the probability of the patterns. The FAST algorithm works in two steps. In the first step, features are divided into clusters using graph-theoretic clustering methods. In the second step, the most representative feature, that is, the one most strongly related to the target classes, is selected from each cluster to form a subset of features. A feature selection algorithm may be evaluated from both the efficiency and the effectiveness points of view. Efficiency concerns the time required to find a subset of features, while effectiveness concerns the quality of that subset. Because features in different clusters are relatively independent, the clustering-based strategy of FAST has a high probability of producing a subset of useful and independent features. To ensure the efficiency of FAST, we adopt the efficient minimum-spanning-tree clustering method.
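The two steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the similarity measure or the rule for partitioning the tree, so the sketch assumes absolute Pearson correlation as a stand-in for both feature-to-feature similarity and feature-to-class relevance, and a hypothetical `cut` threshold for removing long minimum-spanning-tree edges. The names `abs_corr`, `find`, and `select_features` are likewise illustrative.

```python
from itertools import combinations
from math import sqrt


def abs_corr(x, y):
    """Absolute Pearson correlation; 0.0 if either input is constant.
    Assumption: used here as a stand-in similarity/relevance measure."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return abs(cov) / sqrt(vx * vy)


def find(parent, i):
    """Union-find root lookup with path halving."""
    while parent[i] != i:
        parent[i] = parent[parent[i]]
        i = parent[i]
    return i


def select_features(columns, target, cut=0.5):
    """Return indices of one representative feature per cluster.

    columns: list of feature columns (each a list of numbers)
    target:  list of numeric class labels
    cut:     hypothetical threshold; MST edges longer than this are removed
    """
    k = len(columns)

    # Step 1a: complete graph over features, edge weight = dissimilarity.
    edges = sorted(
        (1.0 - abs_corr(columns[i], columns[j]), i, j)
        for i, j in combinations(range(k), 2)
    )

    # Step 1b: minimum spanning tree via Kruskal's algorithm.
    parent = list(range(k))
    mst = []
    for w, i, j in edges:
        ri, rj = find(parent, i), find(parent, j)
        if ri != rj:
            parent[ri] = rj
            mst.append((w, i, j))

    # Step 1c: drop long MST edges; the remaining components are the clusters.
    parent = list(range(k))
    for w, i, j in mst:
        if w <= cut:
            parent[find(parent, i)] = find(parent, j)
    clusters = {}
    for i in range(k):
        clusters.setdefault(find(parent, i), []).append(i)

    # Step 2: from each cluster keep the feature most related to the target.
    relevance = [abs_corr(c, target) for c in columns]
    return sorted(max(m, key=lambda i: relevance[i]) for m in clusters.values())


if __name__ == "__main__":
    cols = [
        [1, 2, 3, 4, 5, 6],
        [2, 4, 6, 8, 10, 13],    # near-duplicate of feature 0
        [1, -1, 1, -1, 1, -1],
        [2, -2, 2, -2, 2, -3],   # near-duplicate of feature 2
    ]
    labels = [0, 0, 0, 1, 1, 1]
    print(select_features(cols, labels, cut=0.5))
```

In this toy input the two near-duplicate pairs end up in separate clusters, and one feature is kept from each. Building the complete graph costs time quadratic in the number of features, so a practical version for high-dimensional data would likely discard irrelevant features before clustering; that step is omitted here.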

Authors and Affiliations

Puppala Priyanka, M Swapna

How To Cite

Puppala Priyanka, M Swapna (2014). A FAST Algorithm for High Dimensional Data using Clustering-Based Feature Subset Selection. International Journal for Research in Applied Science and Engineering Technology (IJRASET), 2(11), -. https://europub.co.uk/articles/-A-19027