Optimizing Storage Space for Higher-Dimensional Data Using Feature Subset Selection Approach

Donia Augustine

Abstract


As the number of applications producing high-dimensional data has grown tremendously, clustering such data under reduced memory has become a necessity. Feature selection is a typical approach to clustering high-dimensional data: it identifies a subset of the most relevant features from the entire feature set. Our approach proposes a method to efficiently cluster high-dimensional data under reduced memory. An N-dimensional feature selection algorithm, NDFS, is used to identify the subset of relevant features, and the feature selection step removes irrelevant and redundant features from each cluster. In the initial phase of the NDFS algorithm, features are divided into clusters using graph-theoretic clustering methods; in the final phase, the algorithm generates the subset of relevant features that are most closely related to the target class. Features in different clusters are relatively independent. In particular, a minimum spanning tree is constructed to efficiently manipulate the subset of features. Traditionally, feature subset selection research has focused on searching for relevant features alone; the clustering-based strategy of NDFS has a high probability of producing a subset of features that are both useful and independent.
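To make the pipeline concrete, the following is a minimal sketch of the clustering-based scheme the abstract describes: score each feature against the target class, build a minimum spanning tree over the relevant features, cut weak edges to form clusters, and keep one representative feature per cluster. The abstract does not specify NDFS's correlation measure or thresholds, so this sketch assumes symmetric uncertainty (a common choice in clustering-based feature selection), and the names `cluster_based_selection` and `relevance_threshold` are illustrative, not the paper's.

```python
import numpy as np
from itertools import combinations

def entropy(values):
    """Shannon entropy (bits) of a discrete-valued array."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return float(-np.sum(p * np.log2(p)))

def symmetric_uncertainty(x, y):
    """SU(X, Y) = 2 * I(X; Y) / (H(X) + H(Y)), normalized to [0, 1]."""
    hx, hy = entropy(x), entropy(y)
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    mi = hx + hy - entropy(joint)  # mutual information via joint entropy
    return 2.0 * mi / (hx + hy) if hx + hy > 0 else 0.0

def mst_prim(dist):
    """Naive Prim's algorithm on a dense distance matrix; returns MST edges."""
    n = dist.shape[0]
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        u, v = min(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                   key=lambda e: dist[e])
        edges.append((u, v))
        in_tree.add(v)
    return edges

def cluster_based_selection(X, y, relevance_threshold=0.05):
    """Select one representative feature per MST-derived feature cluster."""
    su_target = np.array([symmetric_uncertainty(X[:, j], y) for j in range(X.shape[1])])
    # Phase 1a: drop irrelevant features (threshold is an assumed, illustrative parameter).
    relevant = [j for j in range(X.shape[1]) if su_target[j] > relevance_threshold]
    m = len(relevant)
    su_pair, dist = np.zeros((m, m)), np.ones((m, m))
    for a, b in combinations(range(m), 2):
        su = symmetric_uncertainty(X[:, relevant[a]], X[:, relevant[b]])
        su_pair[a, b] = su_pair[b, a] = su
        dist[a, b] = dist[b, a] = 1.0 - su  # strongly correlated features are "close"
    # Phase 1b: cut MST edges whose feature-feature correlation is weaker than both
    # endpoints' correlation with the class; the remaining forest defines the clusters.
    forest = [(u, v) for u, v in mst_prim(dist)
              if not (su_pair[u, v] < su_target[relevant[u]]
                      and su_pair[u, v] < su_target[relevant[v]])]
    parent = list(range(m))
    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i
    for u, v in forest:
        parent[find(u)] = find(v)
    clusters = {}
    for i in range(m):
        clusters.setdefault(find(i), []).append(relevant[i])
    # Phase 2: from each cluster, keep the feature most related to the target class.
    return sorted(max(feats, key=lambda j: su_target[j]) for feats in clusters.values())

# Usage on toy discrete data: one relevant feature, one redundant copy, one irrelevant.
rng = np.random.default_rng(0)
n = 300
y = rng.integers(0, 2, n)
noisy = np.where(rng.random(n) < 0.9, y, 1 - y)  # relevant to the class
X = np.column_stack([noisy, noisy.copy(), rng.integers(0, 2, n)])
print(cluster_based_selection(X, y))  # the redundant copy and the noise column are dropped
```

Using 1 - SU as the edge weight means the MST preferentially links strongly correlated features, so cutting its weak edges groups redundant features together, matching the abstract's claim that features in different clusters are relatively independent.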


DOI: https://doi.org/10.23956/ijermt.v6i6.241
