Dimensionality Reduction Algorithms on High Dimensional Datasets

Iwan Syarif

Abstract


Classification problems, especially for high dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated when the number of possible combinations of variables is very high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA. PSO reduced the number of attributes of the 8 datasets to 13.47% on average, while GA reduced it to only 31.36% on average. In terms of classification performance, GA is slightly better than PSO. GA-reduced datasets perform better than their original versions on 5 of 8 datasets, while PSO-reduced datasets do so on only 3 of 8 datasets.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
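As a rough illustration of what the reduction figures in the abstract measure, the sketch below assumes that a feature selection result is encoded as a boolean mask over the attributes and that the reported percentage is the share of attributes retained. Both are assumptions: the abstract does not state the encoding or the exact formula. The function names and the toy data are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def retained_percentage(mask: Sequence[bool]) -> float:
    """Percentage of the original attributes kept by a selection mask."""
    if not mask:
        raise ValueError("mask must not be empty")
    # True counts as 1, so sum(mask) is the number of selected attributes.
    return 100.0 * sum(mask) / len(mask)


def apply_mask(rows: List[Sequence[float]], mask: Sequence[bool]) -> List[List[float]]:
    """Keep only the columns whose mask entry is True."""
    return [[value for value, keep in zip(row, mask) if keep] for row in rows]


# Hypothetical toy data: 2 samples with 8 attributes each (not from the paper).
rows = [
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    [1.1, 1.2, 1.3, 1.4, 1.5, 1.6, 1.7, 1.8],
]
# A mask such as a GA or PSO search might return: keep attributes 0 and 3.
mask = [True, False, False, True, False, False, False, False]

reduced = apply_mask(rows, mask)
print(retained_percentage(mask))  # 25.0
print(reduced)                    # [[0.1, 0.4], [1.1, 1.4]]
```

Under this reading, a lower percentage means a more aggressive reduction, which matches the abstract's statement that PSO outperforms GA on dimensionality reduction.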




DOI: 10.24003/emitter.v2i2.24


Copyright (c) 2016 EMITTER International Journal of Engineering Technology

EMITTER Journal Editorial Office

 

Politeknik Elektronika Negeri Surabaya

Jl. Raya ITS - Kampus PENS Sukolilo Surabaya 60111, INDONESIA

emitter@pens.ac.id   http://emitter.pens.ac.id   Telp : +62 31 594 7280   Fax : +62 31 594 6114