Dimensionality Reduction Algorithms on High Dimensional Datasets

  • Iwan Syarif, Politeknik Elektronika Negeri Surabaya

Abstract

Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated when the number of possible combinations of variables is very high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high-dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes of 8 datasets to 13.47% of the original on average, while GA reduced it only to 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets perform better than their original counterparts on 5 of 8 datasets, while PSO-reduced datasets do so on only 3 of 8.

Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
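
The reduction figures in the abstract can be read as the share of original attributes that a feature selector keeps, averaged over datasets. The sketch below illustrates that reading only; it is an assumption about how the metric is defined, not the paper's code. The function names and the example masks are hypothetical, and the masks are not the paper's data.

```python
from typing import Sequence


def retained_percentage(mask: Sequence[bool]) -> float:
    """Percentage of the original attributes kept by a selection mask.

    mask[i] is True when attribute i is selected (assumed encoding).
    """
    if not mask:
        raise ValueError("mask must contain at least one attribute")
    return 100.0 * sum(bool(m) for m in mask) / len(mask)


def mean_retained_percentage(masks: Sequence[Sequence[bool]]) -> float:
    """Unweighted average of the retained percentage over several datasets."""
    values = [retained_percentage(m) for m in masks]
    return sum(values) / len(values)


# Hypothetical masks for two toy datasets (illustration only).
example_masks = [
    [True, False, False, False],       # 1 of 4 attributes kept
    [True, True] + [False] * 8,        # 2 of 10 attributes kept
]

if __name__ == "__main__":
    for mask in example_masks:
        print(f"{retained_percentage(mask):.2f}% of attributes retained")
    print(f"mean: {mean_retained_percentage(example_masks):.2f}%")
```

Under this reading, a lower percentage means a stronger reduction, which is why 13.47% for PSO counts as better than 31.36% for GA.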

Published
2014-12-01
How to Cite
Syarif, I. (2014). Dimensionality Reduction Algorithms on High Dimensional Datasets. EMITTER International Journal of Engineering Technology, 2(2), 28-38. https://doi.org/10.24003/emitter.v2i2.24
Section
Articles