Dimensionality Reduction Algorithms on High Dimensional Datasets
Abstract
Classification problems, especially for high-dimensional datasets, have attracted many researchers seeking efficient approaches to address them. However, classification becomes very complicated when the number of possible combinations of variables is high. In this research, we evaluate the performance of Genetic Algorithm (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms when applied to high-dimensional datasets. Our experiments show that, in terms of dimensionality reduction, PSO is much better than GA: PSO reduced the number of attributes of 8 datasets to 13.47% of the original on average, whereas GA reduced them only to 31.36% on average. In terms of classification performance, GA is slightly better than PSO: GA-reduced datasets perform better than their original versions on 5 of 8 datasets, whereas PSO-reduced datasets do so on only 3 of 8.
Keywords: feature selection, dimensionality reduction, Genetic Algorithm (GA), Particle Swarm Optimization (PSO).
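As an illustration of GA-based feature selection in general, the following minimal sketch evolves a bit mask over the attributes, where a 1 keeps a feature and a 0 drops it. It is not the authors' implementation: the abstract does not state the fitness function, operators, or parameters used, so the toy `fitness` (which rewards a hypothetical set of `relevant` feature indices and penalizes subset size), the tournament selection, one-point crossover, bit-flip mutation, and all default values are assumptions chosen for illustration. In practice the fitness would typically be a classifier's validation score on the selected attributes.

```python
import random


def fitness(mask, relevant, penalty=0.01):
    """Toy fitness (assumption): count selected relevant features, minus a size penalty."""
    hits = sum(1 for i in relevant if mask[i])
    return hits - penalty * sum(mask)


def retained_percent(mask):
    """Percentage of the original attributes that the mask keeps."""
    return 100.0 * sum(mask) / len(mask)


def ga_select(n_features, relevant, pop_size=30, generations=40, p_mut=0.02, seed=0):
    """Evolve a 0/1 feature mask with a simple generational GA (illustrative only)."""
    rng = random.Random(seed)

    def score(mask):
        return fitness(mask, relevant)

    # Random initial population of bit masks.
    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=score, reverse=True)
        children = ranked[:2]  # elitism: carry the two best masks over unchanged
        while len(children) < pop_size:
            # Tournament selection: best of three randomly drawn masks.
            p1 = max(rng.sample(pop, 3), key=score)
            p2 = max(rng.sample(pop, 3), key=score)
            # One-point crossover followed by bit-flip mutation.
            cut = rng.randrange(1, n_features)
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
    return max(pop, key=score)


if __name__ == "__main__":
    # Hypothetical problem: 50 attributes, of which indices 3, 17 and 42 matter.
    best = ga_select(n_features=50, relevant={3, 17, 42})
    print("retained attributes (%):", retained_percent(best))
```

The `retained_percent` helper corresponds to the kind of reduction figure quoted in the abstract, namely the share of the original attributes that remains after selection. A binary PSO variant would keep the same mask encoding and fitness but replace the crossover and mutation steps with velocity-driven bit updates.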
Plagiarism Check
To avoid plagiarism, each manuscript is checked twice by the Editorial Board of the EMITTER International Journal of Engineering Technology (EMITTER Journal) using the iThenticate Plagiarism Checker and the CrossCheck plagiarism screening service. The similarity score of a manuscript should be less than 25%. A manuscript that plagiarizes another author's work or the author's own will be rejected by EMITTER Journal.
Authors are expected to comply with EMITTER Journal's plagiarism rules by downloading and signing the plagiarism declaration form and resubmitting it, along with the copyright transfer form, via online submission.