Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization
Abstract
This paper describes the advantages of using Evolutionary Algorithms (EA) for feature selection on network intrusion datasets. Most current Network Intrusion Detection Systems (NIDS) are unable to detect intrusions in real time because of the high-dimensional data produced during daily operation. Extracting knowledge from data as large as intrusion data requires a new approach: the more complex the datasets, the longer the computation time and the harder they are to interpret and analyze. This paper investigates the performance of feature selection algorithms on network intrusion data. We used Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms. When applied to network intrusion datasets, both GA and PSO significantly reduce the number of features. Our experiments show that GA reduces the number of attributes from 41 to 15, while PSO reduces it from 41 to 9. Using k-Nearest Neighbour (k-NN) as the classifier, the GA-reduced dataset, which consists of 37% of the original attributes, improves accuracy from 99.28% to 99.70%, and its execution time is 4.8 times faster than that of the original dataset. Using the same classifier, the PSO-reduced dataset, which consists of 22% of the original attributes, has the fastest execution time (7.2 times faster than that of the original dataset). However, its accuracy drops slightly, by 0.02 percentage points, from 99.28% to 99.26%. Overall, both GA and PSO are good feature selection techniques: they significantly reduce the number of features while maintaining, and sometimes improving, classification accuracy and reducing computation time.
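As a rough illustration of the idea summarized above, the following minimal sketch runs a GA over binary feature masks and scores each mask with a k-NN classifier. It is not the authors' implementation: the abstract does not specify the fitness function, the GA operators, or their parameters, so the accuracy-minus-size-penalty fitness, tournament selection, one-point crossover, bit-flip mutation, and all names and constants (`knn_accuracy`, `ga_select`, `make_data`, `alpha`, `p_mut`) are assumptions made for illustration. The data is synthetic, not a network intrusion dataset.

```python
import random


def knn_accuracy(train, test, mask, k=3):
    """Holdout accuracy of k-NN using only the features where mask[i] == 1."""
    idx = [i for i, bit in enumerate(mask) if bit]
    if not idx:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for x, y in test:
        # k nearest training rows by squared Euclidean distance on selected features
        nearest = sorted(
            train, key=lambda row: sum((row[0][i] - x[i]) ** 2 for i in idx)
        )[:k]
        votes = [label for _, label in nearest]
        if max(set(votes), key=votes.count) == y:
            correct += 1
    return correct / len(test)


def ga_select(train, test, n_features, pop_size=12, generations=15,
              p_mut=0.05, alpha=0.01, seed=0):
    """GA over bit masks; fitness (an assumption) = accuracy - alpha * fraction kept."""
    rng = random.Random(seed)
    cache = {}

    def fitness(mask):
        key = tuple(mask)
        if key not in cache:
            acc = knn_accuracy(train, test, mask)
            cache[key] = acc - alpha * sum(mask) / n_features
        return cache[key]

    pop = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        children = [best[:]]  # elitism: carry the best mask forward unchanged
        while len(children) < pop_size:
            # binary tournament selection of two parents
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # bit-flip mutation
            child = [1 - b if rng.random() < p_mut else b for b in child]
            children.append(child)
        pop = children
        best = max(pop, key=fitness)
    return best


def make_data(n, n_features, rng):
    """Synthetic stand-in data: features 0 and 1 carry the class, the rest are noise."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        x = [rng.gauss(3.0 * y, 1.0) if i < 2 else rng.gauss(0.0, 1.0)
             for i in range(n_features)]
        rows.append((x, y))
    return rows


rng = random.Random(42)
N_FEATURES = 8
train, test = make_data(60, N_FEATURES, rng), make_data(30, N_FEATURES, rng)
best_mask = ga_select(train, test, N_FEATURES)
full_acc = knn_accuracy(train, test, [1] * N_FEATURES)
best_acc = knn_accuracy(train, test, best_mask)
print("selected features:", [i for i, b in enumerate(best_mask) if b])
print("accuracy with all features: %.3f, with selected: %.3f" % (full_acc, best_acc))
```

The size penalty `alpha` is one simple way to make the search prefer smaller subsets when accuracy is similar. A PSO variant would keep the same mask representation and fitness but update candidate masks through particle velocities rather than crossover and mutation.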