Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization

  • Iwan Syarif Politeknik Elektronika Negeri Surabaya
Keywords: feature selection, Genetic Algorithm (GA), Particle Swarm Optimization (PSO), Evolutionary Algorithm, intrusion detection

Abstract

This paper describes the advantages of using Evolutionary Algorithms (EA) for feature selection on network intrusion datasets. Most current Network Intrusion Detection Systems (NIDS) are unable to detect intrusions in real time because of the high-dimensional data produced during daily operation. Extracting knowledge from large datasets such as intrusion data requires new approaches. The more complex the dataset, the longer the computation time and the harder it is to interpret and analyze. This paper investigates the performance of feature selection algorithms on network intrusion data. We used Genetic Algorithms (GA) and Particle Swarm Optimization (PSO) as feature selection algorithms. When applied to network intrusion datasets, both GA and PSO significantly reduce the number of features. Our experiments show that GA reduces the number of attributes from 41 to 15, while PSO reduces it from 41 to 9. Using k-Nearest Neighbour (k-NN) as the classifier, the GA-reduced dataset, which retains 37% of the original attributes, improves accuracy from 99.28% to 99.70%, and its execution time is 4.8 times faster than that of the original dataset. Using the same classifier, the PSO-reduced dataset, which retains 22% of the original attributes, has the fastest execution time (7.2 times faster than that of the original dataset). However, its accuracy drops slightly, by 0.02 percentage points, from 99.28% to 99.26%. Overall, both GA and PSO are good feature selection techniques: they significantly reduce the number of features while maintaining, and sometimes improving, classification accuracy and reducing computation time.
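As an illustration of the wrapper approach summarized above, the following is a minimal sketch of GA-based feature selection with a k-NN classifier as the fitness evaluator. It is not the authors' implementation: the abstract does not state the GA operators, parameters, or fitness function, so the truncation selection, one-point crossover, bit-flip mutation, per-feature penalty, and the synthetic 8-feature dataset (standing in for the 41-attribute intrusion data) are all assumptions made for this example.

```python
import random


def knn_accuracy(X, y, mask, k=3):
    """Leave-one-out k-NN accuracy using only features where mask[j] == 1."""
    cols = [j for j, bit in enumerate(mask) if bit]
    if not cols:
        return 0.0  # no features selected: treat as worst case
    correct = 0
    for i, xi in enumerate(X):
        # Squared Euclidean distance from sample i to every other sample.
        dists = sorted(
            (sum((xi[j] - xo[j]) ** 2 for j in cols), y[o])
            for o, xo in enumerate(X)
            if o != i
        )
        votes = [label for _, label in dists[:k]]
        pred = max(set(votes), key=votes.count)  # majority vote
        correct += pred == y[i]
    return correct / len(X)


def fitness(X, y, mask, penalty=0.01):
    """Accuracy minus a small cost per selected feature (assumed weight)."""
    return knn_accuracy(X, y, mask) - penalty * sum(mask)


def ga_select(X, y, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Evolve a binary feature mask; returns the best mask found."""
    rng = random.Random(seed)
    n = len(X[0])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    best = max(pop, key=lambda m: fitness(X, y, m))
    for _ in range(generations):
        scored = sorted(pop, key=lambda m: fitness(X, y, m), reverse=True)
        parents = scored[: pop_size // 2]  # truncation selection
        children = [scored[0][:]]  # elitism: carry the best mask over
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n)  # one-point crossover
            child = a[:cut] + b[cut:]
            # Bit-flip mutation with probability p_mut per gene.
            child = [bit ^ (rng.random() < p_mut) for bit in child]
            children.append(child)
        pop = children
        best = max(pop + [best], key=lambda m: fitness(X, y, m))
    return best


def make_data(n_samples=60, n_features=8, seed=1):
    """Synthetic two-class data: only features 0 and 1 carry class signal."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n_samples):
        label = i % 2
        row = [rng.gauss(0, 1) for _ in range(n_features)]
        row[0] += 4 * label
        row[1] -= 4 * label
        X.append(row)
        y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_data()
    best = ga_select(X, y)
    print("selected features:", [j for j, bit in enumerate(best) if bit])
    print("all features accuracy:", knn_accuracy(X, y, [1] * len(X[0])))
    print("reduced set accuracy: ", knn_accuracy(X, y, best))
```

The per-feature penalty is one simple way to push the search toward smaller subsets; a PSO variant would keep the same fitness function and replace the crossover and mutation steps with velocity-based updates of each particle's bit probabilities.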



Published
2016-12-15
How to Cite
Syarif, I. (2016). Feature Selection of Network Intrusion Data using Genetic Algorithm and Particle Swarm Optimization. EMITTER International Journal of Engineering Technology, 4(2), 277-290. https://doi.org/10.24003/emitter.v4i2.149