Classification of Radical Web Content in Indonesia using Web Content Mining and k-Nearest Neighbor Algorithm

Muh Subhan, Amang Sudarsono, Ali Ridho Barakbah

Abstract


In a procedural sense, radical content is content that provokes violence or spreads hatred and anti-nationalism. The definition of radical content differs from country to country. In Indonesia, it is most closely associated with provocation and with ethnic and religious hatred, referred to as SARA in the Indonesian language. SARA content is very difficult to detect because of its large volume, its unstructured form, and the amount of noise, which can lead to multiple interpretations. This problem can threaten unity and religious harmony. A system is therefore needed that can distinguish radical from non-radical content. We propose a text mining approach that uses a DF threshold and Human Brain for feature extraction. The system is divided into several steps: data collection, which is included in the preprocessing part; text mining; feature selection; classification, which groups the data by class label; similarity calculation on the training data; and visualization of radical and non-radical content. The experimental results show that combining 10-fold cross-validation with k-Nearest Neighbor (kNN) as the classification method achieves 66.37% accuracy at k = 7 [1].
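The classification stage described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the tokenizer, the similarity measure, or the DF threshold value, so whitespace tokenization, cosine similarity over raw term frequencies, and `min_df=2` are assumptions made here. The "Human Brain" selection step is omitted, and all function names are hypothetical.

```python
import math
from collections import Counter

def select_by_df(docs, min_df=2):
    """DF threshold: keep terms that occur in at least min_df documents."""
    df = Counter(term for doc in docs for term in set(doc.split()))
    return {term for term, count in df.items() if count >= min_df}

def vectorize(doc, vocab):
    """Term-frequency vector restricted to the selected vocabulary."""
    return Counter(t for t in doc.split() if t in vocab)

def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors (assumed measure)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def knn_predict(query, train_vecs, train_labels, k=7):
    """Majority vote among the k training documents most similar to the query."""
    ranked = sorted(zip(train_vecs, train_labels),
                    key=lambda pair: cosine(query, pair[0]), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

def cross_validate(docs, labels, k=7, folds=10, min_df=2):
    """Fraction of documents classified correctly when each fold is held out once."""
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, len(docs), folds))
        train_docs = [d for i, d in enumerate(docs) if i not in test_idx]
        train_labels = [l for i, l in enumerate(labels) if i not in test_idx]
        # Features are selected on the training folds only, to avoid leakage.
        vocab = select_by_df(train_docs, min_df)
        train_vecs = [vectorize(d, vocab) for d in train_docs]
        for i in test_idx:
            predicted = knn_predict(vectorize(docs[i], vocab), train_vecs, train_labels, k)
            correct += predicted == labels[i]
    return correct / len(docs)

# Toy, made-up corpus with placeholder tokens; not data from the paper.
docs = ["alpha beta alpha"] * 10 + ["gamma delta gamma"] * 10
labels = ["radical"] * 10 + ["non-radical"] * 10
accuracy = cross_validate(docs, labels, k=7, folds=10)
```

On real web pages the classes overlap far more than in this toy corpus, which is consistent with the reported accuracy being well below 100%. The value k = 7 is taken from the abstract; every other parameter is illustrative.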


Keywords


K-NN; Nearest Neighbour; Radical Content; Indonesia


References


[1] Muh. Subhan, Amang Sudarsono, Ali Ridho Barakbah, Preprocessing of Radicalism Dataset to Predict Radical Content in Indonesia, The International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), September 2017, Surabaya, Indonesia.

[2] Prichard JJ, MacDonald LE. Cyber Terrorism: A Study of the Extent of Coverage in Computer Security Textbooks. J Inf Technol Educ. 2004;3:279–89.

[3] Edna Reid, Jialun Qin, Yilu Zhou, Guanpi Lai, Marc Sageman, Gabriel Weimann, Hsinchun Chen, Collecting and Analyzing the Presence of Terrorists on the Web: A Case Study of Jihad Websites, P.Officer et al. (Eds.): ISI 2005, LNCS 3495, pp. 402-411, Springer-Verlag Berlin Heidelberg, 2005.

[4] Sonali Vighne, Priyanka Trimbake, Anjali Musmade, Ashwini Merukar, Sandip Pandit, An Approach to Detect Terror Related Activities on Net, International Journal of Advance Research and Innovative Ideas in Education (IJARIIE), Vol-2 Issue-1, 2016.

[5] Dongjin Choi, Byeongkyu Ko, Heesun Kim, Pankoo Kim, Text analysis for detecting terrorism-related articles on the web, Journal of Network and Computer Applications 38, 16-21 (Science Direct), Elsevier, 2014.

[6] Gerstenfeld P., Grant, R. Diana, Chiang Pu-Chau, Hate Online: A Content Analysis of Extremist Internet Sites, Analyses of Social Issues and Public Policy, Vol. 3, No. 1, 2003, pp. 29-44.

[7] Correa D, Sureka A. Solutions to Detect and Analyze Online Radicalization: A Survey. Arxiv - Comput Soc [Internet]. 2013;V(January):1–30.

[8] Chaurasia N, Tiwari A. Efficient Algorithm for Destabilization of Terrorist Networks. Int J Inf Technol Comput Sci [Internet]. 2013;5(12):21–30.

[9] Jayanthi S, Sasikala M. XGraphticsCLUS: Web Mining Hyperlinks and Content of Terrorism Websites for Homeland Security. Ijana.in [Internet]. 2011;949:941–8.

[10] Yuval Elovici, Bracha Shapira, Mark Last, Omer Zaafrani, Menahem Friedman, Mothie Schneider, Abraham Kandel (2009), Detection of Access to Related Web site Using an Advanced Terror Detection System (ATDS), ASIST&T in Wiley International Science. DOI: 10.1002/asi.21249

[11] Mustofa Kamal, Ali Ridho Barakbah, Nur Mubtadai. Temporal Sentiment Analysis for Opinion Mining of ASEAN Free Trade Area on Social Media. The Fifth International Conference on Knowledge Creation and Intelligent Computing (KCIC) 2016-IEEE, November 2016, Manado, Indonesia.

[12] K-means MA, Yoyon Y, Mochamad KS, Ketut HI. Preprocessing Data Web Log Untuk Kluster Pengguna Web. 2010;8(1):31–6.

[13] Winarti, T., & Arief, S., Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming, 157(9), 8-13.

[14] Muflikhah L, Baharudin B, Document Clustering Using Concept Space and Cosine Similarity Measurement. 2009 Int Conf Comput Technol Dev [Internet]. 2009;58–62.

[15] Berry MW, Survey of Text Mining: Clustering, Classification, and Retrieval. New York [Internet]. 2004;262.

[16] Baharudin B, Lee LH, Khan K, A Review of Machine Learning Algorithms for Text-Documents Classification. J Adv Inf Technol. 2010;1(1):4–20.

[17] Saptono R, Sulistyo ME, Trihabsari NS, Informatika PS, Maret US, Studi P, et al. Text Classification Using Naïve Bayes Updateable, 2016;13(02):123–33.

[18] Forman G, Chapter: Feature Selection for Text Classification. Book: Computational Methods of Feature Selection, Chapman and Hall/CRC Press, 2007;16.

[19] Wei Zheng, Guohe Feng, Science C, Feature Selection Method Based on Improved Document Frequency. 2014;12(4):905–10.

[20] Garcia DE. Term Vector Theory and Keyword Weights. 2006;2011(1):1–7.

[21] Poonkuzhali G, Sarukesi K, Uma G V, Web content outlier mining through mathematical approach and trust rating, ACACOS'11 Proc 10th WSEAS Int Conf Appl Comput Appl Comput Sci [Internet]. 2011;77–82.




DOI: 10.24003/emitter.v5i2.214



Copyright (c) 2018 EMITTER International Journal of Engineering Technology
