Classification of Radical Web Content in Indonesia using Web Content Mining and k-Nearest Neighbor Algorithm
Abstract
Radical content, in a procedural sense, is content that provokes violence or spreads hatred and anti-nationalism. The definition of radical content differs from country to country. In Indonesia, it is most closely associated with provocation and with ethnic and religious hatred, known in Indonesian as SARA. SARA content is very difficult to detect because of its large volume, its unstructured form, and the amount of noise, which can lead to multiple interpretations. This problem can threaten unity and religious harmony. Under these conditions, a system is needed that can distinguish radical content from non-radical content. We propose a text mining approach that uses a document frequency (DF) threshold and Human Brain for feature extraction. The system is divided into several steps: data collection, which is included in the preprocessing part; text mining; feature selection; classification, which groups the data by class label; similarity calculation on the training data; and visualization of radical and non-radical content. The experimental results show that combining 10-fold cross-validation with k-Nearest Neighbor (kNN) classification achieves 66.37% accuracy with k = 7 [1].
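The DF-threshold and kNN steps named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the whitespace tokenizer, the raw term-count vectors, the use of cosine similarity, the `min_df` value, and the toy documents are all assumptions made for illustration, and the toy run uses k = 3 rather than the paper's k = 7 because the sample has only six documents.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization stands in for real preprocessing.
    return text.lower().split()


def select_by_df(docs, min_df=2):
    """Keep terms that appear in at least min_df documents (DF threshold)."""
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))  # count each term once per document
    return sorted(term for term, n in df.items() if n >= min_df)


def vectorize(doc, vocab):
    """Represent a document as raw term counts over the selected vocabulary."""
    counts = Counter(tokenize(doc))
    return [counts[term] for term in vocab]


def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(query, train_vecs, train_labels, k=7):
    """Majority vote among the k training vectors most similar to the query."""
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy corpus, not the paper's dataset.
train_docs = [
    "provoke hatred violence group",
    "hatred violence against group",
    "provoke violence hatred",
    "market price news today",
    "sports news today match",
    "market news price match",
]
train_labels = ["radical"] * 3 + ["non-radical"] * 3

vocab = select_by_df(train_docs, min_df=2)
train_vecs = [vectorize(doc, vocab) for doc in train_docs]

query_vec = vectorize("they provoke hatred", vocab)
prediction = knn_predict(query_vec, train_vecs, train_labels, k=3)
print(prediction)
```

In an evaluation like the one reported, the labeled data would be split into 10 folds, with each fold held out once as the test set while the remaining nine serve as the kNN training set, and accuracy averaged over the folds.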
References
[1] Muh. Subhan, Amang Sudarsono, Ali Ridho Barakbah, Preprocessing of Radicalism Dataset to Predict Radical Content in Indonesia, The International Electronics Symposium on Knowledge Creation and Intelligent Computing (IES-KCIC), September 2017, Surabaya, Indonesia.
[2] Prichard JJ, MacDonald LE. Cyber terrorism: A study of the extent of coverage in computer security textbooks. J Inf Technol Educ. 2004;3:279–89.
[3] Edna Reid, Jialun Qin, Yilu Zhou, Guanpi Lai, Marc Sageman, Gabriel Weimann, Hsinchun Chen, Collecting and Analyzing the Presence of Terrorists on the Web: A Case Study of Jihad Websites, P. Officer et al. (Eds.): ISI 2005, LNCS 3495, pp. 402-411, Springer-Verlag Berlin Heidelberg, 2005.
[4] Sonali Vighne, Priyanka Trimbake, Anjali Musmade, Ashwini Merukar, Sandip Pandit, An Approach to Detect Terror Related Activities on Net, International Journal of Advance Research and Innovative Ideas in Education (IJARIIE), Vol-2 Issue-1, 2016.
[5] Dongjin Choi, Byeongkyu Ko, Heesun Kim, Pankoo Kim, Text analysis for detecting terrorism-related articles on the web, Journal of Network and Computer Applications 38, 16-21 (Science Direct), Elsevier, 2014.
[6] Gerstenfeld P., Grant, R. Diana, Chiang Pu-Chau, Hate Online: A Content Analysis of Extremist Internet Sites, Analyses of Social Issues and Public Policy, Vol. 3, No. 1, 2003, pp. 29-44.
[7] Correa D, Sureka A. Solutions to Detect and Analyze Online Radicalization: A Survey. arXiv - Comput Soc [Internet]. 2013;V(January):1–30.
[8] Chaurasia N, Tiwari A. Efficient Algorithm for Destabilization of Terrorist Networks. Int J Inf Technol Comput Sci [Internet]. 2013;5(12):21–30.
[9] Jayanthi S, Sasikala M. XGraphticsCLUS: Web Mining Hyperlinks and Content of Terrorism websites for Homeland Security. Ijana.in [Internet]. 2011;949:941–8.
[10] Yuval Elovici, Bracha Shapira, Mark Last, Omer Zaafrani, Menahem Friedman, Mothie Schneider, Abraham Kandel (2009), Detection of Access to Related Web Site Using an Advanced Terror Detection System (ATDS), ASIST&T in Wiley International Science. DOI: 10.1002/asi.21249
[11] Mustofa Kamal, Ali Ridho Barakbah, Nur Mubtadai. Temporal Sentiment Analysis for Opinion Mining of ASEAN Free Trade Area on Social Media. The Fifth International Conference on Knowledge Creation and Intelligent Computing (KCIC) 2016-IEEE, November 2016, Manado, Indonesia.
[12] K-means MA, Yoyon Y, Mochamad KS, Ketut HI. Preprocessing Data Web Log Untuk Kluster Pengguna Web. 2010;8(1):31–6.
[13] Winarti, T., & Arief, S. Determining Term on Text Document Clustering using Algorithm of Enhanced Confix Stripping Stemming, 157(9), 8-13.
[14] Muflikhah L, Baharudin B. Document Clustering Using Concept Space and Cosine Similarity Measurement. 2009 Int Conf Comput Technol Dev [Internet]. 2009;58–62.
[15] Berry MW. Survey of Text Mining: Clustering, Classification, and Retrieval. New York [Internet]. 2004;262.
[16] Baharudin B, Lee LH, Khan K. A Review of Machine Learning Algorithms for Text-Documents Classification. J Adv Inf Technol. 2010;1(1):4–20.
[17] Saptono R, Sulistyo ME, Trihabsari NS, Informatika PS, Maret US, Studi P, et al. Text classification Using Naïve Bayes Updateable. 2016;13(02):123–33.
[18] Forman G. Chapter: Feature Selection for Text Classification. Book: Computational Methods of Feature Selection, Chapman and Hall/CRC Press, 2007. 2007;16.
[19] Wei Zheng, Guohe Feng, Science C. Feature Selection Method Based on Improved Document Frequency. 2014;12(4):905–10.
[20] Garcia DE. Term Vector Theory and Keyword Weights. 2006;2011(1):1–7.
[21] Poonkuzhali G, Sarukesi K, Uma G V. Web content outlier mining through mathematical approach and trust rating. ACACOS’11 Proc 10th WSEAS Int Conf Appl Comput Appl Comput Sci [Internet]. 2011;77–82.
Copyright (c) 2018 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.