Cluster-Based News Representative Generation with Automatic Incremental Clustering

  • Irsal Shabirin, Politeknik Elektronika Negeri Surabaya, Indonesia
  • Ali Ridho Barakbah, Politeknik Elektronika Negeri Surabaya, Indonesia
  • Iwan Syarif, Politeknik Elektronika Negeri Surabaya, Indonesia
Keywords: Clustering, Metadata Aggregation, Automatic Incremental Clustering, Representative News

Abstract

Nowadays, a large volume of news circulates on the Internet every day, amounting to more than two thousand news articles. However, some of these articles share the same topic and content, leaving readers trapped among different news sources that report similar things. This research proposes a new approach that provides representative news automatically through the Automatic Incremental Clustering method. The method begins with Data Acquisition, Keyword Extraction, and Metadata Aggregation, which together produce a news metadata matrix. In this matrix, each column corresponds to a word type and each row corresponds to a news article. The articles in the matrix are then grouped by the Automatic Incremental Clustering method according to the similarity of the words they contain, measured with the Euclidean distance; the grouping is performed automatically and in real time. For each cluster (topic), the article located closest to the cluster centroid is selected as the Representative News. This study used 101 news articles as experimental data and produced 87 news clusters with a precision ratio of 85.14%.
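The pipeline summarized above (metadata matrix, Euclidean-distance incremental clustering, centroid-nearest representative) can be illustrated with a minimal sketch. It assumes keyword extraction has already produced a list of keywords per article, uses raw word counts as matrix entries, and uses a fixed distance threshold (the hypothetical `threshold` parameter) to decide whether an incoming article joins its nearest cluster or opens a new one. This single-pass threshold rule is a generic stand-in, not the authors' Automatic Incremental Clustering, whose exact cluster-creation criterion is not described in the abstract. The function names and sample keywords are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]


def build_metadata_matrix(news_keywords: List[List[str]]) -> Tuple[List[str], List[Vector]]:
    """Rows = news articles, columns = distinct words (assumed: raw word counts)."""
    vocab = sorted({w for kws in news_keywords for w in kws})
    col = {w: j for j, w in enumerate(vocab)}
    matrix = []
    for kws in news_keywords:
        row = [0.0] * len(vocab)
        for w in kws:
            row[col[w]] += 1.0
        matrix.append(row)
    return vocab, matrix


def incremental_cluster(matrix: List[Vector], threshold: float) -> Tuple[List[Vector], List[List[int]]]:
    """Single-pass clustering: join the nearest centroid if it lies within
    `threshold` (Euclidean distance), otherwise open a new cluster.
    A generic stand-in, not the paper's exact algorithm."""
    centroids: List[Vector] = []
    members: List[List[int]] = []
    for i, row in enumerate(matrix):
        # Find the nearest existing centroid, if any.
        best, best_dist = -1, math.inf
        for k, centroid in enumerate(centroids):
            d = math.dist(row, centroid)
            if d < best_dist:
                best, best_dist = k, d
        if best >= 0 and best_dist <= threshold:
            members[best].append(i)
            n = len(members[best])
            # Running-mean update: c_new = c_old + (x - c_old) / n
            centroids[best] = [c + (x - c) / n for c, x in zip(centroids[best], row)]
        else:
            centroids.append(list(row))
            members.append([i])
    return centroids, members


def representatives(matrix: List[Vector], centroids: List[Vector],
                    members: List[List[int]]) -> List[int]:
    """Index of the member closest to each cluster centroid (first one on ties)."""
    return [min(idxs, key=lambda i: math.dist(matrix[i], centroid))
            for centroid, idxs in zip(centroids, members)]


# Hypothetical keyword lists for five articles.
news = [
    ["flood", "jakarta", "rain"],
    ["flood", "jakarta", "rain", "residents"],
    ["election", "kpu", "votes"],
    ["election", "kpu", "votes", "party"],
    ["flood", "jakarta"],
]
vocab, matrix = build_metadata_matrix(news)
centroids, members = incremental_cluster(matrix, threshold=1.5)
print(members)                                      # [[0, 1, 4], [2, 3]]
print(representatives(matrix, centroids, members))  # [0, 2]
```

With `threshold=1.5`, the flood-related and election-related articles fall into two clusters, and the article nearest each running-mean centroid is returned as that cluster's representative. A smaller threshold yields more, tighter clusters. This sketch fixes the vocabulary before clustering; a real-time system would also have to extend the matrix as new words arrive.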

Published: 2019-12-01
How to Cite: Shabirin, I., Barakbah, A. R., & Syarif, I. (2019). Cluster-Based News Representative Generation with Automatic Incremental Clustering. EMITTER International Journal of Engineering Technology, 7(2), 467-479. https://doi.org/10.24003/emitter.v7i2.378
Section: Articles