Automatic Representative News Generation using On-Line Clustering
Abstract
The growing number of online news providers produces a large volume of news every day. This volume makes it hard to consume information efficiently, because many articles carry similar content under different titles. This paper presents a new system for automatically generating representative news using on-line clustering. The system makes the clustering dynamic through two features: centroid update and new cluster creation. Text mining is used to extract the news content. The representative news for each cluster is the article closest to the cluster's centroid, measured by Euclidean distance. In an experimental study, we applied the system to 460 news articles in Bahasa Indonesia and obtained a precision of 70.9%. The errors are mainly caused by imprecise keyword extraction, which generates only one or two keywords for an article. The distribution of the centroids' keywords also affects the clustering results.
Keywords: News Representation, On-line Clustering, Keyword Aggregation, Text Mining.
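The clustering loop described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state the update rule or the criterion for opening a new cluster, so the running-mean centroid update, the distance `threshold`, the class name `OnlineClusterer`, and the toy keyword-weight vectors below are all assumptions made for illustration.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length numeric vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


class OnlineClusterer:
    """Hypothetical on-line clusterer: articles arrive one at a time."""

    def __init__(self, threshold):
        # Assumed rule: an article farther than `threshold` from every
        # centroid starts a new cluster.
        self.threshold = threshold
        self.centroids = []  # one vector per cluster
        self.members = []    # per cluster: list of (article_id, vector)

    def add(self, article_id, vector):
        """Assign one article and return the index of its cluster."""
        if self.centroids:
            dists = [euclidean(vector, c) for c in self.centroids]
            k = min(range(len(dists)), key=dists.__getitem__)
            if dists[k] <= self.threshold:
                self.members[k].append((article_id, vector))
                n = len(self.members[k])
                # Centroid update (assumed): incremental mean of members.
                self.centroids[k] = [
                    c + (x - c) / n for c, x in zip(self.centroids[k], vector)
                ]
                return k
        # New cluster creation: the article becomes the initial centroid.
        self.centroids.append(list(vector))
        self.members.append([(article_id, vector)])
        return len(self.centroids) - 1

    def representatives(self):
        """Per cluster, the id of the article closest to the centroid."""
        return [
            min(m, key=lambda item: euclidean(item[1], c))[0]
            for c, m in zip(self.centroids, self.members)
        ]


# Toy keyword-weight vectors (made up for illustration, not from the paper).
articles = [
    ("n1", [1.0, 0.0, 0.0]),
    ("n2", [0.9, 0.1, 0.0]),
    ("n3", [0.0, 1.0, 1.0]),
    ("n4", [1.0, 0.2, 0.0]),
]

clusterer = OnlineClusterer(threshold=0.5)
labels = [clusterer.add(article_id, vec) for article_id, vec in articles]
print(labels)
print(clusterer.representatives())
```

Because each article is processed once on arrival, the result depends on arrival order and on the chosen threshold. That is a property of this simplified sketch and may not hold for the system evaluated in the paper.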