Centronit: Initial Centroid Designation Algorithm for K-Means Clustering
Abstract
The clustering performance of K-means depends heavily on the quality of its initial centroids. The initial centroids are usually chosen at random, so the algorithm may converge to the nearest local minimum rather than the global optimum. In this paper, we propose an algorithm, called Centronit, for designating initial centroids to optimize K-means clustering. For each data point, the proposed algorithm calculates the average distance to the nearest data points inside a region bounded by a minimum distance. The initial centroids are then designated as the data points with the lowest average distance. The minimum distance is set by calculating the average distance between the data points. The method is also robust to outliers in the data. The experimental results show that the proposed method improves the clustering results of K-means.
Keywords: K-means clustering, initial centroids, K-means optimization.
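The abstract describes the procedure only at a high level, so the following Python sketch is one possible reading of it, not the paper's exact algorithm. The function name `centronit_like_init`, the use of Euclidean distance, the mean pairwise distance as the "minimum distance", and the greedy rule that skips points lying within that distance of an already chosen seed are all assumptions made for illustration.

```python
import math


def centronit_like_init(points, k):
    """Pick up to k initial centroids from `points` (a list of coordinate tuples).

    Hypothetical sketch based on the abstract only; needs at least two points.
    """
    n = len(points)
    # Pairwise Euclidean distances (assumed metric).
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Assumed "minimum distance": mean distance over all distinct pairs.
    pair_sum = sum(dist[i][j] for i in range(n) for j in range(i + 1, n))
    radius = pair_sum / (n * (n - 1) / 2)

    # Score each point by its mean distance to the neighbours inside the radius.
    # A point with no neighbour inside the radius gets an infinite score.
    scores = []
    for i in range(n):
        near = [dist[i][j] for j in range(n) if j != i and dist[i][j] <= radius]
        scores.append(sum(near) / len(near) if near else math.inf)

    # Greedy selection (assumption): take the lowest-scoring point, then drop
    # every point within the radius of it so the next seed comes from elsewhere.
    available = set(range(n))
    chosen = []
    while len(chosen) < k and available:
        best = min(sorted(available), key=lambda i: scores[i])
        chosen.append(best)
        available = {i for i in available if dist[best][i] > radius}

    # May return fewer than k seeds if the candidates run out.
    return [points[i] for i in chosen]
```

The returned points could then be passed as starting centers to any K-means implementation that accepts user-supplied initial centroids. The all-pairs distance table makes this sketch quadratic in the number of points, so it is meant for small datasets.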
The copyright to this article is transferred to Politeknik Elektronika Negeri Surabaya (PENS) if and when the article is accepted for publication. The undersigned hereby transfers any and all rights in and to the paper, including without limitation all copyrights, to PENS. The undersigned hereby represents and warrants that the paper is original and that he/she is the author of the paper, except for material that is clearly identified as to its original source, with permission notices from the copyright owners where required. The undersigned represents that he/she has the power and authority to make and execute this assignment.
The corresponding author signs for and accepts responsibility for releasing this material on behalf of any and all co-authors. This agreement is to be signed by at least one of the authors who have obtained the assent of the co-author(s) where applicable. After submission of this agreement signed by the corresponding author, changes of authorship or in the order of the authors listed will not be accepted.
Retained Rights/Terms and Conditions
- Authors retain all proprietary rights in any process, procedure, or article of manufacture described in the Work.
- Authors may reproduce or authorize others to reproduce the work or derivative works for the author’s personal use or company use, provided that the source and the copyright notice of Politeknik Elektronika Negeri Surabaya (PENS) publisher are indicated.
- Authors are allowed to use and reuse their articles under the same CC-BY-NC-SA license as third parties.
- Third parties are allowed to share and adapt the published work for all non-commercial purposes; if they remix, transform, or build upon the material, they must distribute it under the same license as the original.
Plagiarism Check
To avoid plagiarism, the manuscript will be checked twice by the Editorial Board of the EMITTER International Journal of Engineering Technology (EMITTER Journal) using the iThenticate Plagiarism Checker and the CrossCheck plagiarism screening service. The similarity score of a manuscript should be less than 25%. A manuscript that plagiarizes another author’s work, or the author's own, will be rejected by EMITTER Journal.
Authors are expected to comply with EMITTER Journal's plagiarism rules by signing the plagiarism declaration form and submitting it, along with the copyright transfer form, via online submission.