Data Mining Approach for Breast Cancer Patient Recovery
Abstract
Breast cancer is the second most common cancer among Indonesian women. Several factors are known to increase the risk of breast cancer, but in Indonesia in particular these factors often depend on whether patients undergo treatment routinely. This research examines the determinant factors of breast cancer and analyzes breast cancer patient data to build a useful classification model using a data mining approach. The dataset was taken from an oncology hospital in East Java, Indonesia, and consists of 1097 samples, 21 attributes and 2 classes. We used three feature selection algorithms, Information Gain, Fisher's Discriminant Ratio and Chi-square, to select the attributes that contribute most to the data. We then applied Hierarchical K-means clustering to remove the attributes with the lowest contribution. Our experiments showed that 14 of the 21 original attributes contribute most to the breast cancer data. The clustering algorithm's error ratio decreased from 44.48% (using the 21 original attributes) to 18.32% (using the 14 most important attributes). We also applied classification algorithms to build the classification model and measure its precision on the breast cancer patient data. Under leave-one-out cross-validation, Naïve Bayes and Decision Tree reached precisions of 92.76% and 92.99%, respectively. Our findings suggest that breast cancer patients in Indonesia, especially in East Java, need routine hospital treatment to recover early, which in turn depends on patient adherence.
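The feature-scoring step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the textbook definition of information gain for a categorical attribute and a common two-class form of Fisher's discriminant ratio, (mean_a - mean_b)^2 / (var_a + var_b). The function names, the attribute names and the toy data are hypothetical and do not come from the hospital dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(values, labels):
    """Entropy reduction obtained by splitting labels on a categorical attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def fisher_ratio(values, labels):
    """Two-class Fisher's discriminant ratio for a numeric attribute."""
    classes = sorted(set(labels))
    if len(classes) != 2:
        raise ValueError("this sketch handles exactly two classes")

    def mean(xs):
        return sum(xs) / len(xs)

    def variance(xs):
        m = mean(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    a = [v for v, y in zip(values, labels) if y == classes[0]]
    b = [v for v, y in zip(values, labels) if y == classes[1]]
    denom = variance(a) + variance(b)
    return (mean(a) - mean(b)) ** 2 / denom if denom else math.inf


# Toy data (hypothetical): four patients, two classes.
labels = ["recovered", "recovered", "not_recovered", "not_recovered"]
attributes = {
    "routine_treatment": ["yes", "yes", "no", "no"],  # separates the classes
    "ward": ["a", "b", "a", "b"],                     # unrelated to the class
}

# Rank attributes from most to least informative.
ranking = sorted(attributes,
                 key=lambda name: information_gain(attributes[name], labels),
                 reverse=True)
```

Attributes at the bottom of such a ranking are the candidates for removal. In the study, the effect of removing them was judged by the clustering error ratio, which this sketch does not reproduce.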
Copyright (c) 2017 EMITTER International Journal of Engineering Technology
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.