Data Mining Approach for Breast Cancer Patient Recovery

  • Tresna Maulana Fahrudin Politeknik Elektronika Negeri Surabaya
  • Iwan Syarif Politeknik Elektronika Negeri Surabaya
  • Ali Ridho Barakbah Politeknik Elektronika Negeri Surabaya
Keywords: Data Mining, Breast Cancer, Feature Selection, Clustering, Classification.


Breast cancer is the second most common cancer among Indonesian women. Several factors are known to increase the risk of breast cancer, but in Indonesia in particular these factors often depend on whether treatment is followed routinely. This research examines the determinant factors of breast cancer and analyzes breast cancer patient data to build a useful classification model using a data mining approach. The dataset was obtained from an oncology hospital in East Java, Indonesia, and consists of 1097 samples, 21 attributes and 2 classes. We used three feature selection algorithms, Information Gain, Fisher's Discriminant Ratio and Chi-square, to select the attributes that contribute most to the data. We applied Hierarchical K-means Clustering to remove the attributes with the lowest contribution. Our experiments showed that 14 of the 21 original attributes contribute most to the breast cancer data. With the clustering algorithm, the error ratio decreased from 44.48% (using the 21 original attributes) to 18.32% (using the 14 most important attributes). We also applied classification algorithms to build the classification model and measure its precision on the breast cancer patient data. Naïve Bayes and Decision Tree reached precisions of 92.76% and 92.99%, respectively, under leave-one-out cross-validation. Based on our findings, the recovery of breast cancer patients in Indonesia, especially in East Java, should be improved through routine treatment in the hospital, since early recovery from breast cancer is related to patient adherence.
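As an illustration of the first feature selection step, the sketch below ranks categorical attributes by Information Gain, that is, the entropy of the class minus the expected class entropy after splitting on the attribute. It is a minimal sketch and not the authors' implementation: the function names, the attribute names `routine_treatment` and `age_group`, and the four toy records are all invented for the example and are not taken from the hospital dataset.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(values, labels):
    """Class entropy minus the expected class entropy after splitting on one attribute."""
    total = len(labels)
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def rank_attributes(rows, labels, names):
    """Return (attribute name, gain) pairs, most informative first."""
    gains = [
        (name, information_gain([row[i] for row in rows], labels))
        for i, name in enumerate(names)
    ]
    return sorted(gains, key=lambda pair: pair[1], reverse=True)


# Toy, made-up records (assumption: NOT the dataset described in the abstract).
names = ["routine_treatment", "age_group"]
rows = [("yes", "old"), ("yes", "young"), ("no", "old"), ("no", "young")]
labels = ["recovered", "recovered", "not_recovered", "not_recovered"]

ranking = rank_attributes(rows, labels, names)
for name, gain in ranking:
    print(f"{name}: {gain:.3f}")
```

In this toy data the first attribute separates the two classes perfectly and the second carries no information about the class, so the first is ranked higher. Attributes at the bottom of such a ranking are the candidates for removal.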





How to Cite
Fahrudin, T. M., Syarif, I., & Barakbah, A. R. (2017). Data Mining Approach for Breast Cancer Patient Recovery. EMITTER International Journal of Engineering Technology, 5(1), 36-71.