Impact of Principal Component Analysis on the Performance of Machine Learning Models for the Prediction of Length of Stay of Patients

  • Jagriti Jagriti, Research Scholar
  • Naresh Sharma, Professor
  • Sandeep Aggarwal, Assistant Professor
Keywords: machine learning models, length of stay prediction, regression, principal component analysis

Abstract

Patient inflow, limited resources, the criticality of diseases, and service quality factors have made it essential for hospital administrations to predict the length of stay (LOS) of inpatients as well as outpatients. An efficient and effective LOS prediction tool can improve patient care and reduce the cost of service by allocating the hospital’s available resources optimally. Machine learning (ML) models can give encouraging results for predicting a patient’s LOS. In this paper, five ML algorithms, namely linear regression (LR), k-nearest neighbours (KNN), decision trees (DT), random forest (RF), and gradient boosting (GB) regression, are used to predict the LOS of patients admitted to the hospital, based on medical history, laboratory measurements, and vital signs collected before admission. The impact of principal component analysis (PCA) on the predictive performance of all five algorithms is also analyzed. Five-fold cross-validation is used to validate the results of the proposed ML models. Without PCA, the RF and GB models perform best among all the ML models, with R² scores of 0.856 and 0.855 respectively. With PCA, the accuracy of all the models except KNN and LR increases. The GB model trained on principal components attains an R² score of approximately 0.908 and a mean squared error (MSE) of approximately 0.49, improving on the same model trained on the original data. PCA thus has an advantageous effect on the DT, RF, and GB models. Therefore, LOS for new patients can be predicted effectively using the proposed tree-based RF and GB models combined with PCA.
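The comparison described in the abstract (a regressor evaluated with and without PCA under five-fold cross-validation, scored by R² and MSE) can be sketched as follows. This is a minimal illustration, not the authors' code: it assumes scikit-learn, uses synthetic data from `make_regression` in place of the hospital dataset, and the 95% variance threshold for PCA, the standardisation step, and the helper name `evaluate` are illustrative choices that the abstract does not specify.

```python
from sklearn.datasets import make_regression
from sklearn.decomposition import PCA
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the patient features and LOS target (assumption:
# the study uses hospital records, which are not reproduced here).
X, y = make_regression(n_samples=300, n_features=20, n_informative=8,
                       noise=10.0, random_state=0)


def evaluate(model):
    """Return (mean R^2, mean MSE) over five cross-validation folds."""
    cv = KFold(n_splits=5, shuffle=True, random_state=0)
    scores = cross_validate(model, X, y, cv=cv,
                            scoring=("r2", "neg_mean_squared_error"))
    # scikit-learn reports MSE negated so that larger is better; flip the sign.
    return (scores["test_r2"].mean(),
            -scores["test_neg_mean_squared_error"].mean())


# Baseline: gradient boosting on the original features.
baseline = GradientBoostingRegressor(random_state=0)

# PCA variant: standardise, project onto principal components, then fit.
# Keeping 95% of the variance is a hypothetical setting for illustration.
with_pca = make_pipeline(
    StandardScaler(),
    PCA(n_components=0.95, svd_solver="full"),
    GradientBoostingRegressor(random_state=0),
)

results = {"GB": evaluate(baseline), "PCA + GB": evaluate(with_pca)}
for name, (r2, mse) in results.items():
    print(f"{name}: R2={r2:.3f}, MSE={mse:.3f}")
```

Placing the scaler and PCA inside the pipeline means they are refitted on the training portion of each fold, so no information from the held-out fold leaks into the projection. On synthetic data the relative ranking of the two variants need not match the results reported above.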

Author Biography

Naresh Sharma, Professor

Department of Mathematics

School of Engineering and Sciences

Published
2024-12-20
How to Cite
Jagriti, J., Sharma, N., & Aggarwal, S. (2024). Impact of Principal Component Analysis on the Performance of Machine Learning Models for the Prediction of Length of Stay of Patients. EMITTER International Journal of Engineering Technology, 12(2), 128-149. https://doi.org/10.24003/emitter.v12i2.835
Section
Articles