A Data-driven Approach to Predict Hospital Length of Stay - A Portuguese Case Study
Event Title
Proceedings of the 16th International Conference on Enterprise Information Systems (ICEIS 2014)
Year (definitive publication)
2014
Language
English
Country
Portugal
More Information
Web of Science®
This publication is not indexed in Web of Science®
Scopus
Google Scholar
This publication is not indexed in Google Scholar
Abstract
Data Mining (DM) aims at the extraction of useful knowledge from raw data. In the last decades, hospitals
have collected large amounts of data through new methods of electronic data storage, thus increasing the
potential value of DM in this domain, in what is known as medical data mining. This work focuses on
the case study of a Portuguese hospital, based on a recent and large dataset that was collected from 2000 to 2013. A data-driven predictive model was obtained for the length of stay (LOS), using as inputs indicators
commonly available during the hospitalization process. Based on a regression approach, several state-of-the-art DM models were compared. The best result was obtained by a Random Forest (RF), which achieved a high coefficient of determination (0.81). Moreover, a sensitivity analysis approach was used to extract human-understandable knowledge from the RF model, revealing the three most influential input attributes: the hospital episode type, the physical service where the patient is hospitalized, and the associated medical specialty. Such predictive and explanatory knowledge is valuable for supporting the decisions of hospital managers.
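The two quantities mentioned in the abstract, the coefficient of determination and an input sensitivity measure, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not state which tooling or which sensitivity variant was used, so the one-at-a-time scheme below is an assumption, and `toy_model`, `age_band`, and all numeric values are hypothetical stand-ins for a fitted regressor and its inputs.

```python
from statistics import mean


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def sensitivity_ranges(model, baseline, levels):
    """One-at-a-time sensitivity (assumed variant): vary each input over
    its levels while holding the others at the baseline, and record the
    spread (max - min) of the model's predictions."""
    ranges = {}
    for name, values in levels.items():
        preds = [model({**baseline, name: v}) for v in values]
        ranges[name] = max(preds) - min(preds)
    return ranges


def toy_model(x):
    # Hypothetical stand-in for a fitted LOS regressor (not the paper's RF).
    return 2.0 + 3.0 * x["episode_type"] + 0.5 * x["age_band"]


# Hypothetical baseline values and input levels, for illustration only.
baseline = {"episode_type": 0, "age_band": 1}
levels = {"episode_type": [0, 1, 2], "age_band": [0, 1, 2, 3]}
ranges = sensitivity_ranges(toy_model, baseline, levels)

# Rank inputs from most to least influential under this measure.
ranking = sorted(ranges, key=ranges.get, reverse=True)
```

In this toy setting, the input whose variation produces the widest spread of predictions ranks first, which mirrors how a ranking of influential attributes could be read off a fitted model.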
Keywords
Medical Data Mining, Length of Stay, CRISP-DM, Random Forest.
Fields of Science and Technology Classification
- Physical Sciences - Natural Sciences