Machine learning identifies ICU outcome predictors in a multicenter COVID-19 cohort

Background
Intensive care resources are heavily utilized during the COVID-19 pandemic. However, risk stratification and prediction of clinical outcomes of SARS-CoV-2 patients upon ICU admission remain inadequate. This study aimed to develop a machine learning model, based on retrospective and prospective clinical data, to stratify patient risk and predict ICU survival and outcomes.
Methods
A Germany-wide electronic registry was established to pseudonymously collect admission, therapeutic and discharge information of SARS-CoV-2 ICU patients retrospectively and prospectively. Machine learning approaches were evaluated for the accuracy and interpretability of their predictions. The Explainable Boosting Machine approach was selected as the most suitable method. Individual, non-linear shape functions for predictive parameters and parameter interactions are reported.
Results
1039 patients were included in the Explainable Boosting Machine model: 596 collected retrospectively and 443 collected prospectively. The model for prediction of general ICU outcome was more reliable in predicting “survival” than “mortality”. Age, inflammatory and thrombotic activity, and severity of ARDS at ICU admission were predictive of ICU survival. Patient age, pulmonary dysfunction and transfer from an external institution were predictors for ECMO therapy. The interaction of patient age with D-dimer levels on admission, creatinine levels and the SOFA score without GCS were predictors for renal replacement therapy.
Conclusions
Using Explainable Boosting Machine analysis, we confirmed and weighed previously reported predictors and identified novel predictors of outcome in critically ill COVID-19 patients. With this strategy, predictive modeling of COVID-19 ICU patient outcomes can be performed while overcoming the limitations of linear regression models.
Trial registration
“ClinicalTrials” (clinicaltrials.gov) under NCT04455451.
Supplementary Information The online version contains supplementary material available at 10.1186/s13054-021-03720-4.

Few of these reports attempted to identify risk factors predicting morbidity, mortality and overall clinical outcome. This may be the result of reporting (1) incomplete data sets early in the pandemic, when many patients were still undergoing ICU care for SARS-CoV-2 infection [10,13,15,16], and/or (2) data sets biased by the need to triage ICU care as local and regional ICU capacity became exhausted [7,10,14,15]. Nonetheless, there was consensus that SARS-CoV-2 ICU patients experienced lengthy ICU stays, with ICU mortality in the range of 25 to 41% [14,17]. Classical statistical analyses identified risk factors in these patient populations, including age, renal function, the degree of pulmonary compromise and the severity of acute respiratory distress syndrome (ARDS). However, standard statistical techniques are limited in their ability to integrate diverse data types, such as past medical history and therapeutic ICU interventions, in relation to clinical outcome variables [18].
To overcome these limitations, we employed machine learning methods to optimize risk stratification and prediction of overall outcomes for individual COVID-19 ICU patients. It has recently been shown that machine learning (ML) algorithms applied to numerous multidimensional variables with non-linear relationships may offer advantages in clinical outcome prediction. ML strategies have been found superior to classical methods of outcome prediction typically used in cardiovascular pathologies [18,19]. To take advantage of these techniques for outcome prediction, we investigated 1186 PCR-confirmed COVID-19 patients receiving ICU care at 27 German hospitals, who were enrolled retrospectively and prospectively. The aim of this study was to investigate whether ML can provide additional, interpretable insights for outcome prediction and to weigh the identified outcome factors in COVID-19 ICU patients.

Study design, setting and participants
This multi-center retrospective-prospective cohort study was performed at 27 participating German hospitals (Additional file 1: Table E1 and Fig. 1). Ethics approval was obtained from the Institutional Review Boards of the participating hospitals. The study was registered at "ClinicalTrials" (clinicaltrials.gov) under NCT04455451. COVID-19 patients aged 18 years and older requiring ICU admission at a participating center between 1st January 2020 and 4th May 2021 were recruited for this study. Patients were recruited either retrospectively (1st January 2020 to 31st July 2020) or prospectively (29th September 2020 to 4th May 2021). The inclusion criterion was the need for ICU treatment due to COVID-19, confirmed by a positive SARS-CoV-2 PCR test. The local investigator confirmed the accuracy and completeness of all entered data. A secure electronic research data capture system (REDCap) was used to collect and manage study data pseudonymously [20,21].

Variables and measurements
During the data collection process, demographic data, past medical history, previous medications, current illness data, laboratory values and outcome data were collected. A total of 49 variables were used for the ML models (Additional file 1: Table E8).
To allow comparability of intubated and spontaneously breathing patients, the Sequential Organ Failure Assessment (SOFA) score was calculated without the Glasgow Coma Scale (GCS) [22]. The Murray lung injury score was calculated as previously published [23]. Static compliance and driving pressure were calculated as previously described [24]. Laboratory values were converted to a common unit to permit analysis. Oxygen supply in spontaneously breathing patients was converted to an estimated FiO2 (Additional file 1: Table E2).
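As a minimal sketch of the respiratory calculations referenced above, the Python below assumes the textbook definitions (driving pressure = plateau pressure minus PEEP; static compliance = tidal volume divided by driving pressure) and the component cut-offs commonly cited for the Murray lung injury score. The exact formulas used in this study are those of the cited references [23,24]; all function names are hypothetical.

```python
from typing import Optional


def driving_pressure(plateau_cmh2o: float, peep_cmh2o: float) -> float:
    """Driving pressure (cmH2O): plateau pressure minus PEEP."""
    return plateau_cmh2o - peep_cmh2o


def static_compliance(tidal_volume_ml: float, plateau_cmh2o: float, peep_cmh2o: float) -> float:
    """Static compliance (ml/cmH2O): tidal volume divided by driving pressure."""
    return tidal_volume_ml / driving_pressure(plateau_cmh2o, peep_cmh2o)


def murray_lung_injury_score(
    cxr_quadrants: Optional[int] = None,
    pf_ratio_mmhg: Optional[float] = None,
    peep_cmh2o: Optional[float] = None,
    compliance_ml_cmh2o: Optional[float] = None,
) -> float:
    """Mean of the available component scores (each 0-4).

    The cut-offs are the commonly cited Murray components (an assumption of this sketch).
    """
    points = []
    if cxr_quadrants is not None:
        # number of chest X-ray quadrants with consolidation, clipped to 0-4
        points.append(min(max(int(cxr_quadrants), 0), 4))
    if pf_ratio_mmhg is not None:
        cuts = [(300, 0), (225, 1), (175, 2), (100, 3)]  # PaO2/FiO2 lower bounds
        points.append(next((p for lo, p in cuts if pf_ratio_mmhg >= lo), 4))
    if peep_cmh2o is not None:
        cuts = [(5, 0), (8, 1), (11, 2), (14, 3)]  # PEEP upper bounds
        points.append(next((p for hi, p in cuts if peep_cmh2o <= hi), 4))
    if compliance_ml_cmh2o is not None:
        cuts = [(80, 0), (60, 1), (40, 2), (20, 3)]  # compliance lower bounds
        points.append(next((p for lo, p in cuts if compliance_ml_cmh2o >= lo), 4))
    if not points:
        raise ValueError("at least one component is required")
    return sum(points) / len(points)


# Example: VT 420 ml, plateau 26 cmH2O, PEEP 12 cmH2O, PaO2/FiO2 150, 4 quadrants
dp = driving_pressure(26, 12)                     # 14
crs = static_compliance(420, 26, 12)              # 30.0
lis = murray_lung_injury_score(4, 150, 12, crs)   # (4 + 3 + 3 + 3) / 4 = 3.25
```

Averaging over the available components mirrors how the score is usually reported when not all four inputs are measured; whether the study handled missing components this way is not stated.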

Bias management
Discontinuation of ICU care
107 patients (8.3%), or their legal representatives, requested that ICU-level care be discontinued during the ICU stay. The majority of these patients died during the ICU stay (n = 95, 88.8%). To avoid bias in the predictor analyses, this patient group was excluded from further analyses (for patient characteristics, see Additional file 1: Table E5). For three patients, these data were not available; these patients were also excluded from the analyses.

Statistical analyses
Observed parameters were assessed for their distribution. Outliers were excluded by visual assessment of

Description of machine learning process
Variables are referred to as features in machine learning (ML), but for consistency we refer to them as variables. For a detailed description of the ML process, please see Additional file 1: Table E7. We trained a Support Vector Classifier (SVC), a Random Forest Classifier (RF) and an EBM with fivefold stratified cross-validation (CV), using 80% of the data for training and 20% for testing in each fold. We excluded variables with more than 30% missing data (see Additional file 1: Table E7). For all ML methods, we applied one-hot encoding to categorical data, i.e. we created an indicator column for each category (including missing values). Boolean data were converted to the numerical values zero and one. We performed hyper-parameter optimization for all ML algorithms using nested CV [25]. Model performance was evaluated as the balanced accuracy and the area under the precision-recall curve (PR-AUC), each averaged over the CV folds. A regular accuracy or AUC would be biased towards the overrepresented class ("survival"). To verify the robustness of our results in light of the imbalanced outcome variable, we used both over-sampling and under-sampling for the outcome "survival". For over-sampling, randomly selected observations from the underrepresented class (here: "non-survival") were added to the data set. For under-sampling, the overrepresented class (here: "survival") was reduced at random to the size of the underrepresented class. We compared the ranking of variable importance and the shape functions across each of the five stratified CV runs on the retrospective dataset; the results of each run were the same (data not shown). We further validated the results by training the ML models on the retrospective data, using fivefold CV for hyper-parameter optimization (RF and SVC), and predicting the outcome on the prospective data (see Table 2).
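The preprocessing and resampling steps described above can be sketched with pandas and NumPy as follows. This is a minimal illustration, not the study's pipeline: the function names are hypothetical, and the assumption that over-sampling continues until both classes are equally large is ours (the text states this only for under-sampling).

```python
import numpy as np
import pandas as pd

MAX_MISSING_FRACTION = 0.30  # variables with more than 30% missing values are excluded


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop sparse variables, map Booleans to 0/1, one-hot encode categoricals incl. missing."""
    kept = df.loc[:, df.isna().mean() <= MAX_MISSING_FRACTION].copy()
    for col in kept.columns:
        if kept[col].dtype == bool:
            kept[col] = kept[col].astype(int)
    categorical = list(kept.select_dtypes(include=["object", "category"]).columns)
    # dummy_na=True adds an explicit indicator column for missing values
    return pd.get_dummies(kept, columns=categorical, dummy_na=True, dtype=int)


def random_resample(X, y, minority_label, mode="over", seed=0):
    """Randomly over-sample the minority class or under-sample the majority class to equal size."""
    rng = np.random.default_rng(seed)
    X, y = np.asarray(X), np.asarray(y)
    minority = np.flatnonzero(y == minority_label)
    majority = np.flatnonzero(y != minority_label)
    if mode == "over":
        # draw minority rows with replacement until both classes have the same size
        extra = rng.choice(minority, size=len(majority) - len(minority), replace=True)
        idx = np.concatenate([majority, minority, extra])
    elif mode == "under":
        # keep a random subset of the majority class, drawn without replacement
        kept = rng.choice(majority, size=len(minority), replace=False)
        idx = np.concatenate([kept, minority])
    else:
        raise ValueError("mode must be 'over' or 'under'")
    idx = rng.permutation(idx)  # shuffle so classes are interleaved
    return X[idx], y[idx]
```

In a cross-validated setting, resampling of this kind would normally be applied to the training folds only, so that duplicated minority observations never appear in a test fold.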
For the results presented in this paper, we trained the EBM on the entire dataset (retrospective and prospective).
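The evaluation protocol described in this section (nested, stratified fivefold CV scored by balanced accuracy and PR-AUC, plus training on the retrospective and testing on the prospective cohort) can be sketched with scikit-learn. Assumptions: a RandomForestClassifier stands in for the three model types, the parameter grid is purely illustrative, the positive class is coded as 1, and PR-AUC is summarized by average precision, which approximates but is not identical to a trapezoidal area under the precision-recall curve.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, balanced_accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_validate

# Hypothetical, deliberately small grid for illustration only.
PARAM_GRID = {"n_estimators": [50], "max_depth": [None, 3]}
SCORING = {"balanced_accuracy": "balanced_accuracy", "pr_auc": "average_precision"}


def make_search(seed=0):
    """Inner loop: stratified fivefold grid search for hyper-parameters."""
    inner = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed)
    return GridSearchCV(
        RandomForestClassifier(random_state=seed),
        PARAM_GRID,
        scoring="balanced_accuracy",
        cv=inner,
    )


def nested_cv_scores(X, y, seed=0):
    """Outer loop: stratified fivefold CV (80/20 splits); metrics averaged over folds."""
    outer = StratifiedKFold(n_splits=5, shuffle=True, random_state=seed + 1)
    res = cross_validate(make_search(seed), X, y, cv=outer, scoring=SCORING)
    return {name: float(np.mean(res[f"test_{name}"])) for name in SCORING}


def temporal_validation(X_retro, y_retro, X_prosp, y_prosp, seed=0):
    """Tune and fit on the retrospective cohort, then score on the prospective cohort."""
    search = make_search(seed).fit(X_retro, y_retro)
    proba = search.predict_proba(X_prosp)[:, 1]  # predicted probability of class 1
    return {
        "balanced_accuracy": balanced_accuracy_score(y_prosp, search.predict(X_prosp)),
        "pr_auc": average_precision_score(y_prosp, proba),
    }
```

Keeping hyper-parameter tuning inside the inner loop means the outer test folds never influence model selection, which is the point of nesting the CV.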

Rationale for the use of the explainable boosting machines model
EBMs are built on a generalized additive model (GAM) of the form
$$g(y) = \omega_1 f_1(x_1) + \omega_2 f_2(x_2) + \dots + \omega_p f_p(x_p),$$
where g is the link function, f_i(x_i) is the shape function for variable x_i, and ω_i is the weight with which variable x_i influences the model. In a classification problem, g is the logit link, i.e. the predicted probability is obtained by applying the logistic function to the additive sum [26]. As the model is additive, each variable contributes in a modular way. This allows an easy interpretation of the influence of a variable on the prediction (see Fig. 2A). Using a shape function for each variable allows complex (even non-linear) relationships between the variable and the outcome prediction (see Fig. 2B). Therefore, GAMs can be significantly more accurate than simple linear models [27]. We used EBMs as they additionally employ modern machine learning techniques such as bagging and boosting and achieve performance comparable to state-of-the-art ML techniques such as RF [27,28]. Overall performance of the ML models was assessed by balanced accuracy and PR-AUC (Table 2).
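To make the additive structure concrete, the following sketch evaluates a GAM with a logit link: each variable's weighted shape-function value is kept as a separate contribution on the log-odds scale, and the logistic function turns the sum into a probability. The shape functions and weights are hypothetical toy values, not fitted EBM terms; the optional intercept is an assumption (EBM implementations typically include one), defaulting to 0 to match the additive form given in this section.

```python
import math
from typing import Callable, Dict, Tuple

ShapeFunction = Callable[[float], float]


def gam_predict(
    x: Dict[str, float],
    shape_functions: Dict[str, ShapeFunction],
    weights: Dict[str, float],
    intercept: float = 0.0,
) -> Tuple[float, Dict[str, float]]:
    """Additive model with a logit link: logit(p) = intercept + sum_i w_i * f_i(x_i).

    Returns the predicted probability and each variable's additive contribution
    on the log-odds scale, which is what makes the model directly interpretable.
    """
    contributions = {name: weights[name] * f(x[name]) for name, f in shape_functions.items()}
    log_odds = intercept + sum(contributions.values())
    probability = 1.0 / (1.0 + math.exp(-log_odds))  # logistic = inverse of the logit link
    return probability, contributions


# Hypothetical shape functions for illustration only (not the fitted EBM terms).
shape_functions = {
    "age": lambda age: (age - 60.0) / 10.0,
    "d_dimer": lambda d: d - 1.0,
}
weights = {"age": 1.0, "d_dimer": 1.0}
p, contrib = gam_predict({"age": 70.0, "d_dimer": 0.0}, shape_functions, weights)
# contrib == {"age": 1.0, "d_dimer": -1.0}; the contributions cancel, so p == 0.5
```

Because each contribution is reported separately, a clinician can see which variable pushed an individual prediction up or down, which is the modularity argument made in the text.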

Participating centers and level of care
Twenty-seven ICUs participated in this observational study, including 24 ICUs at university hospitals and three ICUs at regional primary and secondary care hospitals (Additional file 1: Table E1, Figure E1). All patients requiring ICU treatment had access to the full range of treatment options, including ventilation, renal replacement

Patient characteristics and status at ICU admission
1186 patients were recruited into the study (patient selection chart, Additional file 1: Figure E2), 713 in the retrospective and 473 in the prospective cohort. Overall patient characteristics, disease severity and organ failure are given in Table 1 and Additional file 1: Table E4. More than twice as many males (71.9%) as females (28.1%) were treated at the participating ICUs. The median age was 63 years (IQR 54 to 73); 180 patients (15.2%) were younger than 50 years, and 6 patients (0.5%) were older than 90 years. For the age distribution and baseline parameters, please see Fig. 1. Kaplan-Meier curves for the probability of ICU survival according to patient age are provided in Additional file 1: Figure E3a. At ICU admission, spontaneous breathing via oxygen mask, non-invasive assisted ventilation and invasive ventilation were present in 47.2%, 11% and 41.7% of patients, respectively. Data for grading ARDS severity were available for 1154 patients (97.3%). According to the Berlin definition, ARDS was graded using the PaO2/FiO2 ratio as mild (16.6%), moderate (47.3%) or severe (28.4%) [29]. Additional file 1: Figure E3b provides the Kaplan-Meier curves for the probability of ICU survival according to ARDS severity.
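The Berlin grading by oxygenation can be written as a small function. This sketch covers only the PaO2/FiO2 criterion of the Berlin definition (mild: 200 < PaO2/FiO2 ≤ 300; moderate: 100 < PaO2/FiO2 ≤ 200; severe: ≤ 100, each with PEEP or CPAP of at least 5 cmH2O); the timing, imaging and origin-of-edema criteria must be met separately. How the study graded patients without PEEP/CPAP is not stated, so this hypothetical helper simply declines to grade them.

```python
from typing import Optional


def berlin_ards_severity(pf_ratio_mmhg: float, peep_cmh2o: float) -> Optional[str]:
    """Grade ARDS severity by the Berlin oxygenation criterion only.

    Returns None if the ratio cannot be graded (PEEP/CPAP below 5 cmH2O).
    """
    if peep_cmh2o < 5:
        return None
    if pf_ratio_mmhg <= 100:
        return "severe"
    if pf_ratio_mmhg <= 200:
        return "moderate"
    if pf_ratio_mmhg <= 300:
        return "mild"
    return "no ARDS"  # oxygenation above the ARDS range


print(berlin_ards_severity(150, 8))  # moderate
```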

Patient outcome
Overall ICU mortality was 34% for all recruited patients. The median length of ICU stay was 15 days (IQR 7 to 30 days). Mortality was significantly lower in female patients (27.6%) than in male patients (36.5%) (p = 0.0041). Mortality was highest in octogenarians, with an observed mortality of 45.7% (Additional file 1: Figure E3a). 22% of patients received ECMO therapy (21% in the retrospective cohort and 23.5% in the prospective cohort) with a median duration of 16 days (IQR 9 to 26

Prediction of ICU survival by EBM models
Overall performance of the different ML models, including balanced accuracy and PR-AUC, is given in Table 2. The EBM model based on variables reflecting status at ICU admission (Additional file 1: Table E8) achieved a high PR-AUC of 0.81 and a moderate balanced accuracy of 0.64 (Additional file 1: Figure E4a). In order of predictive importance, the ten most important predictive variables in the admission model were: age, platelet/neutrophil ratio, D-dimer, Horowitz quotient, hemoglobin, procalcitonin, Murray lung injury score, platelet count, the interaction of C-reactive protein and interleukin-6, and absolute lymphocyte count (Fig. 2; Additional file 1: Figure E5a).

Predicting the need for ECMO therapy by EBM models
EBM models for the prediction of ECMO therapy achieved a good PR-AUC of 0.69 and a good balanced accuracy of 0.73. In order of predictive importance, the five most important parameters associated with ECMO therapy were: age, ventilatory status "intubated" at ICU admission, admission by external transfer, Murray lung injury score, and admission by internal transfer (reduced risk) (Fig. 3a). The shape function for age showed a higher risk of ECMO therapy below the age of 70 years (CI 69 to 75). A Murray lung injury score above 2.8 (no CI) was associated with a higher risk of ECMO therapy. Patients admitted by external transfer had a higher risk of receiving ECMO therapy. Comparison of the EBM models and selected shape functions of important variables revealed similar results (Additional file 1: Table E9 and Figure E5b).
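Threshold values such as "below the age of 70 years" can be read off a shape function as the point where its contribution changes sign. EBM shape functions are typically stored as piecewise-constant scores over bins; assuming that representation, the sketch below returns the bin edges where adjacent scores have opposite signs. The age bins and scores are hypothetical illustrative numbers, not the study's fit, and the sketch ignores the confidence intervals reported in the text.

```python
import numpy as np


def sign_change_edges(bin_edges, scores):
    """Return bin edges where a piecewise-constant shape function changes sign.

    scores[i] is the term's contribution on [bin_edges[i], bin_edges[i + 1]), so
    len(bin_edges) must equal len(scores) + 1. Bins scored exactly 0 are not
    treated as a crossing.
    """
    edges = np.asarray(bin_edges, dtype=float)
    scores = np.asarray(scores, dtype=float)
    if edges.size != scores.size + 1:
        raise ValueError("need one more edge than scores")
    signs = np.sign(scores)
    crossing = signs[:-1] * signs[1:] < 0  # adjacent bins with opposite signs
    return edges[1:-1][crossing].tolist()


# Hypothetical age term for an ECMO model (illustrative numbers, not the study's fit)
age_edges = [18, 40, 55, 70, 85, 100]
age_scores = [0.6, 0.4, 0.2, -0.3, -0.8]
print(sign_change_edges(age_edges, age_scores))  # [70.0]
```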

Prediction of renal replacement therapy by EBM models
Patients on chronic dialysis were excluded prior to EBM model generation. The EBM model on the complete dataset achieved a good PR-AUC (Additional file 1: Figure E4c). In order of predictive importance, the five most important parameters were: the interaction of age with D-dimer level, creatinine level, SOFA score w/o GCS, the interaction of BMI with creatinine, and platelet/neutrophil ratio (Fig. 3b). Patients younger than approximately 65 years with elevated D-dimers had a higher risk of requiring RRT (see the heatmap of the interaction of age and D-dimers in Fig. 3b). A creatinine level above 1.3 mg/dl (no CI) at ICU admission, as well as a SOFA score w/o GCS above 5 (no CI), was associated with a higher risk of receiving RRT during the ICU stay. Throughout all EBM models, creatinine and bilirubin levels showed an inverse relationship.
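Pairwise interaction terms such as age × D-dimer enter the additive model as an extra two-variable term f_ij(x_i, x_j), which is what the heatmap visualizes. Assuming the term is stored as a 2-D table of scores over binned values (a common representation, not confirmed by the text), evaluating it is a two-way bin lookup. All bin edges and table values below are hypothetical and only mimic the qualitative pattern described above.

```python
import numpy as np


def pairwise_term(x1, x2, edges1, edges2, table):
    """Evaluate a piecewise-constant pairwise interaction term f_ij(x1, x2).

    table[a][b] is the contribution for x1 in bin a and x2 in bin b; values
    outside the outermost edges are assigned to the nearest edge bin.
    """
    table = np.asarray(table, dtype=float)
    a = int(np.clip(np.digitize(x1, edges1) - 1, 0, table.shape[0] - 1))
    b = int(np.clip(np.digitize(x2, edges2) - 1, 0, table.shape[1] - 1))
    return float(table[a, b])


# Hypothetical age x D-dimer heatmap (illustrative values, not the fitted Fig. 3b term)
age_edges = [18, 65, 100]         # two age bins: below 65 and 65 or older
d_dimer_edges = [0, 1, 5, 100]    # three hypothetical D-dimer bins
table = [[-0.2, 0.1, 0.5],        # younger patients: contribution rises with D-dimer
         [-0.1, 0.0, 0.2]]        # older patients: weaker effect
print(pairwise_term(50, 8.0, age_edges, d_dimer_edges, table))  # 0.5
```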

Discussion
In this multi-center retrospective-prospective cohort study, we identified and weighed possible predictive factors for COVID-19 outcome using a machine learning approach on 49 variables. Using the present ML approach, we confirmed previously reported factors and extended knowledge to novel factors and factor combinations likely predicting outcome in COVID-19 patients. The shape function of each variable shows that variable's individual influence on the predicted outcome. For ICU survival, these factors include age, platelet/neutrophil ratio, D-dimers and ARDS severity. The most important factors for predicting the need for RRT include the combination of age and D-dimers, creatinine levels and the SOFA score without GCS. Previous studies have shown that older age, obesity, diabetes, immunocompromise, lower PaO2/FiO2, and higher hemodynamic and renal SOFA scores at ICU admission were independently associated with 90-day mortality in COVID-19 [14]. This has also been reported by other investigators, yet they neither reported individual cutoff values nor weighed the individual importance of the identified factors [30,31]. To exclude an early or a late effect, as seen when logistic regression is performed, we included almost all admission variables collected for our cohort. Variable selection can be performed in ML models but is less crucial than for logistic regression, and we refrained from such variable selection in our EBM models. In our analysis, we were able to confirm that age and pulmonary function on admission are important predictors in COVID-19 ICU patients. The present shape functions clearly show a non-linear association between the predictive factors and the outcome variable. Patient age, for instance, the most important predictive factor, shows a higher chance of ICU survival below 61 years.
Additionally, the ML approach identified the D-dimer level and platelet/neutrophil ratio at ICU admission as important factors. This is especially interesting in the context of reported thrombotic complications in COVID-19 patients [32,33]. When activated, neutrophils complex with platelets to form platelet-neutrophil complexes (PNCs), activating both cell types. These PNCs enhance inflammation, increase neutrophil extracellular trap formation and result in micro-thrombosis [34,35]. The same applies to D-dimer levels: high D-dimer levels reflect activation of inflammation and the formation of micro-thrombi with neutrophil extracellular trap formation. Our data therefore reflect the inflammatory markers known from translational science and confirm their relevance to outcome [35].
In everyday clinical practice, it is of great interest to anticipate the further course of patients in intensive care, such as the need for renal replacement or ECMO therapy. The present ML model predicting the need for ECMO therapy identified age and pulmonary compromise (Murray lung injury score) as important factors. Admission from an external hospital and admission in an intubated state were both associated with the need for ECMO therapy. This result is not surprising, as younger and more severely pulmonary compromised patients were typically transferred to our participating centers for ECMO therapy [36]. Our ML models assessing the need for RRT include age as an important factor, as well as variables quantifying disease severity (SOFA score) or inflammatory and thrombotic activity (D-dimers and platelet/neutrophil ratio). Our models not only permit the identification of risk factors in COVID-19 patients, they also provide insight into the weight of each individual variable for the selected ICU outcome of the individual patient [18,37]. The chosen ML models allow a transparent, non-linear assessment of multiple variables, which overcomes limitations of currently employed regression models. The use of shape functions in GAMs for each variable allows complex (even non-linear) relationships between the variable and the outcome prediction. Therefore, EBMs can be significantly more accurate than simple linear models [27]. Interactions of different variables extend the analytical capabilities of the ML approach. Overall, the results from the EBM offer a greater degree of interpretability than a p-value from linear regression or an odds ratio analysis. As shown in Figs. 2 and 3, the visualizations offer insight into transition values from positive to negative impact, plateaus, and confidence intervals as a measure of certainty.
A limitation of the present study is that we were not able to include more patients in the analysis. This is a valid point of criticism; however, the data used for our analyses were manually collected and curated. The data were not simply exported from an electronic medical record, where missing data are prevalent and the validity of the information has not been confirmed. Missing data often need to be imputed prior to analysis. As a result of the design of our study, we were largely able to reduce the need for imputation of missing data, which further strengthens our findings. The predictive performance of the models presented here differed across the three outcomes (survival, ECMO, RRT). This is likely because the underlying dataset contains more information for predicting, e.g., survival than ECMO. Since the study was designed with a focus on predicting survival, some variables that might better predict ECMO or RRT might not have been included in this study (for details, see Additional file 1: Table E9). Furthermore, whereas the validation of survival prediction was largely consistent between the retrospective and prospective datasets, there was more variability with regard to ECMO and RRT. A possible reason might be structural differences between the retrospective and prospective datasets, e.g. changes in treatment or age cohort over time. However, the moderate predictive capabilities of the variables used in these ML models leave open the opportunity to add further, even translational, technologies for risk prediction in the future. A strength of our approach is the ability to determine a weight for individual patient factors with respect to an individual prediction. Additionally, risk factors are presented with a shape function. This allows a more detailed interpretation and segmentation of risk factors than a simple linear increment, as is the case for linear regression.
Finally, due to the imbalanced dataset (more patients survived ICU therapy, more patients did not need ECMO or RRT), our model is more reliable for predicting "survival" than "mortality". Nonetheless, the strength of these clinical data is the generalizability across institutions and even other similarly resourced countries.
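Why class imbalance makes a model more reliable for the majority class, and why balanced accuracy rather than plain accuracy is reported, can be illustrated with a toy calculation. The cohort below is hypothetical (100 patients, roughly mirroring the 34% ICU mortality reported above), and the "model" trivially predicts survival for everyone: plain accuracy looks respectable, while balanced accuracy, the unweighted mean of the per-class recalls, exposes that non-survivors are never identified.

```python
import numpy as np


def per_class_recall(y_true, y_pred):
    """Recall (sensitivity) computed separately for every class present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return {int(c): float(np.mean(y_pred[y_true == c] == c)) for c in np.unique(y_true)}


def balanced_accuracy(y_true, y_pred):
    """Balanced accuracy = unweighted mean of the per-class recalls."""
    return float(np.mean(list(per_class_recall(y_true, y_pred).values())))


# Hypothetical cohort of 100 patients: 1 = survival (66), 0 = non-survival (34).
y_true = np.array([1] * 66 + [0] * 34)
y_pred = np.ones(100, dtype=int)  # trivial model: predicts "survival" for everyone
accuracy = float(np.mean(y_true == y_pred))
print(accuracy, balanced_accuracy(y_true, y_pred), per_class_recall(y_true, y_pred))
# 0.66 0.5 {0: 0.0, 1: 1.0}
```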

Conclusions
We present individual risk factors that can be combined to predict "survival" during COVID-19 treatment and the ICU course, and these factors are weighed for importance. This has been done for the first time and will allow clinicians to weigh clinical criteria for outcome prediction in the patients they treat.