Deploying Machine Learning to Validate Clinical Phenotypes and Factors Associated with Mortality risk in 2,022 Critically Ill Patients with COVID-19 in Spain

Background: The identication of factors associated with Intensive Care Unit (ICU) mortality and derived clinical phenotypes in COVID-19 patients could help for a more tailored approach to clinical decision-making that improves prognostic outcomes. Methods: Prospective, multicenter, observational study of critically ill patients with conrmed COVID-19 disease and acute respiratory failure admitted from 63 Intensive Care Units(ICU) in Spain. The objective was to analyze patient’s factors associated with mortality risk and utilize a Machine Learning(ML) to derive clinical COVID-19 phenotypes. Patient features including demographics and clinical data at ICU admission were analyzed. Generalized linear models were used to determine ICU morality risk factors. An unsupervised clustering analysis was applied to determine presence of phenotypes. The prognostic models were validated and their performance was measured using accuracy test, sensitivity, specicity and ROC curves. Results: The database included a total of 2,022 patients (mean age 64[IQR5-71] years, 1423(70.4%) male, median APACHE II score (13[IQR10-17]) and SOFA score (5[IQR3-7]) points. The ICU mortality rate was 32.6%. Of the 3 derived phenotypes, the C(severe) phenotype was the most common (857;42.5%) and was characterized by the interplay of older age (>65 years), high severity of illness and a higher likelihood of development shock. The A(mild) phenotype (537;26.7%) included older age (>65 years), fewer abnormal laboratory values and less development of complications and B (moderate) phenotype (623,30.8%) had similar characteristics of A phenotype but were more likely to present shock. Crude ICU mortality was 45.4%, 25.0% and 20.3% for the C, B and A phenotype respectively. The ICU mortality risk factors and model performance differed between whole population and phenotype classications. Conclusion: The presented ML model identied three clinical phenotypes that signicantly correlated with host-response patterns and ICU mortality. Different risk factors across the whole population and clinical phenotypes were observed which may limit the application of a “one-size-ts-all” model in practice. and not scientically supported. Finally, Gattinoni L et al. 23 proposing two phenotypes for COVID-19 patients, 1) “Type L” characterized by high compliance and low lung recruitablity and 2) “Type H” with low compliance and high lung recruitability as a two “extremes” of a spectrum of respiratory failure in COVID-19 pneumonia. Despite the importance related to clinical experience in each of these approaches, none of these studies have been developed through a machine learning process to determine phenotypes nor have they been tested for validation.


Introduction
Since the outbreak of COVID-19 disease began in December 2019 in China, soaring cases of con rmed SARS-CoV-2 are pummeling the global health system. More than 61 million people have developed SARS-CoV-2 infection, and more than 1 million have died 1 . The current diagnostic and monitoring procedure rely on an inaccurate and often delayed identi cation by the clinician who then decides the treatment upon interaction with the patient. Critical illness from COVID-19 has constrained intensive care unit (ICU) material and human resources 2 . As of November 30, 2020 more then 1.6 million people in Spain have been infected with SARS-CoV-2 and more than 40,000 have died 3 . Short-term mortality reported rate ranges from 16% to 62% of patients admitted to ICU 4-8 . The heterogeneity of patients that have been treated in China 4 , Italy 8 , USA [5][6][7] or Spain 9,10 may explain the wide variation of mortality rate due to their population characteristics, presence of comorbidities and healthcare systems. A recent international survey 11 reported signi cant practice variations in the management of severe COVID-19 patients, including differences at the regional, hospital, and patient level. Therefore, it is necessary to characterize phenotypes, by extending the enrolment of patients outside of one ICU site to multiple patients being treated in different hospitals. Allowing to adequately measure mortality-related factors adjusted by the inter-hospital variation to determine clinical outcomes.
Risk factors represent the most important approach when de ning treatment of hospitalized patients as these measures can inform clinical courses most likely for a patient given their a priori risk. However, risk factors can also interplay differently when they are included in different patient clusters. A single model based on general risk factors (one-size-ts-all) might be limited for clinical data interpretation and application across sites. Different combinations of risk factors may naturally cluster into previously undescribed subsets or phenotypes that may have different risks for a high mortality rate and that may therefore help to determine the response to treatments in COVID- 19. We hypothesize that the presence of well-de ned phenotypes in COVID-19 could help to more appropriately identify patients at risk of ICU mortality than general models for the entire population considering that this disease results in a constellation of symptoms, laboratory derangement, immune dysregulation, and clinical complications.
The primary objective was to assess which factors are independently associated with ICU mortality. The second objective was to determine the presence of distinct clinical phenotypes using unsupervised clustering methods that were applied to the datasets available on ICU admission. The added value of this large-scale multicenter prospective study lies to discover 3 phenotypes based on clinical data available at ICU admission that can help explain the variation in clinical results of COVID-19 disease in the ICU.

Material And Methods
Study design: A multicenter observational, prospective cohort study that consisted of a large-scale data source of hospital ICU admissions and patient-level clinical data. The enrolment criteria included adult' patients with laboratory con rmed SARS-CoV-2 infection admitted in 63 ICUs across Spain due to acute respiratory failure between February 22, 2020 andMay 11, 2020. The study was approved by the reference institutional review board at Joan XXIII University Hospital (IRB# CEIM/066/2020) and each participating site with a waiver of informed consent. All data values were anonymized prior to the phenotyping which consisted of clustering clinical variables on their association with COVID-19 mortality.
Study Sites and Patients Population: The study enrolled consecutive adult patients (>16 years) with laboratory con rmed SARS-CoV-2 infection, detected by RT-PCR positive test of nasopharyngeal, oropharyngeal swab or invasive respiratory samples according to the WHO recommendations. 12 The follow-up of patients was scheduled until August 11, 2020, which con rmed ICU discharge or death whichever occurred rst. A complete list of participating ICUs and their investigators is provided in the acknowledgements section. In this cohort, 43 patients were described in a preliminary report of a single-center case series in Tarragona, Spain 9 .
Outcomes: The primary outcome included all-causes of ICU mortality. Patients who were discharged alive from ICU were evaluated in the data as alive considering mortality was de ned as any in-ICU death. All complications and outcomes were followed during ICU admission.
Data collection: Data was obtained from a voluntary registry created by Spanish Society of Intensive Care Medicine-SEMICYUC. All consecutive cases admitted to the ICU were collected. There were no patients excluded from the analysis that was enrolled to participating ICU and met criteria.
All the collected variables recorded at ICU admission are listed in the additional le (P 8-9). To determine severity of illness, the Acute Physiology and Chronic Health Evaluation (APACHE) II score 13 and Sequential Organ Failure Assessment (SOFA) scoring 14 were calculated for all patients within the rst 24 h of ICU admission.
The ICU admission criteria considered the use of antiviral, antibiotic or co-adjuvant treatment for all patients, and also the measures that would determine the need to intubate and type of ventilator support required. These measures were not standardized between centers and were left to the discretion of the attending physician, according to SEMICYUC and National Ministry of Health 15 and were included in the case report form and con rmed by the medical records. We also collected hospital-level data including city, county and number of hospital beds available. The study de nitions used in the present analysis are shown in the additional le (P 5).

Statistical analysis
No statistical sample size calculation was performed a priori, and sample size was equal to the number of patients admitted to the participant's ICUs with con rmed COVID-19 during the study period. To describe baseline characteristics, the continuous variables were expressed as median (interquartile range [IQR]) and categorical variables as number of cases (percentage). For patient demographics and clinical characteristics, differences between groups were assessed using the chi-squared test and Fisher's exact test for categorical variables, and the Mann-Whitney U or Wilcoxon test for continuous variables. To performed the analysis, we rst assessed the candidate variables, missing values, and correlation. Multiple imputation was used to account for missing data (additional le, P 5). After evaluating correlation, highly correlated variables were excluded (additional le, P 8).
An overview of the primary analysis plan is outlined in Figure 1. In a rst step, a multilevel conditional logistic modelling and the intraclass correlation coe cient (ICC) was calculated (additional le, P 6) with patients nested in hospital to characterize hospital-level variation of ICU mortality and determine if a signi cant inter-hospital variation is present.
In a second step, a multivariate analysis (GLM: Generalized linear Regression model) was performed to investigate the association between baseline (on ICU admission) variables and ICU-mortality. The GLM model comprised factors of clinical interest and all signi cant covariates (p<0.05) in the univariate analysis of ICU mortality and presence of collinearity was studied by variance in ation factors (VIF). The results are presented as odds ratios (OR) and 95% con dence intervals (CI).
To determine our model, we checked adequate model performance between groups with a cross validation model (K-fold=10) and the model with better performance was chosen. For the model validation, database was randomly split into two subsets: (a) a "training set" (80%), and (b) a "validation set" (20%). Model performance was examined using accuracy test, precision, sensitivity, speci city and area under ROC curve (AUC).
Lastly, to determine presence of distinct clinical phenotypes in our population of COVID-19 patients, an unsupervised clustering analysis was applied to the database at ICU admission. In order to carry out this analysis, a discretization of the numerical variables into categorical ones was done. The information provided by each variable regarding ICU mortality was de ned using the Information Value (IV). A IV greater than 0.03 was considered clinically important and this variable was included in the multivariate logistic regression analysis. Model performance was examined using accuracy test, Sensitivity, Speci city and AUC modeling. Subsequently, the unsupervised cluster analysis was performed using the important variables. The Podani distance was used to calculate the distance between patients and the "partition around medoids" (PAM) algorithm to perform the clustering 16 . The optimal number of clusters were determined after studying the silhouette 17 and the PAM objective for different numbers of clusters (additional le, P 15). Each of these clusters represent a speci c patient's phenotype. To visualize the clusters in a lower dimensional space, we used a Principal Component Analysis (PCA). We obtain important variables (IV) for each phenotype, and the OR of these variables were obtained after applying a GLM analysis. Multinomial regression models were t to further compare patient comorbidities across phenotype classi cation. Model performance in each phenotype was examined using accuracy test, Sensibility, Speci city and AUC. Data analysis was performed using R software (cran.r-project.org).

Patients Characteristics at ICU admission
From February 29, 2020 to June 11, 2020 a total of 2,022 critically ill patients from 63 ICUs were enrolled in the present analysis. Forty percent of ICUs belonged to hospitals with more than 500 beds, 40% to hospitals between 200 and 500 beds and the remaining 20 percent to hospitals with fewer than 200 beds.

ICU mortality
Overall, 660 patients (32.6%) died. The crude ICU mortality increased signi cantly with the increase in prede ned age cut-off and was greater than 80% in patients over 80 years old. (additional le, P 10). Age, male sex, severity of illness (APACHE II and SOFA), presence of arterial hypertension, diabetes, coronary arterial disease, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), immunosuppression and hematologic disease markers were signi cantly higher in 'non-survivor' patients. Non-survivor patients compared with those that survived had higher levels of D-Lactate dehydrogenase (LDH), white blood cells, serum creatinine, C-reactive protein (CRP), Procalcitonin (PCT), serum lactate, serum D-dimer and serum ferritin. Non-survivor patients developed more frequent complications such as shock, kidney and myocardial dysfunction at ICU admission. High Flow Nasal Cannula (HFNC) was more frequent in survivors, while invasive mechanical ventilation (MV) was more common in non-survivors. Mortality for those who received MV during their ICU stay (n= 1554; 76.8%) was 37.3% (n=580) higher than observed in patients who did not require MV 17.0% (80/468, p<0.001). Complete characteristics of patients according outcome are shown in Table 2.

Construction of the ICU Mortality multivariate model
Of the 42 variables measured at ICU admission, 25 variables that were statistically signi cant in the univariate analysis (Table 2) were included in the model. The initial dataset of patients was randomly split in two subgroups "Training group" with 1,618 patients (80%) and "Test group" with 404 patients (20%). The characteristics of patients included into each subgroup are shown in Table 1. No signi cant differences were observed between the subgroups. Inclusion of these 25 variables in a GLM model for the training group, resulted in 10 variables that were independently associated with ICU mortality (Figure 2). No presence of collinearity between explanatory variables was observed (additional le, P 11) and the Hosmer-Lemeshow Goodness-of-t test (X-squared = 5.53, df = 8, pvalue = 0.69) established no discrepancy between the observed values and those that would have been expected in the model. The validation of model in the test group demonstrated adequate performance with an accuracy of 0.78, a precision test of 0.73, sensitivity of 0.88, speci city of 0.45 and an AUC ROC of 0.82 (95%CI 0.78-0.86) (additional le, P 11).
Unsupervised analysis (cluster) to determine different phenotypes in critically ill patients Once the variables were categorized, 5 patients (0·24%) were excluded for outlier's data, and the analysis was performed with 2,017 patients. Of the 50 variables considered, only 25 were considered as predictors according to the IV (additional le, P 12) and were included in the model. Remarkably, no treatment option was a predictive factor for ICU-mortality. The categorized variables independently associated with ICU-mortality are shown in additional le (P 14). The performance of the model was adequate with an accuracy of 0.77, sensitivity of 0.88, speci city of 0.54 and AUC of 0.82. According to the Podani's distance and the Shilouette and PAM plots (additional le, P 15) the optimal number of clusters in our dataset was 3. Cluster A included 537 patients (26.7%), cluster B included 623 (30.8%) and cluster C included 857 patients (42.5%). The clusters in a lower dimensional space are shown in the additional le (P 16). The size and characteristics of the phenotypes in the 3-class model appear in the additional le (P 13). Patients with the cluster A phenotype (mild COVID-19 disease) had <65 years, lower severity of illness, fewer abnormal laboratory values and less development of complications, with a crude ICU mortality of 20.3%; those with the cluster B phenotype (moderate COVID-19 disease) had similar characteristics as seen in the A phenotype but were more likely to present shock at ICU admission with a crude ICU-mortality of 25.5%. Patients with the cluster C phenotype (severe COVID-19 disease) had >65 years, a high level of severity of illness, more likely to have elevated measures of in ammation (e.g. D dimer, LDH and ferritin), high frequency of shock, AKI and myocardial dysfunction, with a crude ICU mortality of 45.4%. The clinical characterization of each observed phenotype can be seen in Figure 3. By including these important variables (additional le, P 17) in a regression model for each cluster, we observed that the discrimination of each model was higher than general model except for C phenotype (Table 3). The Variables independently associated with mortality were different between global and cluster models ( Figure 4 and additional le, P [18][19]

Discussion
The main nding of our study is that among patients with COVID-19, 3 clinical phenotypes were derived using habitual clinical and laboratory variables at ICU admission. The ability of identifying phenotypes using a small set of variables is a crucial step towards clinical application and has important implications for possible differential treatment guided by phenotypes and validated prognostic scoring systems 18,19 .
Our C phenotype was associated with more than double the ICU mortality than each of the remaining two phenotypes. This C phenotype was characterized by the interplay of older age (>65 year), a high severity (APACHE II > 15 and SOFA >5), greater burden of risk factors (hematologic disease and coronary disease) and a higher likelihood of developing further complications (shock and AKI).
Previous studies have implemented clustering techniques to analyze various data sources relating to demographic, geographic, environment, and socioeconomic determinants of health and disease. There are studies that have evaluated treatment decisions and characterized clinical phenotypes associated with complications, ICU admission and mortality risk in critically ill COVID-19 patients.
According to the Situation Report & Public Health Guidance published by Johns Hopkins University on March 19th, 2020, people over 60 and those with chronic health conditions are at the highest risk for COVID-19 complications 20 . To our knowledge, this is the rst study with a high number of critically ill patients to analyze the presence of phenotypes in patients with SARS-CoV-2 infection. This multicenter cohort study of 2,022 critically ill patients found that 660 patients (32.6%) died at ICU discharge. Our ICU mortality rates was signi cantly lower as reported in Yang X et al. 4 in Wuhan, China (61.5%), by Myers L et al 7 in California, USA (50.0%), by Arentz M et al. 6 in Washington, USA (67%) and by Richardson S et al. 5 in New York, USA (78%), but slightly higher to reported by Grasselli G et al 8 . in Lombardy region, Italy (26%). These observed differences in ICU mortality could respond to different healthcare models and important practice variations in the management of severe COVID-19 patients 11 , but it can also depend on the frequency of presentation of the different phenotypes.
In our study, a great variability in model performance and risk factors were observed during cross-validation to choose the best model to use. This might be related to the presence of a very heterogeneous population of patients, which is revealed during random partitioning performed (80%/20%) for validation of each model. In this context, three clinical phenotypes of COVID-19 patients were derived using routinely available clinical data at ICU admission by an unsupervised cluster analysis. The phenotypes were multidimensional, differed in their demographics, laboratory abnormalities, patterns of organ dysfunction, and associated with ICU mortality. In addition, the phenotypes were not homologous with traditional patient groupings such the presence of clinical complications 21,22 , or type of ARDS 23 .
Our COVID-19 phenotypes can be identi ed at the time of the ICU admission, and thus could be useful in facilitating early tailored therapy and improve prognosis.
Only routinely available clinical and laboratory data were used in the clustering models, and the phenotypes were derived from a large observational multicenter cohort to ensure generalizability. Importantly, we have observed that the variables associated with the ICU mortality varied between the global model and the models developed for each phenotype. The discrimination power (AUC) of A and B phenotypes models improved in comparison to the global model. However, for the C phenotype (severe COVID-19 disease), the performance of the model was not superior respect of the global model. The C phenotype was most strongly correlated with abnormal values of biomarkers as well as clinical features of cardiovascular dysfunction, AKI and subsequently a higher ICU mortality. Recently, several authors have proposed different clinical phenotypes of COVID-19 patients 21-23 . Rello J et al. 21 speculated that COVID-19 has ve phenotypic presentations based on physiological and clinical features from published studies. Garcia-Vidal et al. 22 describe the main clinical complications of hospitalized patients with COVID-19 through classi cation into three pattern groups (in ammatory, co-infection and thrombotic). However, as the authors acknowledge, the cut-off points of the different biomarkers for de ning phenotypes have been arbitrary and not scienti cally supported. Finally, Gattinoni L et al. 23 proposing two phenotypes for COVID-19 patients, 1) "Type L" characterized by high compliance and low lung recruitablity and 2) "Type H" with low compliance and high lung recruitability as a two "extremes" of a spectrum of respiratory failure in COVID-19 pneumonia. Despite the importance related to clinical experience in each of these approaches, none of these studies have been developed through a machine learning process to determine phenotypes nor have they been tested for validation.
Although hypoxemia is a proposed severity marker for phenotype differentiation 21,23 , in our study the PaO2/FiO2 ratio to ICU admission was an independent risk factor for ICU mortality in the multivariate analysis but with a very small protector effect (OR=0·99) and only was closely associated with ICU mortality in C phenotype. Other variables such as advanced age, SOFA score, development of AKI or need for mechanical ventilation were more strongly related to ICU mortality than PaO2/FiO2 at ICU admission.
Our results should be interpreted in the context of the study limitations. First, although phenotypes were found to be generalizable in our population, risk factors and characteristics of clinical phenotypes were derived initially from data at ICU admission of multicenter observational study in Spain. However, these risk factors are similar to those that have been reported by other investigators 4-8 . The cross-validation carried out and the high discrimination observed for each of the models built for phenotypes, suggests their applicability to other populations, but it should be examined considering the high variability observed in patients with COVID-19 and in the support measures applied. Second, because missing data were common for some variables included in the clustering models, multiple imputation was used in the primary analysis. However, variables with high missing values were excluded and the missing threshold used was reported elsewhere 18 . Third, only routinely available clinical data at ICU admission were used to identify risk factors and clinical phenotypes, and the inclusion of other data related to clinical evolution of patients in the ICU could change risk factors or phenotype assignments. However, our objective was to study early risk factors and phenotypes at ICU admission that may allow for early treatment implementation and as a result improve patient outcome.

Conclusion
To our knowledge this is the largest study that describe different phenotypes of patients with con rmed COVID-19 that were admitted to ICU to date. We not only characterized three novel clinical phenotypes, but extended ndings outside of a single site ICU by characterizing the association of comorbidities with clinical phenotype and the association of clinical phenotypes with clinical outcomes. Different risk factors for the global population and clinical phenotypes were observed, possibly due to the heterogeneity of patients, which may limit the application of a single predictive model for all patients with COVID-19. Further research is needed to determine the application of these phenotypes in clinical practice and for clinical trial design. The anonymized database collected for the study by the SEMICYUC, and the data dictionary that de nes each eld in the set, will be made available to reviewers if they consider it necessary prior con dentiality agreement.

-Competing interests
All authors declare that they have no con icts of interest -Funding This study was supported by the Spanish Intensive Care Society (SEMICYUC) and Ricardo Barri Casanovas Foundation. The study sponsors have no role in the study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author (AR) had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors approved the nal version of the manuscript. The views expressed in this article are those of the authors and not necessarily those of the SEMICYUC.  Table 1 Characteristics of 2022 critically ill patients included in the study at ICU admission according to train or test population  b APACHE II score to the severity of illness, the score is obtained by adding the following components 1) 12 clinical and laboratory variables each with a score range of 0 to 4 points (APS). The APS is determined from the worst physiologic values during the initial 24 h after ICU admission, 2) age with a range 0 to 6 points and 3) Chronic health points if the patients has history of severe organ system insu ciency or is immunocompromised assign 5 points if the patients is no operative or emergency postoperative and 2 points for elective postoperative patients with a total score range of 0 to 71. c SOFA score corresponds to the severity of organ dysfunction, re ecting six organ systems each with a score range of 0 to 4 points (cardiovascular, hepatic, hematologic, respiratory, neurological, renal), with a total score range of 0 to 24,  b APACHE II score to the severity of illness, the score is obtained by adding the following components 1) 12 clinical and laboratory variables each with a score range of 0 to 4 points (APS). The APS is determined from the worst physiologic values during the initial 24 h after ICU admission, 2) age with a range 0 to 6 points and 3) Chronic health points if the patients has history of severe organ system insu ciency or is immunocompromised assign 5 points if the patients is no operative or emergency postoperative and 2 points for elective postoperative patients with a total score range of 0 to 71. c SOFA score corresponds to the severity of organ dysfunction, re ecting six organ systems each with a score range of 0 to 4 points (cardiovascular, hepatic, hematologic, respiratory, neurological, renal), with a total score range of 0 to 24, d De ned as a body mass index (calculated as weight in kilograms divided by height in meters squared) of 30 or greater. e Baseline eGFR < 60 on at least two consecutive values at least 12 weeks apart prior or hemodialysis f included acute leukemia, myelodysplastic syndrome and Lymphomas. g According to the New York Heart Association (NYHA) Functional Classi cation III and IV h included Chronic corticosteroid treatment (> 20 mg prednisolone /day or equivalent dose), chemotherapy or therapy with biological agents I De ned as patients in whom adequate uid resuscitation therapy are unable to restore hemodynamic stability and need any dose of vasopressor drugs.
j De ne as an abrupt and sustained (more than 24 hours) decrease in kidney function and categorized according to RIFLE criteria k Was considered in patients with con rmation of SARS-CoV-2 infection showing recurrence of fever, increase in cough and production of purulent sputum plus positive bacterial/fungal respiratory or blood cultures at ICU admission l Kruskal-Wallis, ANOVA, or chi-square p-value as appropriate comparing survivors vs. non-survivors.