Deploying unsupervised clustering analysis to derive clinical phenotypes and risk factors associated with mortality risk in 2022 critically ill patients with COVID-19 in Spain



Identifying the factors associated with Intensive Care Unit (ICU) mortality in COVID-19 patients, together with the clinical phenotypes derived from them, could support a more tailored approach to clinical decision-making that improves prognostic outcomes.


Prospective, multicenter, observational study of critically ill patients with confirmed COVID-19 and acute respiratory failure admitted to 63 ICUs in Spain. The objective was to use unsupervised clustering analysis to derive clinical COVID-19 phenotypes and to analyze patient factors associated with mortality risk. Patient features, including demographics and clinical data at ICU admission, were analyzed. Generalized linear models were used to determine ICU mortality risk factors. The prognostic models were validated, and their performance was measured using accuracy, sensitivity, specificity and ROC curves.


The database included a total of 2022 patients (median age 64 [IQR 55–71] years; 1423 (70.4%) male; median APACHE II score 13 [IQR 10–17] points; median SOFA score 5 [IQR 3–7] points). The ICU mortality rate was 32.6%. Of the 3 derived phenotypes, the A (mild) phenotype (537; 26.7%) was characterized by younger age (< 65 years), fewer abnormal laboratory values and fewer complications; the B (moderate) phenotype (623; 30.8%) had characteristics similar to those of phenotype A but was more likely to present with shock. The C (severe) phenotype was the most common (857; 42.5%) and was characterized by the interplay of older age (> 65 years), high severity of illness and a higher likelihood of developing shock. Crude ICU mortality was 20.3%, 25% and 45.4% for phenotypes A, B and C, respectively. The ICU mortality risk factors and model performance differed between the whole population and the phenotype classifications.


The presented machine learning model identified three clinical phenotypes that significantly correlated with host-response patterns and ICU mortality. Different risk factors were observed for the whole population and for each clinical phenotype, which may limit the application of a “one-size-fits-all” model in practice.


Since the outbreak of COVID-19 began in December 2019 in China, the rising number of confirmed SARS-CoV-2 infections has placed severe strain on health systems worldwide. More than 91 million people have developed SARS-CoV-2 infection, and more than 2 million have died [1]. Critical illness from COVID-19 has constrained intensive care unit (ICU) material and human resources [2]. As of January 18, 2021, more than 2.5 million people in Spain had been infected with SARS-CoV-2 and more than 53,000 had died [3]. Reported short-term mortality ranges from 16 to 62% among patients admitted to the ICU [4,5,6,7,8]. The heterogeneity of the patients treated in China [4], Italy [8], the USA [5,6,7] or Spain [9, 10], in terms of population characteristics, comorbidities and healthcare systems, may explain this wide variation in mortality. A recent international survey [11] reported significant practice variations in the management of severe COVID-19 patients, including differences at the regional, hospital, and patient level. It is therefore necessary to characterize phenotypes in patients enrolled across multiple hospitals rather than at a single ICU site, so that mortality-related factors can be measured adequately after adjusting for inter-hospital variation.

Risk factors are central to defining the treatment of hospitalized patients, because they indicate the clinical course most likely for a patient given their a priori risk. However, risk factors can also interact differently across patient clusters. A single model based on general risk factors (one-size-fits-all) may therefore be of limited use for interpreting clinical data and applying it across sites. Different combinations of risk factors may naturally cluster into previously undescribed subsets, or phenotypes, that carry different mortality risks and may therefore help to determine the response to treatment in COVID-19. Because this disease produces a constellation of symptoms, laboratory derangements, immune dysregulation and clinical complications, we hypothesized that well-defined phenotypes could identify patients at risk of ICU mortality more appropriately than general models for the entire population.

The primary objective was to determine whether distinct clinical phenotypes could be identified by applying unsupervised clustering methods to the data available at ICU admission. The secondary objective was to assess which factors are independently associated with ICU mortality. The added value of this large-scale multicenter prospective study lies in discovering phenotypes, based on clinical data available at ICU admission, that can help explain the variation in clinical outcomes of COVID-19 in the ICU.

Material and methods

Study design

A multicenter, observational, prospective cohort study based on a large-scale data source of hospital ICU admissions and patient-level clinical data. Enrolment criteria included adult patients with laboratory-confirmed SARS-CoV-2 infection admitted to 63 ICUs across Spain for acute respiratory failure between February 22, 2020 and May 11, 2020. The study was approved by the reference institutional review board at Joan XXIII University Hospital (IRB# CEIM/066/2020) and by each participating site, with a waiver of informed consent. All data were anonymized prior to phenotyping, which consisted of clustering clinical variables according to their association with COVID-19 mortality.

Study sites and patient population

The study enrolled consecutive adult patients (> 16 years) with laboratory-confirmed SARS-CoV-2 infection, detected by a positive RT-PCR test of nasopharyngeal or oropharyngeal swabs or invasive respiratory samples according to WHO recommendations [12]. Patients were followed up until ICU discharge or death, whichever occurred first; follow-up was scheduled to end on August 11, 2020. A complete list of participating ICUs and their investigators is provided in the acknowledgements section. Of this cohort, 43 patients were previously described in a preliminary single-center case series from Tarragona, Spain [9].


The primary outcome was all-cause ICU mortality, defined as any death in the ICU; patients discharged alive from the ICU were classified as survivors. All complications and outcomes were followed during the ICU stay.

Data collection

Data were obtained from a voluntary registry created by the Spanish Society of Intensive Care Medicine (SEMICYUC). All consecutive cases admitted to the ICU were collected. No patient enrolled at a participating ICU who met the inclusion criteria was excluded from the analysis.

All variables recorded at ICU admission are listed in Additional file 1: p. 6. To determine severity of illness, the Acute Physiology and Chronic Health Evaluation (APACHE) II score [13] and the Sequential Organ Failure Assessment (SOFA) score [14] were calculated for all patients within the first 24 h of ICU admission.

ICU admission criteria, the use of antiviral, antibiotic or co-adjuvant treatment, and the measures determining the need for intubation and the type of ventilatory support (oxygen therapy, high-flow nasal cannula [HFNC], noninvasive [NIV] or invasive [IMV] mechanical ventilation) were not standardized between centers. They were left to the discretion of the attending physician, following SEMICYUC and National Ministry of Health recommendations [15], and were recorded in the case report form and confirmed against the medical records. We also collected hospital-level data, including city, county and number of available hospital beds. The study definitions used in the present analysis are shown in Additional file 1: p. 2.

Statistical analysis

No sample size calculation was performed a priori; the sample size was equal to the number of patients with confirmed COVID-19 admitted to the participating ICUs during the study period. To describe baseline characteristics, continuous variables were expressed as median (interquartile range [IQR]) and categorical variables as number of cases (percentage). Differences in demographic and clinical characteristics between groups were assessed using the chi-squared or Fisher’s exact test for categorical variables and the Mann–Whitney U or Wilcoxon test for continuous variables. To perform the analysis, we first assessed the candidate variables, missing values and correlations. Multiple imputation was used to account for missing data (Additional file 1: p. 2), and highly correlated variables were excluded (Additional file 1: p. 5).
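The group comparison for continuous variables can be sketched in a few lines. This is an illustration in Python only (the study used R), and it assumes a two-sided Mann–Whitney U test with a normal approximation, tie-corrected variance and no continuity correction; the text does not state which variant was used. The helper names and the toy ages are hypothetical.

```python
import math
from statistics import median, quantiles

def median_iqr(values):
    """Median and (Q1, Q3), the summary used for continuous variables."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return median(values), (q1, q3)

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test: normal approximation with tie-corrected
    variance and no continuity correction. Returns (U for x, p-value)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        for k in range(i, j + 1):  # tied values share the average (1-based) rank
            ranks[k] = (i + j) / 2 + 1
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    r1 = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    return u1, math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value

# Toy ages (years), invented for illustration.
survivors = [52, 58, 61, 63, 66]
non_survivors = [64, 70, 72, 75, 79]
print(median_iqr(survivors), mann_whitney_u(survivors, non_survivors))
```

The tie correction matters in clinical data, where scores such as APACHE II or SOFA take few distinct values.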

An overview of the primary analysis plan is outlined in Fig. 1. In a first step, a multilevel conditional logistic model with patients nested in hospitals was fitted and the intraclass correlation coefficient (ICC) was calculated (Additional file 1: p. 2) to characterize hospital-level variation in ICU mortality and determine whether significant inter-hospital variation was present.
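The text does not give the ICC formula used. A common choice for a random-intercept logistic model is the latent-variable definition ICC = σu² / (σu² + π²/3), where σu² is the between-hospital variance of the random intercepts and π²/3 is the variance of the standard logistic distribution. The sketch below assumes that definition; the variance value in the usage line is hypothetical, not a reported estimate.

```python
import math

def latent_icc(between_hospital_variance: float) -> float:
    """ICC of a random-intercept logistic model on the latent (logit) scale.

    The patient-level residual of the standard logistic distribution has
    variance pi**2 / 3, so ICC = var_u / (var_u + pi**2 / 3).
    """
    level1_variance = math.pi ** 2 / 3
    return between_hospital_variance / (between_hospital_variance + level1_variance)

# Hypothetical random-intercept variance (not reported in the text).
print(f"ICC = {latent_icc(0.137):.3f}")
```

Under this definition, an ICC of about 0.04 means that roughly 4% of the latent variation in ICU mortality risk lies between hospitals.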

Fig. 1

Overview of the primary analysis plan. ICU Intensive care units, PAM partition around medoids clustering analysis, GLM Generalized Linear model

In a second step, to determine whether distinct clinical phenotypes were present in our population of COVID-19 patients, an unsupervised clustering analysis was applied to the data available at ICU admission. For this analysis, numerical variables were discretized into categorical ones using the “ChiMerger” packages for R. The information provided by each variable regarding ICU mortality was quantified using the Information Value (IV). Variables with an IV greater than 0.03 were considered clinically important and were included in the multivariable logistic regression analysis. Model performance was examined using accuracy, sensitivity, specificity and AUC.

Subsequently, the unsupervised cluster analysis was performed using the important variables. The Podani distance was used to calculate the distance between patients, and the partition around medoids (PAM) algorithm was used to perform the clustering [16]. The optimal number of clusters was determined by examining the silhouette [17] and the PAM objective for different numbers of clusters (Additional file 1: p. 12). Each cluster represents a specific patient phenotype. To visualize the clusters in a lower-dimensional space, we used principal component analysis (PCA). We identified the important variables according to IV for each phenotype and obtained their odds ratios (OR) by fitting a generalized linear model (GLM); in R, multiple logistic regression is fitted as a GLM by specifying family = “binomial”. Multinomial regression models were fitted to further compare patient comorbidities across phenotype classifications. Model performance in each phenotype was examined using accuracy, sensitivity, specificity and AUC.
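The IV screening step can be sketched as follows, assuming the usual weight-of-evidence definition IV = Σb (%deathsb − %survivorsb) · ln(%deathsb / %survivorsb), summed over the categories b of one discretized variable. The additive smoothing constant, the function name and the toy data are assumptions for illustration; only the 0.03 cut-off comes from the text.

```python
import math
from collections import Counter

def information_value(categories, died, eps=0.5):
    """Information Value of one categorized predictor against ICU death.

    categories: bin label per patient (e.g. after a ChiMerge-style discretization)
    died: 1 if the patient died in the ICU, else 0
    eps: additive smoothing so an empty cell does not give log(0) (an assumption)
    """
    deaths = Counter(c for c, y in zip(categories, died) if y == 1)
    survivors = Counter(c for c, y in zip(categories, died) if y == 0)
    bins = set(categories)
    total_d = sum(deaths.values()) + eps * len(bins)
    total_s = sum(survivors.values()) + eps * len(bins)
    iv = 0.0
    for b in bins:
        pct_d = (deaths[b] + eps) / total_d
        pct_s = (survivors[b] + eps) / total_s
        woe = math.log(pct_d / pct_s)  # weight of evidence of bin b
        iv += (pct_d - pct_s) * woe
    return iv

IV_THRESHOLD = 0.03  # cut-off stated in the text

# Toy data, invented for illustration: age band vs ICU death.
age = ["<65"] * 6 + [">65"] * 6
died = [0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 1, 0]
iv = information_value(age, died)
print(f"IV = {iv:.3f}, retained = {iv > IV_THRESHOLD}")
```

Because each term is (a − b)·ln(a/b), the IV is never negative and does not depend on which outcome is placed in the numerator.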

Lastly, a traditional multivariable GLM was fitted to investigate the association between baseline (ICU admission) variables and ICU mortality. The model included factors of clinical interest and all covariates that were significant (p < 0.05) in the univariate analysis of ICU mortality; collinearity was assessed using variance inflation factors (VIF). Results are presented as odds ratios (OR) with 95% confidence intervals (CI). To select the final model, we compared model performance using 10-fold cross-validation and chose the model with the best performance.
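The 10-fold cross-validation used for model selection can be sketched as a generator of train/test index sets. The shuffling seed and function name are hypothetical, and the model-fitting step (a binomial GLM in R in the original analysis) is left out so that the sketch stays self-contained.

```python
import random

def k_fold_indices(n, k=10, seed=2020):
    """Yield (train, test) index lists for k-fold cross-validation.

    Patients are shuffled once with a fixed (hypothetical) seed; fold sizes
    differ by at most one patient.
    """
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = idx[start:start + size]
        train = idx[:start] + idx[start + size:]
        yield train, test
        start += size

folds = list(k_fold_indices(2022, k=10))
print([len(test) for _, test in folds])
```

Each candidate GLM would be fitted on `train` and scored on `test`, with the per-fold metrics then averaged. For 2,022 patients, two folds hold 203 patients and the remaining eight hold 202.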

For all model validation, the database was randomly split into two subsets: (a) a “training set” (80%) and (b) a “validation set” (20%). Model performance was examined using accuracy, precision, sensitivity, specificity and area under the ROC curve (AUC). Data analysis was performed using R software (
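The random split and validation metrics listed above can be computed as in the sketch below. It treats ICU death as the positive class, which is an assumption: the text does not say which class was coded as positive, and swapping it exchanges sensitivity and specificity. The AUC is computed through its rank (Mann–Whitney) interpretation rather than by tracing the ROC curve. The seed, the 0.5 threshold and the toy scores are hypothetical.

```python
import random

def train_validation_split(n, train_frac=0.8, seed=2020):
    """Randomly assign patient indices to training and validation sets."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    cut = round(train_frac * n)
    return idx[:cut], idx[cut:]

def classification_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity (1 = ICU death, assumed positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = pairs.count((1, 1))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    nan = float("nan")
    return {
        "accuracy": (tp + tn) / len(pairs),
        "precision": tp / (tp + fp) if tp + fp else nan,
        "sensitivity": tp / (tp + fn) if tp + fn else nan,
        "specificity": tn / (tn + fp) if tn + fp else nan,
    }

def auc(y_true, scores):
    """AUC as P(score of a non-survivor > score of a survivor), ties counted as 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

train, valid = train_validation_split(2022)
print(len(train), len(valid))
y_true = [1, 0, 1, 0, 0, 1]                    # toy outcomes
scores = [0.81, 0.30, 0.45, 0.62, 0.12, 0.77]  # toy predicted risks
y_pred = [int(s >= 0.5) for s in scores]       # assumed 0.5 threshold
print(classification_metrics(y_true, y_pred), auc(y_true, scores))
```

With 2,022 patients, an 80/20 split yields 1,618 and 404 patients, matching the group sizes reported in the Results. The pairwise AUC is quadratic in the number of patients, which is acceptable for a validation set of this size.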


Patient characteristics at ICU admission

From February 29, 2020 to June 11, 2020, a total of 2,022 critically ill patients from 63 ICUs were enrolled in the present analysis. Forty percent of ICUs belonged to hospitals with more than 500 beds, 40% to hospitals with 200 to 500 beds and the remaining 20% to hospitals with fewer than 200 beds. Multilevel conditional logistic modelling with patients nested in hospitals was used to characterize hospital-level variation in ICU mortality. The intraclass correlation coefficient (ICC) was 0.04 when all hospitals were considered (n = 63) and 0.04 when hospitals that submitted data on fewer than 10 patients were excluded, indicating no significant inter-hospital variation (Additional file 1: e-Fig. 2, p. 7).

Fig. 2

Phenotype clinical characterization (APACHE II Acute Physiology and Chronic Health Evaluation II, SOFA Sequential Organ Failure Assessment, LDH lactate dehydrogenase (U/L), AKI acute kidney injury)

The median (IQR) age was 64 (55–71) years, and 1,423 (70.4%) were men. The median time between symptom onset and diagnosis was 7 (4–9) days. A total of 1,467 (72.6%) patients had at least one comorbidity. Arterial hypertension (936 [46.7%]), obesity (655 [32.4%]) and diabetes mellitus (420 [20.7%]) were the most frequently reported comorbidities. Severity of illness was high according to the APACHE II (14; IQR 11–18) and SOFA (5.7; IQR 4–7.3) scores. The median PaO2/FiO2 ratio on the day of ICU admission was 132 (IQR 96–163); at ICU admission, 1131 (55.9%) patients met criteria for severe and 198 (9.8%) for mild acute respiratory distress syndrome (ARDS). Overall, 1174 (58.0%) patients required invasive mechanical ventilation (MV), 906 (44.8%) developed shock and 580 (28.7%) met criteria for acute kidney injury (AKI). The overall ICU length of stay (LOS) was 14 (8–26) days and was similar for survivors (14 [8–27] days) and non-survivors (14 [8–24] days) (p = 0.10).

The most frequently prescribed co-adjuvant treatments for COVID-19 were hydroxychloroquine (1866 [92.3%]) and lopinavir/ritonavir (1662 [82.2%]). Empiric antibiotic treatment was administered to 1818 (89.9%) patients and intravenous corticosteroids to 1174 (58.6%). Further clinical characteristics and laboratory findings are shown in Additional file 1: e-Table 3, p. 8.

ICU mortality

Overall, 660 patients (32.6%) died. Crude ICU mortality increased significantly across the predefined age cut-offs and exceeded 80% in patients over 80 years old (Additional file 1: p. 7). Compared with survivors, non-survivors were older, more often male and more severely ill (APACHE II and SOFA), and more frequently had arterial hypertension, diabetes, coronary artery disease, chronic obstructive pulmonary disease (COPD), chronic kidney disease (CKD), immunosuppression and hematologic disease. Non-survivors also had higher levels of lactate dehydrogenase (LDH), white blood cells, serum creatinine, C-reactive protein (CRP), procalcitonin (PCT), serum lactate, serum D-dimer and serum ferritin, and more frequently presented complications such as shock and kidney and myocardial dysfunction at ICU admission. High-flow nasal cannula (HFNC) was more frequent in survivors, whereas invasive mechanical ventilation (MV) was more common in non-survivors. Mortality among patients who received MV during their ICU stay (n = 1554; 76.8%) was 37.3% (n = 580), higher than the 17.0% (80/468) observed in patients who did not require MV (p < 0.001). Complete characteristics of patients according to outcome are shown in Table 1.
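As a worked check of the mechanical ventilation comparison, the Pearson chi-squared test on the 2×2 table of reported counts (580 deaths among 1,554 ventilated patients; 80 deaths among 468 non-ventilated patients) can be computed directly. The sketch omits Yates' continuity correction; the text does not say whether a correction was applied, but with a statistic of this size the conclusion p < 0.001 holds either way.

```python
import math

def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared test (no continuity correction) for the table

                 died  survived
        MV        a       b
        no MV     c       d

    Returns (statistic, p-value with 1 degree of freedom)."""
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    p = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-squared with 1 df
    return stat, p

# Counts reported in the text.
stat, p = chi_squared_2x2(580, 1554 - 580, 80, 468 - 80)
print(f"MV {580 / 1554:.1%} vs no MV {80 / 468:.1%}: chi2 = {stat:.1f}, p = {p:.1e}")
```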

Table 1 Characteristics of 2022 patients according to ICU outcome

Unsupervised analysis (cluster) to determine different phenotypes in critically ill patients

Once the variables were categorized, 5 patients (0.25%) were excluded as outliers, and the analysis was performed with 2,017 patients. Of the 50 variables considered, only 25 were considered predictors according to the IV (Additional file 1: p. 11) and were included in the model. Notably, no treatment option was a predictor of ICU mortality. The categorized variables independently associated with ICU mortality are shown in Additional file 1: p. 12. Model performance was adequate, with an accuracy of 0.77, sensitivity of 0.88, specificity of 0.54 and AUC of 0.82. Based on the Podani distance and the silhouette and PAM objective plots (Additional file 1: p. 13), the optimal number of clusters in our dataset was 3. Cluster A included 537 patients (26.7%), cluster B 623 (30.8%) and cluster C 857 (42.5%). The clusters in a lower-dimensional space are shown in Additional file 1: p. 14. The size and characteristics of the phenotypes in the 3-class model are shown in Table 2. Patients with the cluster A phenotype (mild COVID-19) were younger than 65 years and had lower severity of illness, fewer abnormal laboratory values and fewer complications, with a crude ICU mortality of 20.3%. Those with the cluster B phenotype (moderate COVID-19) had characteristics similar to phenotype A but were more likely to present with shock at ICU admission, with a crude ICU mortality of 25.5%. Patients with the cluster C phenotype (severe COVID-19) were older than 65 years, had high severity of illness, were more likely to have elevated inflammatory markers (e.g., D-dimer, LDH and ferritin) and had a high frequency of shock, AKI and myocardial dysfunction, with a crude ICU mortality of 45.4%. The clinical characterization of each phenotype is shown in Fig. 2.
When these important variables were included in a regression model for each cluster, the discrimination of each model was higher than that of the global model, except for phenotype C (Table 3). The variables independently associated with mortality differed between the automatic and cluster models (Table 4 and Fig. 3A–B).
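The clustering step (a patient-by-patient dissimilarity matrix, PAM, and choice of the number of clusters by average silhouette width) can be sketched in plain Python. Two substitutions are stated plainly: a simple-matching distance on the categorized variables stands in for the Podani distance used in the study, and the six toy patients are invented for illustration. For the real analysis in R, the cluster package provides pam() and silhouette().

```python
import itertools

def matching_distance(a, b):
    """Share of categorized variables on which two patients differ.

    Simple-matching stand-in for the Podani distance used in the study.
    """
    return sum(x != y for x, y in zip(a, b)) / len(a)

def total_cost(D, medoids):
    """PAM objective: sum of distances from each patient to its nearest medoid."""
    return sum(min(D[i][m] for m in medoids) for i in range(len(D)))

def pam(D, k):
    """Partitioning around medoids: greedy BUILD, then SWAP until no swap lowers the cost."""
    n = len(D)
    medoids = []
    for _ in range(k):  # BUILD: add the patient that lowers the objective most
        best = min((i for i in range(n) if i not in medoids),
                   key=lambda i: total_cost(D, medoids + [i]))
        medoids.append(best)
    current = total_cost(D, medoids)
    improved = True
    while improved:  # SWAP: exchange a medoid with a non-medoid if that helps
        improved = False
        for pos, o in itertools.product(range(k), range(n)):
            if o in medoids:
                continue
            candidate = medoids[:pos] + [o] + medoids[pos + 1:]
            cost = total_cost(D, candidate)
            if cost < current:
                medoids, current, improved = candidate, cost, True
    labels = [min(range(k), key=lambda c: D[i][medoids[c]]) for i in range(n)]
    return medoids, labels

def silhouette(D, labels):
    """Average silhouette width (needs at least two clusters); singletons score 0."""
    n = len(D)
    clusters = set(labels)
    widths = []
    for i in range(n):
        own = [j for j in range(n) if labels[j] == labels[i] and j != i]
        if not own:
            widths.append(0.0)
            continue
        a = sum(D[i][j] for j in own) / len(own)
        b = min(
            sum(D[i][j] for j in range(n) if labels[j] == c) / labels.count(c)
            for c in clusters if c != labels[i]
        )
        widths.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(widths) / n

# Toy categorized patients (age band, shock, LDH), invented for illustration.
patients = [
    ("<65", "no_shock", "normal_LDH"), ("<65", "no_shock", "normal_LDH"),
    ("<65", "no_shock", "high_LDH"), (">65", "shock", "high_LDH"),
    (">65", "shock", "high_LDH"), (">65", "shock", "normal_LDH"),
]
D = [[matching_distance(p, q) for q in patients] for p in patients]
scores = {k: silhouette(D, pam(D, k)[1]) for k in (2, 3)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On this toy data the two-cluster solution separates the younger patients without shock from the older patients with shock and has the higher average silhouette width. The study also inspected the PAM objective across numbers of clusters, which corresponds to `total_cost` here.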

Table 2 Characteristics of 2017 critically ill patients included in machine learning analysis according to overall or cluster (phenotype) population
Table 3 Performance of global and clustering models (GLM model: generalized linear model with variables according to the ICU mortality table; Global model: GLM with important variables according to information value analysis for ICU mortality; A, B and C Phenotypes: GLM models with important variables according to information value analysis for each cluster)
Table 4 Factors independently associated with ICU mortality in automatic and clustering models (automatic model: generalized linear model [GLM] with important variables according to information value analysis for ICU mortality; A, B and C Phenotypes: GLM models with important variables according to information value analysis for each cluster)
Fig. 3

a Chord diagrams showing abnormal clinical variables by phenotype. A: mild COVID-19; B: moderate COVID-19; C: severe COVID-19. b Chord diagrams showing abnormal clinical variables by phenotype, differentiating survivors (green) from non-survivors (red) (APACHE II Acute Physiology and Chronic Health Evaluation II, SOFA Sequential Organ Failure Assessment, PCT procalcitonin, > 3 chest X-ray infiltrates in more than 3 quadrants on chest X-ray, Miocard Dys myocardial dysfunction, Hydroxichloroq. hydroxychloroquine, GAP antiviral time in days from symptom onset to first dose of antiviral, DD D-dimer, AKI acute kidney injury, LDH lactate dehydrogenase (U/L), COPD chronic obstructive pulmonary disease, Pa/Fi partial pressure of arterial oxygen/fraction of inspired oxygen, Hemat. Dis hematologic disease, GAP_UCI time in days from hospital to ICU admission, Coronary dis. coronary disease)

Construction of the ICU Mortality classic multivariate model

Of the 42 variables measured at ICU admission, the 25 that were statistically significant in the univariate analysis (Table 1) were included in the model. The initial dataset was randomly split into two subgroups: a “training group” of 1,618 patients (80%) and a “test group” of 404 patients (20%). The characteristics of patients in each subgroup are shown in Additional file 1: e-Table 3, p. 8; no significant differences were observed between the subgroups. Fitting a GLM with these 25 variables in the training group yielded 10 variables independently associated with ICU mortality (Fig. 4). No collinearity between explanatory variables was observed (Additional file 1: p. 11), and the Hosmer–Lemeshow goodness-of-fit test (X-squared = 5.53, df = 8, p = 0.69) showed no significant discrepancy between the observed values and those expected under the model. Validation of the classic model in the test group showed adequate performance, with an accuracy of 0.78, precision of 0.73, sensitivity of 0.88, specificity of 0.45 and an AUC of 0.82 (95% CI 0.78–0.86) (Additional file 1: p. 10). The performance of the classic model was similar to that of the automatic model (Table 3); however, the variables included in each model differed (Fig. 4 and Additional file 1: e-Fig. 9, p. 16).
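A sketch of the Hosmer–Lemeshow statistic with ten groups of predicted risk, which gives df = 10 − 2 = 8 as reported. Here the groups are equal-sized after sorting by predicted probability, which is an assumption; R implementations typically cut at risk quantiles, so values can differ slightly. The p-value uses the closed-form chi-squared upper tail, which holds for even degrees of freedom only, to keep the example free of third-party dependencies. The toy risks and outcomes are invented.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail chi-squared probability; closed form valid for even df only."""
    k = df // 2
    half = x / 2
    return math.exp(-half) * sum(half ** i / math.factorial(i) for i in range(k))

def hosmer_lemeshow(y_true, prob, groups=10):
    """Hosmer-Lemeshow statistic over `groups` equal-sized bins of predicted risk.

    Returns (statistic, degrees of freedom, p-value) with df = groups - 2.
    Assumes every bin has 0 < expected deaths < bin size.
    """
    pairs = sorted(zip(prob, y_true))
    n = len(pairs)
    stat = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        observed = sum(y for _, y in chunk)  # observed deaths in the risk group
        expected = sum(p for p, _ in chunk)  # expected deaths in the risk group
        size = len(chunk)
        # combines the (O - E)^2 / E terms for deaths and for survivors
        stat += (observed - expected) ** 2 / (expected * (1 - expected / size))
    df = groups - 2
    return stat, df, chi2_sf_even_df(stat, df)

# Toy predicted risks and outcomes, invented for illustration.
risk = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.20, 0.25, 0.30, 0.33,
        0.40, 0.45, 0.50, 0.55, 0.60, 0.65, 0.70, 0.80, 0.85, 0.90]
died = [0, 0, 0, 0, 0, 1, 0, 0, 0, 1, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1]
stat, df, p = hosmer_lemeshow(died, risk)
print(f"X-squared = {stat:.2f}, df = {df}, p-value = {p:.3f}")
```

A small statistic with a large p-value, as reported for the classic model, indicates no evidence of poor calibration; it does not by itself prove good calibration.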

Fig. 4

Variables independently associated with ICU mortality in multivariable analysis (GLM: generalized linear model). Data are shown as OR (odds ratio) and 95% confidence interval (SOFA Sequential Organ Failure Assessment, PCT procalcitonin, PaO2/FiO2 partial pressure of arterial oxygen/fraction of inspired oxygen, Dysf dysfunction, LDH lactate dehydrogenase, MV mechanical ventilation, AKI acute kidney injury, > 2 infiltrates more than 2 infiltrates on chest X-ray)


The main finding of our study is that, among critically ill patients with COVID-19, 3 clinical phenotypes could be derived from routine clinical and laboratory variables available at ICU admission. The ability to identify phenotypes using a small set of variables is a crucial step towards clinical application and has important implications for possible phenotype-guided differential treatment and validated prognostic scoring systems [18, 19].

ICU mortality in our C phenotype (45.4%) was more than twice that of phenotype A (20.3%) and nearly twice that of phenotype B (25.5%). This phenotype was characterized by the interplay of older age (> 65 years), high severity of illness (APACHE II > 15 and SOFA > 5), a greater burden of risk factors (hematologic disease and coronary disease) and a higher likelihood of developing further complications (shock and AKI).

Previous studies have applied clustering techniques to various data sources relating to demographic, geographic, environmental and socioeconomic determinants of health and disease. Other studies have evaluated treatment decisions and characterized clinical phenotypes associated with complications, ICU admission and mortality risk in critically ill COVID-19 patients. According to the Situation Report & Public Health Guidance published by Johns Hopkins University on March 19, 2020, people over 60 and those with chronic health conditions are at the highest risk of COVID-19 complications [20]. To our knowledge, this is the first study with a large number of critically ill patients to analyze the presence of phenotypes in patients with SARS-CoV-2 infection. In this multicenter cohort of 2,022 critically ill patients, 660 (32.6%) died in the ICU. Our ICU mortality rate was lower than those reported by Yang et al. [4] in Wuhan, China (61.5%), Myers et al. [7] in California, USA (50.0%), Arentz et al. [6] in Washington, USA (67%) and Richardson et al. [5] in New York, USA (78%), but slightly higher than that reported by Grasselli et al. [8] in the Lombardy region, Italy (26%). These differences in ICU mortality may reflect different healthcare models and important practice variations in the management of severe COVID-19 patients [11], but they may also depend on how frequently each phenotype presents.

In our study, considerable variability in model performance and risk factors was observed during the cross-validation used to choose the best model. In addition, we used 2 different techniques to select important variables: a “classic” approach based on p-values and a “modern” statistical approach more in line with recent recommendations [21]. Although the performance of the models was similar, the variables included in each differed. This may reflect a very heterogeneous patient population, which becomes apparent during random partitioning (80%/20%) for model validation or when 2 variable selection techniques are applied. In this context, three clinical phenotypes of COVID-19 patients were derived from routinely available clinical data at ICU admission by unsupervised cluster analysis. The phenotypes were multidimensional, differed in demographics, laboratory abnormalities and patterns of organ dysfunction, and were associated with ICU mortality. In addition, our phenotypes differ from the patient groupings proposed so far on the basis of clinical complications alone [22, 23] or the type of ARDS [24]. Our COVID-19 phenotypes can be identified at the time of ICU admission and thus could help facilitate early tailored therapy and improve prognosis.

Only routinely available clinical and laboratory data were used in the clustering models, and the phenotypes were derived from a large observational multicenter cohort to support generalizability. Importantly, the variables associated with ICU mortality varied between the global model and the models developed for each phenotype. The discriminatory power (AUC) of the A and B phenotype models improved compared with the global model. However, for the C phenotype (severe COVID-19), model performance was not superior to that of the global model. The C phenotype was most strongly associated with abnormal biomarker values, clinical features of cardiovascular dysfunction and AKI, and consequently with higher ICU mortality. Although the AUC of the C phenotype model is lower, its balance between sensitivity and specificity might be more appropriate. Specificity can sometimes matter more than sensitivity, namely when correctly identifying patients who do not experience the event under study (here, patients who survive) is more important than detecting those who do.

Recently, several authors have proposed different clinical phenotypes of COVID-19 patients [22,23,24]. Rello et al. [22] speculated that COVID-19 has five phenotypic presentations based on physiological and clinical features from published studies. Garcia-Vidal et al. [23] described the main clinical complications of hospitalized patients with COVID-19 by classifying them into three pattern groups (inflammatory, co-infection and thrombotic). However, as the authors acknowledge, the biomarker cut-off points used to define these phenotypes were arbitrary and not scientifically supported. Finally, Gattinoni et al. [24] proposed two phenotypes of COVID-19 patients, (1) “Type L”, characterized by high compliance and low lung recruitability, and (2) “Type H”, with low compliance and high lung recruitability, as the two extremes of a spectrum of respiratory failure in COVID-19 pneumonia. Although each of these approaches draws on valuable clinical experience, none was developed through a machine learning process to determine phenotypes, and none has been validated.

Hypoxemia has been proposed as a severity marker for differentiating phenotypes [22, 24]. In our study, the PaO2/FiO2 ratio at ICU admission was an independent risk factor for ICU mortality in the overall multivariable analysis (as a continuous or dichotomized variable), but among the phenotypes it was closely associated with ICU mortality only in phenotype C. Other variables, such as advanced age, serum D-dimer values and the development of AKI, were more strongly related to ICU mortality than PaO2/FiO2 at ICU admission in all phenotypes analyzed.

Our results should be interpreted in the context of the study limitations. First, although the phenotypes were found to be generalizable within our population, the risk factors and characteristics of the clinical phenotypes were derived from ICU admission data of a multicenter observational study in Spain. However, these risk factors are similar to those reported by other investigators [4,5,6,7,8]. The cross-validation performed and the high discrimination observed for each of the phenotype models suggest their applicability to other populations, but this should be examined given the high variability observed in patients with COVID-19 and in the support measures applied. Second, because missing data were common for some variables included in the clustering models, multiple imputation was used in the primary analysis. However, variables with a high proportion of missing values were excluded, and the missing-data threshold used was reported elsewhere [18]. Third, only routinely available clinical data at ICU admission were used to identify risk factors and clinical phenotypes; including data on the clinical evolution of patients in the ICU could change risk factors or phenotype assignments. However, our objective was to study early risk factors and phenotypes at ICU admission that may allow early treatment and thereby improve patient outcomes. Fourth, although IL-6 is an excellent severity biomarker, we could not include it in the models because more than 50% of patients had no IL-6 determination at ICU admission. Although including IL-6 could modify or improve model performance, we did not consider it appropriate to impute such a large amount of missing data. Moreover, because IL-6 is not routinely available, its inclusion in the models would have limited practical application. Finally, we did not collect data on ethnicity or socioeconomic factors. These factors may play a role in the prevalence of pre-existing comorbidities and in mortality due to COVID-19. Our findings should be interpreted within the context of the study population, and their generalizability to other populations warrants further investigation.


To our knowledge, this is the largest study to date describing different phenotypes of patients with confirmed COVID-19 admitted to the ICU. We not only characterized three novel clinical phenotypes but also extended the findings beyond a single ICU by characterizing the association of comorbidities with clinical phenotypes and of clinical phenotypes with clinical outcomes. Different risk factors were observed for the global population and for each clinical phenotype, possibly owing to patient heterogeneity, which may limit the application of a single predictive model to all patients with COVID-19. Further research is needed to determine the application of these phenotypes in clinical practice, in other patient populations and in clinical trial design.

Availability of supporting data

The anonymized database collected for the study by SEMICYUC, and the data dictionary defining each field in the set, will be made available to reviewers who consider it necessary, subject to a prior confidentiality agreement.



Abbreviations

ARDS: Acute respiratory distress syndrome
AKI: Acute kidney injury
APACHE II: Acute Physiology and Chronic Health Evaluation II
AUC: Area under the ROC curve
CKD: Chronic kidney disease
CI: Confidence interval
COPD: Chronic obstructive pulmonary disease
CRP: C-reactive protein
GLM: Generalized linear model
HFNC: High-flow nasal cannula oxygen therapy
ICC: Intraclass correlation coefficient
ICU: Intensive care unit
IQR: Interquartile range
IV: Information value
LDH: Lactate dehydrogenase
LOS: Length of stay
MV: Mechanical ventilation
PAM: Partition around medoids
PCA: Principal component analysis
ROC: Receiver operating characteristic curve
RT-PCR: Real-time polymerase chain reaction
SEMICYUC: Sociedad Española de Medicina Intensiva, Crítica y Unidades Coronarias
SOFA: Sequential Organ Failure Assessment
VIF: Variance inflation factors
WHO: World Health Organization


References

  1. Accessed 28 November 2020.

  2. Ferrer R. Pandemia por Covid-19: el mayor reto de la historia del intensivismo. Med Intensiva. 2020.

  3. Actualización nº 291. Enfermedad por el coronavirus (COVID-19). Ministerio de Sanidad de España.

  4. Yang X, Yu Y, Xu J, Shu H, Xia J, Liu H, et al. Clinical course and outcomes of critically ill patients with SARS-CoV-2 pneumonia in Wuhan, China: a single-centered, retrospective, observational study. Lancet Respir Med. 2020;8:475–81.

  5. Richardson S, Hirsch JS, Narasimhan M, Crawford JM, McGinn T, Davidson KW, the Northwell COVID-19 Research Consortium, et al. Presenting characteristics, comorbidities, and outcomes among 5700 patients hospitalized with COVID-19 in the New York City area. JAMA. 2020;323(20):2052–9.

  6. Arentz M, Yim E, Klaff L, Lokhandwala S, Riedo FX, Chong M, Lee M. Characteristics and outcomes of 21 critically ill patients with COVID-19 in Washington State. JAMA. 2020;323(16):1612–4.

  7. Myers LC, Parodi SM, Escobar GJ, Liu VX. Characteristics of hospitalized adults with COVID-19 in an integrated health care system in California. JAMA. 2020;323(21):2195–8.

  8. Grasselli G, Zangrillo A, Zanella A, Antonelli M, Cabrini L, Castelli A, et al. Baseline characteristics and outcomes of 1591 patients infected with SARS-CoV-2 admitted to ICUs of the Lombardy Region, Italy. JAMA. 2020;323(16):1574–81.

  9. Rodríguez A, Moreno G, Gómez J, Carbonell R, Picó-Plana E, Benavent Bofill C, et al. Infección grave por coronavirus SARS-CoV-2: experiencia en un hospital de tercer nivel con pacientes afectados por COVID-19 durante la pandemia 2020. Med Intensiva. 2020.

  10. Serrano-Martínez JL, Machado-Casas JF, Redondo-Orts M, Manzano-Manzano F, Castaño-Pérez J, Pérez-Villares JM. Características y resultados de una serie de 59 pacientes con neumonía grave por COVID-19 ingresados en UCI. Med Intensiva. 2020.

  11. Azoulay E, de Waele J, Ferrer R, Staudinger T, Borkowska M, Povoa P, et al. International variation in the management of severe COVID-19 patients. Crit Care. 2020;24:486.

  12. World Health Organization (WHO). Laboratory testing for 2019 novel coronavirus (2019-nCoV) in suspected human cases. Accessed 20 April 2020.

  13. Knaus WA, Draper EA, Wagner DP, Zimmerman JE. APACHE II: a severity of disease classification system. Crit Care Med. 1985;13(10):818–29.

  14. Vincent JL, Moreno R, Takala J, Willatts S, De Mendonça A, Bruining H, et al. The SOFA (Sepsis-related Organ Failure Assessment) score to describe organ dysfunction/failure. On behalf of the Working Group on Sepsis-Related Problems of the European Society of Intensive Care Medicine. Intensive Care Med. 1996;22(7):707–10.

  15. Ministerio de Sanidad. Documento técnico. Manejo clínico del COVID-19. Unidades de Cuidados Intensivos.

  16. Van der Laan M, Pollard K, Bryan J. A new partitioning around medoids algorithm. J Stat Comput Simul. 2003;73(8):575–84.

  17. Rousseeuw PJ. Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. J Comput Appl Math. 1987;20:53–65.

  18. Seymour CW, Kennedy JN, Wang S, Chang CCH, Elliott CF, Xu Z, et al. Derivation, validation, and potential treatment implications of novel clinical phenotypes for sepsis. JAMA. 2019;321(20):2003–17.

  19. Sinha P, Delucchi KL, McAuley DF, O’Kane CM, Matthay MA, Calfee CS. Development and validation of parsimonious algorithms to classify acute respiratory distress syndrome phenotypes: a secondary analysis of randomised controlled trials. Lancet Respir Med. 2020;8(3):247–57.

  20. Sharfstein J. COVID-19 situation report and public health guidance. Johns Hopkins Bloomberg School of Public Health; 2020.

  21. Leisman DE, Harhay MO, Lederer DJ, et al. Development and reporting of prediction models: guidance for authors from editors of respiratory, sleep, and critical care journals. Crit Care Med. 2020;48(5):623–33.

  22. Rello J, Storti E, Belliato M, Serrano R. Clinical phenotypes of SARS-CoV-2: implications for clinicians and researchers. Eur Respir J. 2020;55(5):2001028.

  23. Garcia-Vidal C, Moreno-García E, Hernández-Meneses M, Puerta-Alcalde P, Chumbita M, Garcia-Pouton N, et al. Personalized therapy approach for hospitalized patients with COVID-19. Clin Infect Dis. 2020.

  24. Gattinoni L, Chiumello D, Caironi P, Busana M, Romitti F, Brazzi L, Camporota L. COVID-19 pneumonia: different respiratory treatments for different phenotypes? Intensive Care Med. 2020;46:1099–102.



Acknowledgements

We thank Alexis Garduno for English-language editing of the manuscript, and the COVID-19 SEMICYUC Investigators:


UCI Hospital Universitario Virgen de Valme (Sevilla): Ana Loza; UCI Hospital Quirón (Huelva): Diego Matallana Zapata; UCI Hospital Universitario Puerto Real (Cádiz): Isabel Díaz Torres, Sonia Ibañez Cuadros, María Recuerda Nuñez, Maria Luz Carmona Pérez, Jorge Gómez Ramos, Alba Villares Casas; UCI Hospital Universitario Virgen de la Macarena (Sevilla): María Luisa Cantón, José Javier González Contreras, Helena Pérez Chomón, Nerissa Alvarez Chicote, Alberto Sousa González; UCI Hospital Universitario Reina Sofía (Córdoba): María De Alba Aparicio, Juan Carlos Pozo Laderas; UCI Hospital Universitario de Jerez (Jerez de la Frontera): Angel Estella, Sara Moreno Cano; UCI Hospital Infanta Elena (Huelva): Diego Matallana Zapata.


UCI Hospital Nuestra Señora de Gracia (Zaragoza): Ruth Jorge García; UCI Hospital Clínico Universitario Lozano Blesa (Zaragoza): Laura Sánchez Montori, Sandra Herrero García, Paula Abanses Moreno, Carlos Mayordomo García; UCI Hospital General San Jorge (Huesca): Tomás Mallor Bonet, Paula Omedas Bonafonte, Enric Franquesa Gonzalez, Nestor Bueno Vidales, Paula Ocabo Buil, Carlos Serón Arbeloa; UCI Hospital Universitario Miguel Servet (Zaragoza): Isabel Sancho, Pablo Guerrero Ibañez, Pablo Gutierrez; UCI Hospital Obispo Polanco (Teruel): María Concepción Valdovinos, Raquel Canto.


UCI Hospital Universitario San Agustín (Avilés): Ana Luz Balán Mariño, María José Gutiérrez Fernández, Marta Martín Cuadrado, Belén García Arias; UCI Hospital Universitario Central de Asturias (Oviedo): Lorena Forcelledo Espina, Lucía Viña Soria, Lorena Martín Iglesias, Lucía López Amor, Elisabet Fernández Rey, Emilio García Prieto. UCI Hospital Cabueñes (Gijón): Débora Fernández Ruíz, Carla Martínez González.


UCI Hospital Universitario Son Llatzer (Palma de Mallorca): Lorenzo Socias, Marcio Borges-Sá, María Aranda Pérez, Antonia Socias; UCI Hospital Quirón Salud Palmaplanas (Palma de Mallorca): José Mª Bonell Goytisolo, Inmaculada Alcalde Mayayo, Carlos Corradini, Isabel Ceniceros, Edwin Rodríguez; UCI Hospital Universitario Son Espases (Palma de Mallorca): Jose Ignacio Ayestarán Rota, Mariana Andrea Novo Novo, Joaquim Colomina Climent, Albert Figueras Castilla, Tomàs Leal Rullan, Maria Magdalena Garcias Sastre; UCI Hospital Comarcal d’Inca (Inca): Rossana Pérez Senoff; UCI Hospital Mateu Orfila (Mao): Ramón Fernández-Cid Bouza.


UCI Complejo Hospitalario Universitario Insular—Materno Infantil (Las Palmas de G.C): Juan Carlos Martín González, Carmen Pérez Ortiz, José Luciano Cabrera Santana, Juan José Cáceres Agra, Domingo González Romero, Ana Casamitjana Ortega; UCI Hospital General de la Palma (Tenerife): Luis Alberto Ramos Gómez, Carolina Montelongo Ojeda; UCI Hospital Universitario Dr. Negrín (Las Palmas de G.C): Jordi Solé-Violán.


UCI Hospital Universitari de Tarragona Joan XXIII (Tarragona): Alejandro Rodríguez, María Bodí, Gerard Moreno, Sandra Trefler, Laura Claverias, Raquel Carbonell, Erika Esteve, Montserrat Olona, Xavier Teixidó; UCI Hospital Universitari Arnau de Vilanova (Lleida): Monserrat Vallverdú Vidal, Begoña Balsera Garrido; UCI Hospital Universitari Vall d’Hebron (Barcelona): Elisabeth Papiol Gallofré, Raquel Albertos Martell, Rosa Alcaráz Peñarrocha, Xavier Nuvials Casals, Ricard Ferrer Roca; UCI Hospital Verge de la Cinta (Tortosa): Eric Adrián Mayor Vázquez, Ferrán Roche Campo, Pablo Concha Martínez, Diego Franch Llasat; UCI Hospital del Mar (Barcelona): Joan Ramón Masclanz, Judith Marín-Corral, Purificación Pérez, Rosana Muñoz, Clara Vila; UCI Hospital Mutua de Terrasa (Terrasa): Francisco Javier González de Molina, Elisabeth Navas Moya, Josep Trenado; UCI Hospital Sant Joan (Reus): Imma Vallverdú, Eric Castañé; UCI Hospital Parc Tauli (Sabadell): Emili Díaz Santos, Gemma Goma, Edgar Moglia.


UCI Hospital Universitario Marqués de Valdecilla (Santander): Borja Suberviola.

Castilla La Mancha:

UCI Hospital Universitario de Guadalajara (Guadalajara): Antonio Albaya Moreno, Carlos Marian Crespo; UCI Hospital Nuestra Señora del Prado (Toledo): Carmen Carolina Sena Pérez, Francisca Arbol Linde.

Castilla y León:

UCI Hospital Virgen de la Concha (Zamora): Diana Monge Donaire, Vega Losada Martínez, Nuria Rodrigo Castroviejo, Gerardo Ferrigno, Reyes Beltrán, Carolina Sanmartino, Concepción Tarancón Maján, Alfredo Marcos Gutiérrez; UCI Complejo Asistencial de Segovia (Segovia): Virginia Hidalgo Valverde, Caridad Martín López; UCI Hospital Universitario de Burgos (Burgos): Oihane Badallo, María del Valle Ortiz, Rebeca Vara Arlanzón, David Iglesias Posadilla; UCI Hospital Clínica de Salamanca (Salamanca): María Teresa Recio, Juan Carlos Ballesteros; UCI Complejo Asistencial Universitario de Palencia (Palencia).


UCI Hospital Universitario de Ceuta: Enrique Laza Laza.


UCI Hospital San Pedro de Alcántara (Cáceres): Elena Gallego Curto, Mª Carmen Sánchez García; UCI Hospital de Mérida (Mérida): Miguel Díaz-Tavora, Rosa Mancha.


UCI Hospital Montecelo (Pontevedra): Ana Ortega Montes, Isabel Gallego Barbachano, Eva Sanmartín Mantiñán; UCI CHUAC A Coruña (A Coruña): María Lourdes Cordero, Raquel María Rodríguez García, Jorge Gámez Zapata, María Gestal Vázquez; UCI Centro Hospital Universitario de Ferrol (Ferrol): María José Castro Orjales, María Isabel Álvarez Diéguez; UCI Hospitalario Clínico Universitario de Santiago (Santiago de Compostela): Carmen Rivero Velasco, Beatriz Lence Massa; REA CHUAC A Coruña (A Coruña): María Gestal Vázquez; UCI Hospital Lucus Augusti (Lugo): Ignacio Martínez Varela.




UCI IFEMA (Madrid): Alberto Hernández Tejedor; UCI Hospital Príncipe de Asturias (Madrid): Esther Mª López Ramos, Laura Alcázar Sánchez Elvira, Rocío Molina Montero, Mª Consuelo Pintado Delgado, María Trascasa Muñoz de la Peña, Yaiza Betania Ortiz de Zárate Ansotegui, Alejandra Acha Aranda, Juan Higuera Lucas; UCI Hospital de la Princesa (Madrid): Juan Antonio Sanchez Giralt, Marta Chicot Llano, Nuria Arevalillo Fernández, Marta Sánchez Galindo, Ricardo Andino Ruiz, Alfonso Canabal Berlanga; UCI Hospital Clinico San Carlos (Madrid): Miguel Sánchez, Mercedes Nieto; UCI Hospital HLA la Moncloa (Madrid): Eduardo Arias Sarmiento, Adoración Bueno Blázquez, Rosa María de la Casa, Fátima Martín, Samuel González López.


UCI Hospital Morales Meseguer (Murcia): Elena Martínez Quintana, Bernardo Gil Rueda, Áurea Higon Cañigral, Laura López Gómez, Pablo Safwat Bayoumi Delis, Augusto Montenegro Muore, Ángel Andrés Agamez Luengas; UCI Hospital Clínico Universitario Virgen de la Arrixaca (Murcia): Enriqueta Andreu Soler, Ana Beatriz Pérez Pérez, José Higinio de Gea García, Rubén Jara Rubio, Silvia Sánchez Cámara, Alba Moreno Flores, José Moya Sánchez, Daniel Francisco Pérez Martínez, Mª Desamparados del Rey Carrión; UCI Hospital Reina Sofía (Murcia): María José Rico Lledó, Juana María Serrano Navarro, Juan Francisco Martín Ruíz, Julián Triviño Hidalgo, África López Ferrer, Isabel Cremades Navalón; UCI Hospital Santa Lucía (Cartagena): Josefa Murcia Payá, JM Allegre Gallego; UCI Hospital Rafael Méndez (Lorca): María del Carmen Lorente; UCI Hospital Universitario Mar Menor (San Javier): Marta Gonsalvez.


UCI Hospital Reina Sofía (Tudela): Ruth González Natera, Raquel Garrido López de Murillo, Tania Ojuel Gros, Raquel Flecha Viguera, Isabel López González; UCI Hospital García Orcoyen (Estella-Lizarra): Adriana García Herrera.

País Vasco:

UCI Hospital Universitario de Donostia (Donostia): Loreto Vidaur Tello, Maialen Aseguinolaza, Itziar Eguibar.


UCI Hospital Universitario Virgen de la Macarena (Sevilla): María Luisa Cantón Bulnes, Jose Javier González Contreras, Helena Pérez Chomón, Nerissa Álvarez Chicote, Alberto Sousa González.


UCI Hospital Universitario de La Ribera (Alzira): Asunción Marqués Parra, Sergio García Marti, Alberto Lorenzo Aguilar, Laura Bellver Bosch, Victor Gascón Sanchez, Sonia De la Guía Ortega. UCI Hospital Dr. Peset (Valencia): Martín Parejo Montell, Alberto Belenguer Muncharaz, Hector Hernández Garces, Victor Ramírez Montero, Mónica Crespo Gómez, Verónica Martí Algarra; UCI Hospital Universitari i Politècnic La Fe (Valencia): Susana Sancho Chinesta, Joaquin Arguedas Cervera, Faustino Álvarez Cebrian, Begoña Balerdi Pérez, Rosa Jannone Fores, Javier Botella de Maglia; UCI Hospital Clínico Universitario de Valencia (Valencia): Nieves Carbonell Monleón, Jose Ferreres Franco, Ainhoa Serrano Lazaro, Mar Juan Díaz, María Luisa Blasco Cortés; UCI Hospital Virgen de los Lirios de Alcoy (Alicante): Laura Fayos, Julia Giménez, Gaspar Soriano, Ricardo Navarro. UCI Hospital Arnau de Vilanova (Valencia): Sonia Mas, Elena Bisbal, Laura Albert, Johncard Romero, Juan Fernández Cabreara; UCI Hospital Comarcal de Vinarós (Vinarós): Andrea Ortíz.

Principado de Andorra:

UCI Hospital Nostra Senyora de Meritxell (Les Escaldes): Antonio Margarit Ribas.


Funding

This study was supported by the Spanish Intensive Care Society (SEMICYUC) and the Ricardo Barri Casanovas Foundation. The study sponsors had no role in the study design, data collection, data analysis, data interpretation, or writing of the report.

Author information





AR, MRB, IML, MJH, JSV, JG, MB, ST, EP, ED, BS, MV, EMV, AAM, JASG, MDVO, JCB, LMI, JMC, ELR, VHV, LV, SSC, FJGM, SHG, CCSP, JCPL, and RRG made substantial contributions to the conception and design of the work. AR, ST, MJH, MRB and JG made substantial contributions to data acquisition. AR, MRB, IML, JG and MB made substantial contributions to the analysis and interpretation of the data. AR, IML, MB, JSV and ED drafted the manuscript. RF, IML, MB, ED, BS, JCB, SSC and FJGDM critically reviewed the draft manuscript. The corresponding author (AR) had full access to all the data in the study and takes responsibility for the integrity of the data and the accuracy of the data analysis. All authors approved the final version of the manuscript. The views expressed in this article are those of the authors and not necessarily those of the SEMICYUC.

Corresponding author

Correspondence to Alejandro Rodríguez.

Ethics declarations

Ethical approval and consent to participate

The study was approved by the reference institutional review board at Joan XXIII University Hospital (IRB# CEIM/066/2020) and each participating site with a waiver of informed consent.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Supplementary online content.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit the Creative Commons website. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article


Cite this article

Rodríguez, A., Ruiz-Botella, M., Martín-Loeches, I. et al. Deploying unsupervised clustering analysis to derive clinical phenotypes and risk factors associated with mortality risk in 2022 critically ill patients with COVID-19 in Spain. Crit Care 25, 63 (2021).



  • Severe SARS-CoV-2 infection
  • Phenotypes
  • Risk factors
  • Prognosis
  • Machine learning