
Interpretable machine learning models for predicting venous thromboembolism in the intensive care unit: an analysis based on data from 207 centers

Matters Arising to this article was published on 18 December 2023



Venous thromboembolism (VTE) is a severe complication in critically ill patients that often results in death and long-term disability, and it is one of the major contributors to the global burden of disease. This study aimed to construct an interpretable machine learning (ML) model for predicting VTE in critically ill patients based on clinical features and laboratory indicators.


Data for this study were extracted from the eICU Collaborative Research Database (version 2.0). A stepwise logistic regression model was used to select the predictors that were eventually included in the model. The random forest, extreme gradient boosting (XGBoost) and support vector machine algorithms were used to construct the models using fivefold cross-validation. The area under the curve (AUC), accuracy, no information rate, balanced accuracy, kappa, sensitivity, specificity, precision, and F1 score were used to assess model performance. In addition, the DALEX package was used to improve the interpretability of the final model.


This study ultimately included 109,044 patients, of whom 1647 (1.5%) had VTE during ICU hospitalization. Among the three models, the random forest model (AUC: 0.9378; Accuracy: 0.9958; Kappa: 0.8371; Precision: 0.9095; F1 score: 0.8393; Sensitivity: 0.7791; Specificity: 0.9989) performed the best.


ML models can be a reliable tool for predicting VTE in critically ill patients. Among all the models we had constructed, the random forest model was the most effective model that helps the user identify patients at high risk of VTE early so that early intervention can be implemented to reduce the burden of VTE on the patients.


Venous thromboembolism (VTE), which includes deep vein thrombosis (DVT) and pulmonary embolism (PE), is a chronic disease that frequently recurs. About 30% of patients with VTE are estimated to experience a recurrence within ten years [1, 2]. VTE often leads to patient death, long-term disability, and bleeding associated with anticoagulation therapy, and it is one of the major contributors to the global burden of disease [3]. Although PE-related mortality has decreased yearly, nearly 10% of PE patients die within 30 days of diagnosis [4]. In addition, VTE carries a significant economic burden. The US healthcare system spends $7–10 billion annually on VTE events, and Europe spends €1.5–3.3 billion [5, 6]. Critically ill patients are at much greater risk of VTE than medically hospitalized patients. They face general risk factors for VTE, including age, obesity, a prior history of VTE, and cancer. Moreover, they are also susceptible to ICU-specific risk factors such as immobilization, the use of central venous catheters (CVC), and mechanical ventilation [7,8,9]. Although anticoagulants are given to critically ill patients to prevent thrombosis, the incidence of VTE in these patients is still high [10]. Therefore, identifying patients at high risk of VTE through risk assessment models can help in early prevention and timely treatment.

Machine learning (ML) is the discipline in which computers use algorithms to learn from data and recognize underlying patterns in the data. ML has powerful computational and data-fitting capabilities that can uncover complex relationships in large amounts of data. These features make ML well-suited for complex clinical datasets, and its use in clinical research is increasing yearly [11]. In previous studies, ML models have demonstrated excellent performance [12, 13]. However, the black-box nature of ML (i.e., data goes in, decisions come out, and the path from inputs to outputs is opaque) limits its application [14, 15]. Therefore, understanding why and how models make decisions is critical to using models in clinical practice. Algorithms for interpreting ML models have recently emerged, and these algorithms can increase users' understanding of and trust in ML models [16].

In this paper, we report the development of an ML model for predicting VTE in critically ill patients. We also used an interpretable algorithm for the ML model to interpret the predictions of the model.


Data source and population

Data for this study were extracted from the eICU Collaborative Research Database (version 2.0) [17]. The database is a multicenter, publicly available ICU database containing de-identified, high-granularity medical data on 200,859 ICU admissions from 208 centers across the United States from 2014 to 2015 [18]. The eICU Collaborative Research Database included vital signs, care plan documentation, disease severity measures, diagnoses, treatments, and laboratory results recorded by care providers during a patient's ICU stay. This study's data extractor and processor was granted a license to use the data (certification number: 11678655). Informed consent was waived due to the de-identified nature of the data.

In this study, all patients aged 18 years or older were considered for inclusion, and for patients with multiple ICU admissions, only the first admission was considered. Exclusion criteria were as follows: (1) ICU stay of less than 24 h; (2) VTE as an admission diagnosis; (3) diagnosis of VTE within 24 h of ICU admission; and (4) more than 30% of individual data missing. The flowchart for study cohort selection is shown in Fig. 1.

Fig. 1

Flowchart of patient selection. Abbreviations: VTE, venous thromboembolism; ICU, intensive care unit

Feature extraction

Baseline information was extracted using Structured Query Language (SQL) for the 24 h following the patient's admission to the ICU. Demographic information included age, gender, body mass index (BMI), Acute Physiology and Chronic Health Evaluation IV score (APACHE IV score), previous history of VTE, history of cancer, and Glasgow Coma Scale (GCS). Laboratory parameters included hematocrit, hemoglobin, platelet count, white blood cell count, albumin, blood urea nitrogen (BUN), serum creatinine, international normalized ratio (INR), prothrombin time (PT), partial thromboplastin time (PTT), total bilirubin, alanine aminotransferase (ALT), and aspartate transaminase (AST). The treatments received by the patient on the first day of admission to the ICU included mechanical ventilation, CVC, vasopressin, sedatives, transfusion of fresh frozen plasma, platelet transfusion, transfusion of packed red blood cells, pharmacologic prophylaxis, and graduated compression stockings. The principal diagnosis on admission was categorized as a cardiovascular, respiratory, gastrointestinal, renal, neurologic, or metabolic condition, trauma, or another condition. Comorbidities included cancer, respiratory failure, heart failure, end-stage renal disease (ESRD), and sepsis. For variables measured multiple times within 24 h, we selected the maximum value. We used multiple imputation to fill in missing values [19]. Multiple imputation generates several complete datasets by repeatedly drawing plausible values for the missing data from a model. Each completed dataset is then analyzed, and the results of these analyses are combined to obtain an overall estimate and statistical inference. In contrast to single imputation, multiple imputation fills in missing values multiple times, which quantifies the uncertainty in estimating missing values and avoids overstating the precision of the results [20]. Details on missing values are available in Additional file 1: Fig. S1.
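The combining step described above is conventionally done with Rubin's rules. The following is a minimal Python sketch of that pooling for a single parameter, not the code used in this study (the analysis was performed in R); the function name `pool_rubin` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Pool one parameter across m imputed datasets using Rubin's rules.

    estimates: point estimates, one per imputed dataset
    variances: squared standard errors, one per imputed dataset
    Returns (pooled_estimate, total_variance).
    """
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need matching lists from at least two imputations")
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical log-odds estimates from five imputed datasets
est = [0.40, 0.44, 0.38, 0.42, 0.41]
var = [0.010, 0.011, 0.009, 0.010, 0.010]
pooled, total = pool_rubin(est, var)
print(round(pooled, 3), round(total, 4))
```

The between-imputation term is what a single imputation cannot provide: it widens the total variance to reflect uncertainty about the missing values themselves.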


The primary outcome of this study was new VTE during ICU hospitalization, including DVT, PE, or both.

Statistical analysis

Depending on whether they conformed to a normal distribution, continuous variables were presented as mean (standard deviation) or median (quartiles 1–3). Categorical variables were described as frequencies (percentages). We compared the clinical characteristics of the VTE and non-VTE groups using the Student t-test for normally distributed continuous variables and the Mann–Whitney U test for non-normally distributed ones. Differences in categorical variables were compared using the χ² test or Fisher's exact test. A two-sided P value < 0.05 was regarded as statistically significant. A stepwise logistic regression model was used to select the predictors that were ultimately included in the model. The Akaike Information Criterion (AIC) was used as the selection criterion for stepwise feature selection [21]. We calculated the AIC at each step while using forward selection and backward elimination of predictor variables, stopping when further addition or removal of variables no longer improved the AIC, thus obtaining the model with the lowest AIC.
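The stepwise procedure can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the study's R code: model fitting is abstracted into a caller-supplied function `aic_of`, and the names `stepwise_select`, `toy_aic`, and the log-likelihood gains are hypothetical.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def stepwise_select(candidates, aic_of):
    """Bidirectional stepwise selection that minimises AIC.

    candidates: iterable of variable names
    aic_of: callable mapping a frozenset of names to the AIC of the model
            fitted on those names (fitting is left to the caller)
    """
    candidates = list(candidates)
    selected = frozenset()
    best = aic_of(selected)
    while True:
        # Try every single addition and every single removal
        moves = [selected | {v} for v in candidates if v not in selected]
        moves += [selected - {v} for v in selected]
        if not moves:
            break
        scored = [(aic_of(s), sorted(s), s) for s in moves]
        score, _, subset = min(scored, key=lambda t: (t[0], t[1]))
        if score >= best:  # no move improves the AIC: stop
            break
        selected, best = subset, score
    return sorted(selected), best


# Toy example: hypothetical, additive log-likelihood gain per variable
gain = {"ptt": 12.0, "ast": 5.0, "noise": 0.3}


def toy_aic(subset):
    loglik = -100.0 + sum(gain[v] for v in subset)
    return aic(loglik, n_params=len(subset) + 1)  # +1 for the intercept


chosen, best_aic = stepwise_select(gain, toy_aic)
print(chosen, best_aic)
```

In the toy example the uninformative variable is rejected because its small gain in log-likelihood does not offset the penalty of 2 per added parameter.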

In addition, we used the DALEX package to improve the interpretability of the final model [16]. The DALEX package contains various explainers that help understand the relationship between input variables and model outputs. The DALEX package allows us to understand the importance of the variables in the model, the relationship between the variables and the clinical outcomes, and assess each variable's contribution to individual predictions.

Study design

The eICU Collaborative Research Database used in this study is a multicenter database of 208 hospitals. We used hospitals as the basic unit and randomly selected hospitals containing about 70% of the patients in the final cohort as the training set and the remaining hospitals containing about 30% as the validation set for external validation of the model. The hospital IDs included in the training and validation sets are listed in Additional file 1: Table S1, and the demographic and clinical characteristics of the two sets are described in Additional file 1: Table S2. Since our data were characterized by class imbalance, high dimensionality and large sample size, we selected from common machine learning algorithms those that were more suitable for our data. We finally chose the random forest, extreme gradient boosting (XGBoost), and support vector machine (SVM) algorithms for model construction and tuned the hyperparameters using a randomized search strategy with five-fold cross-validation. Five-fold cross-validation means dividing the dataset into five mutually exclusive subsets, each acting as a fold. In each round, four folds were used as the training set and the remaining fold as the test set. This process was repeated five times so that each fold acted as the test set once. Cross-validation helps limit overfitting during hyperparameter tuning and gives more robust performance estimates. For imbalanced data, machine learning models may tend to favor the dominant class while neglecting the minority class. To address data imbalance, we adjusted the classification threshold. Typically, the default classification threshold is 0.5, meaning that a sample is classified as positive when the model's output probability is greater than 0.5 and as negative otherwise. However, in the case of class imbalance, this default threshold may not be the optimal choice. After considering various model performance metrics, we ultimately selected a threshold of 0.2. This means that when the model's output probability is greater than 0.2, it predicts a positive result; otherwise, it predicts a negative result. The area under the receiver operating characteristic (ROC) curve (AUC), accuracy, no information rate, balanced accuracy, kappa, sensitivity, specificity, precision, and F1 score were used to assess the performance of the models. This study's statistical analysis and model construction were based on R version 4.3.0.
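The effect of the classification threshold on the threshold-dependent metrics can be illustrated with a short Python sketch. It is not the study's R code; the helper names and the ten predicted probabilities are hypothetical, and AUC is omitted because it does not depend on the threshold.

```python
def classify(probabilities, threshold=0.2):
    """Label a case positive when its predicted probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]


def ratio(numerator, denominator):
    """Division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def confusion_metrics(y_true, y_pred):
    """Compute the metrics named in the text from binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    n = len(pairs)
    sensitivity = ratio(tp, tp + fn)
    specificity = ratio(tn, tn + fp)
    precision = ratio(tp, tp + fp)
    accuracy = ratio(tp + tn, n)
    # Agreement expected by chance, used by Cohen's kappa
    expected = ratio((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp), n * n)
    return {
        "accuracy": accuracy,
        # Accuracy obtained by always predicting the majority class
        "no_information_rate": ratio(max(tp + fn, tn + fp), n),
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "kappa": ratio(accuracy - expected, 1 - expected),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


# Hypothetical predicted probabilities for ten patients, three with VTE
y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
probs = [0.90, 0.30, 0.10, 0.25, 0.05, 0.10, 0.15, 0.02, 0.00, 0.19]
at_02 = confusion_metrics(y_true, classify(probs, threshold=0.2))
at_05 = confusion_metrics(y_true, classify(probs, threshold=0.5))
print(at_02["sensitivity"], at_05["sensitivity"])
```

In this toy data, lowering the threshold from 0.5 to 0.2 recovers one additional true case at the cost of one false positive, which is the trade-off weighed when a threshold is chosen for a rare outcome.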


Baseline characteristics

A total of 109,044 patients were enrolled in the cohort of this study, with 72,742 patients in the training set and 36,302 patients in the validation set. We divided the patients into VTE and non-VTE groups based on whether VTE occurred during ICU hospitalization, with 1647 (1.5%) patients in the VTE group and 107,397 (98.5%) patients in the non-VTE group. Baseline differences between the VTE and non-VTE groups are shown in Table 1. Patients who developed VTE during their ICU stay had a higher BMI than those in the non-VTE group. A previous history of VTE and a history of cancer were more common in the VTE group than in the non-VTE group. ICU admissions for respiratory disease and sepsis were also more frequent in the VTE group. Compared with the non-VTE group, the VTE group had a higher prevalence of cancer, respiratory failure, heart failure, and sepsis, and higher rates of mechanical ventilation, CVC, use of vasopressors, and transfusion of fresh frozen plasma and packed red blood cells. The maximum values of platelet count, white blood cell count, BUN, serum creatinine, total bilirubin, ALT, and AST were higher in the VTE group than in the non-VTE group. In addition, the proportion of patients receiving pharmacologic prophylaxis was slightly higher in the VTE group than in the non-VTE group. In contrast, the proportion receiving mechanical prophylaxis was not significantly different between the two groups.

Table 1 Demographic and clinical characteristics of the VTE and non-VTE groups

Feature selection and model performance comparisons

We collected a total of 43 clinical and biological variables within 24 h of the patient's ICU admission. Through stepwise logistic regression, we finally selected 24 variables, which were age, gender, BMI, previous history of VTE, history of cancer, cancer, respiratory failure, heart failure, sepsis, hematocrit, hemoglobin, platelet count, white blood cell count, albumin, serum creatinine, INR, PT, PTT, total bilirubin, ALT, AST, transfusion of packed red blood cells, mechanical ventilation, and CVC.

Random forest, XGBoost and SVM algorithms were used to construct models. The random search strategy with fivefold cross-validation yielded the following final hyperparameters: mtry = 12 for random forest; nrounds = 46, lambda = 0.0002833363, alpha = 0.1278563 and eta = 0.3265631 for XGBoost; and sigma = 0.03212153 and C = 0.1837905 for SVM. We used AUC, accuracy, no information rate, balanced accuracy, kappa, sensitivity, specificity, precision, and F1 score to comprehensively evaluate the models' performance. XGBoost had the largest AUC (0.9492) and sensitivity (0.7810), followed by random forest (AUC: 0.9378; sensitivity: 0.7791) and SVM (AUC: 0.8290; sensitivity: 0.5911) (Table 2). Figure 2 shows the ROC curves for the three models. The accuracy, kappa, specificity, precision and F1 score of random forest were higher than those of XGBoost and SVM, as shown in Table 2. Compared to the random forest, XGBoost had a higher sensitivity, i.e., the number of missed cases (false negatives) was slightly lower in the XGBoost model than in the random forest model. However, the precision of the random forest was higher than that of XGBoost, i.e., the number of false positives was lower in the random forest model than in the XGBoost model. Overall, the random forest had better clinical utility than XGBoost and SVM.
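As a quick consistency check on the reported figures, the F1 score can be recomputed from precision and sensitivity, since F1 is their harmonic mean. The Python snippet below is only an illustrative check using the values reported for the random forest model; the function name `f1_score` is our own.

```python
def f1_score(precision, sensitivity):
    """F1 is the harmonic mean of precision and sensitivity (recall)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Precision and sensitivity reported for the random forest model
rf_f1 = f1_score(precision=0.9095, sensitivity=0.7791)
print(round(rf_f1, 4))
```

The result agrees with the reported F1 score of 0.8393 to four decimal places.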

Table 2 Performance of three machine learning models for predicting VTE in critically ill patients
Fig. 2

Receiver operating characteristic curves of the three models for predicting VTE. RF, random forest; XGB, eXtreme gradient boosting; SVM, support vector machine


We calculated feature importance using the DALEX package and showed the top 20 clinical variables in terms of importance in Fig. 3. In Additional file 1: Figs. S2–6, we also described the effect (positive or negative) of clinical characteristics on the model. Characteristics associated with an increased incidence of VTE were higher age, BMI, platelet count, white blood cell count, serum creatinine, ALT, AST, and total bilirubin. Lower PTT, PT, and INR were also associated with an increased incidence of VTE. In addition, a history of prior VTE, a diagnosis of cancer, heart failure, respiratory failure, or sepsis, and treatment with CVC, mechanical ventilation, and transfusion of packed red blood cells were also helpful in predicting VTE. Gender and cancer history were not strongly associated with VTE prediction. We also found that albumin, hematocrit, and hemoglobin had a U-shaped association with the risk of VTE. We named the final model Alfalfa-ICU-VTE (“Alfalfa” is the name of our team, representing happiness and luck).

Fig. 3

Feature importance derived from the random forest model. This figure was produced with the DALEX package. The X-axis represents the loss in AUC after randomly permuting the feature, relative to the original AUC. The greater this loss, the more important the feature is to the model. Abbreviations: PTT, partial thromboplastin time; AST, aspartate transaminase; PT, prothrombin time; INR, international normalized ratio; BMI, body mass index; ALT, alanine aminotransferase; WBC, white blood cell; CVC, central venous catheter
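The permutation idea behind this figure can be sketched in a few lines of Python. This is an illustrative approximation, not the DALEX implementation or the study's code; the function names, the toy scoring rule, and the toy patients are all hypothetical.

```python
import random


def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def permutation_importance(predict, rows, y_true, feature, n_repeats=10, seed=0):
    """Mean drop in AUC after shuffling one feature across patients.

    predict: callable mapping a list of feature dicts to a list of scores
    rows: list of dicts, one per patient
    """
    rng = random.Random(seed)
    baseline = auc(y_true, predict(rows))
    losses = []
    for _ in range(n_repeats):
        shuffled = [r[feature] for r in rows]
        rng.shuffle(shuffled)
        permuted = [{**r, feature: v} for r, v in zip(rows, shuffled)]
        losses.append(baseline - auc(y_true, predict(permuted)))
    return sum(losses) / n_repeats


def toy_predict(rows):
    """Hypothetical scorer: lower PTT gives a higher risk score; gender is ignored."""
    return [-r["ptt"] for r in rows]


# Toy patients: the four VTE cases have the lowest PTT values
ptt_values = [22, 24, 25, 27, 30, 33, 35, 40]
rows = [{"ptt": v, "gender": i % 2} for i, v in enumerate(ptt_values)]
y_true = [1, 1, 1, 1, 0, 0, 0, 0]

ptt_loss = permutation_importance(toy_predict, rows, y_true, "ptt")
gender_loss = permutation_importance(toy_predict, rows, y_true, "gender")
print(ptt_loss, gender_loss)
```

A feature the model ignores has zero loss by construction, while shuffling a feature the model relies on degrades the ranking of patients and therefore the AUC.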


In this study, based on 24 variables collected within 24 h of ICU admission, we developed three ML models to provide individual predictions of whether VTE occurs in critically ill patients during their ICU stay. The random forest model demonstrated the best performance. Through feature importance analysis, we identified the 20 clinical variables that had the greatest impact on the prediction of VTE, in descending order of importance: PTT, AST, history of previous VTE, platelet count, respiratory failure, total bilirubin, hemoglobin, PT, INR, BMI, sepsis, serum creatinine, ALT, cancer, white blood cell count, CVC, mechanical ventilation, heart failure, history of cancer, and gender. In addition, we described how these variables affected the random forest model. Finally, through the interpretable algorithm of the ML model, we learned how the model obtained individual case predictions.

In many previous studies, ML models have shown excellent performance, but these models suffer from a lack of interpretability, i.e., they are black boxes. Users can input data to obtain outputs, but it is unclear how the model generates its predictions, which limits the use of ML models in clinical settings. Even if a model has demonstrated a high degree of accuracy, the lack of understanding of why and how it makes predictions inevitably causes concern when clinicians want to treat patients or take preventive measures based on those predictions. Similarly, patient cooperation will be poor if the physician does not understand why the algorithm makes a given prediction. Especially in complex cases with significant healthcare consequences, the black-box nature of ML models will greatly hinder their application. The 2018 European General Data Protection Regulation states that when ML algorithms are used for decision-making, individuals have the right to obtain meaningful information about the logic involved as well as the implications and expected consequences of such processing. The regulation conveys concerns about the opaque predictions of ML models [22, 23]. The interpretable ML model we built helps users better understand its decision-making process, making it more reliable and transparent. Our model also provides insights into the contribution of predictor variables to individual predicted outcomes, aiding caregivers in developing more flexible care plans tailored to specific patient conditions. Furthermore, our model effectively identified patients at high risk of thrombosis, which allows limited healthcare resources to be prioritized for those requiring special attention, thereby optimizing resource allocation. This approach may also help alleviate the financial burden on patients, particularly those in less favorable financial situations. Strengthened monitoring of patients at high risk of thrombosis is conducive to early detection and treatment, reducing the impact of thrombosis on patients.

To further explore the contribution of these clinical features to individual patient predictions, we randomly selected four patients from the validation cohort for presentation. With interpretable algorithms, we can visualize which clinical indicators in a given patient increased the predicted risk of VTE and which decreased it. We show one of these patients in the main text, and the remaining three are available in the supplemental material (Additional file 1: Figs. S7–9). This patient was a 76-year-old male with a BMI of 42.1. He had a history of previous VTE but no history of cancer, and he presented with respiratory failure. Laboratory markers on the first day of ICU admission showed a hematocrit of 43.3%, hemoglobin level of 15.4 g/dL, platelet count of 115 K/uL, white blood cell count of 14.2 K/uL, and albumin level of 2 g/dL. His serum creatinine was 1.6 mg/dL, INR was 0.8, PT was 10 s, PTT was 29.7 s, total bilirubin was 0.7 mg/dL, ALT was 42 U/L, and AST was 29 U/L. He received mechanical ventilation and did not require a transfusion of packed red blood cells or CVC (Fig. 4). The ML model predicted a 29.8% risk of VTE based on the patient's clinical characteristics within 24 h of admission to the ICU, with serum creatinine, hemoglobin, comorbid respiratory failure, history of previous VTE, and PT being the top five contributors to the increased risk of VTE, whereas age reduced the model's prediction of VTE. Because the predicted risk exceeded the classification threshold of 0.2, the model predicted that the patient would develop VTE, and the patient did develop VTE while in the ICU (true positive).

Fig. 4

Explanation of the prediction for an individual patient. This figure was made with the DALEX package for explaining random forest model predictions. Abbreviations: WBC, white blood cells; BMI, body mass index; PT, prothrombin time; ALT, alanine aminotransferase; AST, aspartate transaminase

Our findings indicated that lower PTT, PT and INR were associated with an increased risk of VTE in critically ill patients. PTT is a blood test that characterizes blood coagulation and reflects the intrinsic and common pathways of coagulation. Several population-specific studies have also shown that low PTT is associated with an increased risk of VTE [24, 25]. Lower PTT may be due to increased coagulation factor activity in the intrinsic or common pathway or to resistance to activated protein C, either of which increases the risk of thrombosis [24, 25]. PT is another coagulation test, used to assess the tissue factor and common coagulation pathways. Lower PT was associated with an increased risk of VTE, possibly due to increased activity of coagulation factors II, V, VII, X and fibrinogen [26]. INR is a standardized mathematical transformation of PT and is related to VTE in the same way as PT.

Our findings showed that higher ALT, AST and total bilirubin were associated with an increased risk of VTE. In some previous studies, researchers observed that abnormal liver function may increase the incidence of thrombosis in patients, which was similar to our findings [27,28,29,30]. Coagulation factor VIII is one of the most potent drivers of thrombin generation, and the increased risk of thrombosis in patients with abnormal hepatic function may be associated with significantly elevated plasma levels of coagulation factor VIII [31]. In patients with hepatic insufficiency, high levels of von Willebrand factor and underexpressed low-density lipoprotein receptor-associated protein together maintain high plasma levels of factor VIII [32, 33]. Von Willebrand factor binds to factor VIII and protects it from cleavage and premature clearance by plasma proteases [34]. Low-density lipoprotein receptor-associated protein mediates cellular uptake and degradation of factor VIII [35].

Like a previous study that prospectively explored risk factors for VTE in ICU patients, our results showed that critically ill patients with a history of VTE were at higher risk for VTE, reaffirming that VTE was a relapsing disease [7]. In addition, we found that critically ill patients with comorbid cancers were more likely to develop VTE. Cancer patients are often in a hypercoagulable state. The presence of cancer tends to activate the coagulation cascade, promote platelet activation, and increase the aggregation status of blood cells, such as platelets and leukocytes [36]. In addition, cancer treatments such as chemotherapy and targeted therapies may promote thrombosis through mechanisms that are not fully understood [37, 38]. The findings also pointed to sepsis as similarly increasing the risk of VTE. Sepsis is a syndrome of the systemic inflammatory response caused by infection, and inflammation is considered a common pathway for VTE formation triggered by many risk factors. Inflammation of the vessel wall induces thrombosis, and the inflammatory and coagulation systems are coupled through common activation pathways [39]. The systemic inflammatory response induced by sepsis leads to activation and depletion of coagulation factors and platelets, impaired fibrinolytic function, disruption of the vascular endothelial barrier, and loss of physiologic antithrombotic factors such as thrombomodulin [40]. Our findings also showed that respiratory and heart failure were risk factors for VTE, which was similar to the results of previous studies [41, 42].

We also found that receiving CVC and mechanical ventilation increased the risk of VTE. CVC and mechanical ventilation are frequently used in the ICU as important therapeutic measures to maintain vital signs in critically ill patients. Still, their presence also puts critically ill patients at increased risk of thrombosis. Because a CVC exposed to the bloodstream lacks the normal endothelial layer of the blood vessel wall, it cannot inhibit platelet adhesion and coagulation. Therefore, in some cases, CVC activates the contact pathway, ultimately leading to thrombosis [43]. Decreased venous return due to increased intrathoracic pressure, together with restricted mobility, in patients undergoing mechanical ventilation may be responsible for the increased risk of thrombosis [9]. In an accompanying clinical trial, researchers found that mechanical ventilation led to pulmonary and systemic coagulation disorders in patients, which may be another reason mechanical ventilation increases the risk of thrombosis [44]. A previous retrospective cohort study also suggested that mechanical ventilation is an independent risk factor for VTE [45]. Furthermore, it should be noted that mechanically ventilated patients often require lung scans, which may increase the rate of PE diagnosis.

This study has some strengths and weaknesses. We used advanced ML techniques for modeling. The powerful computational and fitting capabilities of ML algorithms enable the construction of complex models. In addition, we used the DALEX package to explain the decision-making process of the ML model, helping clinical users better understand the model's predictive process. Moreover, our study included 109,044 patients from 207 centers, giving our model some generalizability. The first limitation of this study is that it was retrospective, so some bias is inevitable. Second, the study lacked validation in prospective clinical trials to determine the exact performance of the model in the real world. Third, the values generated by multiple imputation were estimated from a statistical model and are therefore subject to estimation error. The imputed values may deviate from the true values, which may affect the performance of the machine learning model. Finally, immobilization is one of the important risk factors for VTE, yet this clinical variable was not included in our model. Although information on immobilization after ICU admission was available in the eICU database, it was not available before ICU admission. We planned to use the clinical characteristics of patients within 24 h of ICU admission for prediction and therefore did not include immobilization in the predictor variables. In future work, we will integrate the model into a web page to make it easy to use as an online tool. Additionally, we plan to integrate the model with the hospital's case management system to automate the assessment of patients' VTE risk. We will then conduct a prospective study within our hospital to validate the model's performance in real-world scenarios. Depending on the model's performance in our single-center setting, we will consider extending it to prospective multicenter validation.


ML modeling can be a reliable tool for predicting VTE in critically ill patients. Among all the models we have constructed, the random forest model was the most effective model that helped the user identify patients at high risk of VTE early so that early intervention can be implemented to reduce the burden of VTE on the patients.

Availability of data and materials

The datasets presented in the current study are available in the eICU Collaborative Research Database (version 2.0) ( Though the datasets are de-identified, restrictions have been imposed on data sharing since they contain sensitive information. Before accessing the data, the researcher must sign the relevant agreement. To access the data, interested researchers must meet all of the following requirements: be a credentialed user of, finish required training and sign the data use agreement for the project. All the code used for this project is available on GitHub (


  1. Heit JA. Epidemiology of venous thromboembolism. Nat Rev Cardiol. 2015;12(8):464–74.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Kearon C. Natural history of venous thromboembolism. Circulation. 2003;107(23 Suppl 1):I22–30.

    Article  PubMed  Google Scholar 

  3. Wendelboe AM, Raskob GE. Global burden of thrombosis: epidemiologic aspects. Circ Res. 2016;118(9):1340–7.

    Article  PubMed  CAS  Google Scholar 

  4. Bikdeli B, Wang Y, Jimenez D, et al. Pulmonary embolism hospitalization, readmission, and mortality rates in US older adults, 1999–2015. JAMA. 2019;322(6):574–6.

    Article  PubMed  PubMed Central  Google Scholar 

  5. Grosse SD, Nelson RE, Nyarko KA, Richardson LC, Raskob GE. The economic burden of incident venous thromboembolism in the United States: a review of estimated attributable healthcare costs. Thromb Res. 2016;137:3–10.

    Article  PubMed  CAS  Google Scholar 

  6. Barco S, Woersching AL, Spyropoulos AC, Piovella F, Mahan CE. European Union-28: an annualised cost-of-illness model for venous thromboembolism. Thromb Haemost. 2016;115(4):800–8.

    Article  PubMed  Google Scholar 

  7. Cook D, Crowther M, Meade M, et al. Deep venous thrombosis in medical-surgical critically ill patients: prevalence, incidence, and risk factors. Crit Care Med. 2005;33(7):1565–71.

    Article  PubMed  Google Scholar 

  8. Minet C, Lugosi M, Savoye PY, et al. Pulmonary embolism in mechanically ventilated patients requiring computed tomography: prevalence, risk factors, and outcome. Crit Care Med. 2012;40(12):3202–8.

    Article  PubMed  Google Scholar 

  9. Minet C, Potton L, Bonadona A, et al. Venous thromboembolism in the ICU: main characteristics, diagnosis and thromboprophylaxis. Crit Care. 2015;19(1):287.

    Article  PubMed  PubMed Central  Google Scholar 

  10. PROTECT Investigators for the Canadian Critical Care Trials Group and the Australian and New Zealand Intensive Care Society Clinical Trials Group, Cook D, Meade M, et al. Dalteparin versus unfractionated heparin in critically ill patients. N Engl J Med. 2011;364(14):1305–1314.

  11. Handelman GS, Kok HK, Chandra RV, Razavi AH, Lee MJ, Asadi H. eDoctor: machine learning and the future of medicine. J Intern Med. 2018;284(6):603–19.

  12. Zhang Z, Ho KM, Hong Y. Machine learning for the prediction of volume responsiveness in patients with oliguric acute kidney injury in critical care. Crit Care. 2019;23(1):112.

  13. Thorsen-Meyer HC, Nielsen AB, Nielsen AP, et al. Dynamic and explainable machine learning prediction of mortality in patients in the intensive care unit: a retrospective study of high-frequency data in electronic patient records. Lancet Digit Health. 2020;2(4):e179–91.

  14. Watson DS, Krutzinna J, Bruce IN, et al. Clinical applications of machine learning algorithms: beyond the black box. BMJ. 2019;364:l886.

  15. The Lancet Respiratory Medicine. Opening the black box of machine learning. Lancet Respir Med. 2018;6(11):801.

  16. Biecek P. DALEX: explainers for complex predictive models in R. J Mach Learn Res. 2018;19(1):3245–9.

  17. Pollard T, Johnson A, Raffa J, Celi LA, Badawi O, Mark R. eICU Collaborative Research Database (version 2.0). PhysioNet. 2019.

  18. Pollard TJ, Johnson AEW, Raffa JD, Celi LA, Mark RG, Badawi O. The eICU Collaborative Research Database, a freely available multi-center database for critical care research. Sci Data. 2018;5:180178.

  19. Zhang Z. Multiple imputation with multivariate imputation by chained equation (MICE) package. Ann Transl Med. 2016;4(2):30.

  20. Li P, Stuart EA, Allison DB. Multiple imputation: a flexible tool for handling missing data. JAMA. 2015;314(18):1966–7.

  21. Vrieze SI. Model selection and psychological theory: a discussion of the differences between the Akaike information criterion (AIC) and the Bayesian information criterion (BIC). Psychol Methods. 2012;17(2):228–43.

  22. European Parliament and Council of the European Union. Regulation (EU) 2016/679 of the European Parliament and of the Council. 2016.

  23. Cohen IG, Amarasingham R, Shah A, Xie B, Lo B. The legal and ethical concerns that arise from using complex predictive analytics in health care. Health Aff (Millwood). 2014;33(7):1139–47.

  24. Aboud MR, Ma DD. Increased incidence of venous thrombosis in patients with shortened activated partial thromboplastin times and low ratios for activated protein C resistance. Clin Lab Haematol. 2001;23(6):411–6.

  25. Tripodi A, Chantarangkul V, Martinelli I, Bucciarelli P, Mannucci PM. A shortened activated partial thromboplastin time is associated with the risk of venous thromboembolism. Blood. 2004;104(12):3631–4.

  26. Dorgalaleh A, Daneshi M, Rashidpanah J, Roshani Yasaghi E. An overview of hemostasis. In: Dorgalaleh A, editor. Congenital bleeding disorders. Cham: Springer; 2018.

  27. Okuda K, Ohnishi K, Kimura K, et al. Incidence of portal vein thrombosis in liver cirrhosis. An angiographic study in 708 patients. Gastroenterology. 1985;89(2):279–86.

  28. Francoz C, Belghiti J, Vilgrain V, et al. Splanchnic vein thrombosis in candidates for liver transplantation: usefulness of screening and anticoagulation. Gut. 2005;54(5):691–7.

  29. Gulley D, Teal E, Suvannasankha A, Chalasani N, Liangpunsakul S. Deep vein thrombosis and pulmonary embolism in cirrhosis patients. Dig Dis Sci. 2008;53(11):3012–7.

  30. Søgaard KK, Horváth-Puhó E, Grønbaek H, Jepsen P, Vilstrup H, Sørensen HT. Risk of venous thromboembolism in patients with liver disease: a nationwide population-based case-control study. Am J Gastroenterol. 2009;104(1):96–101.

  31. Tripodi A, Primignani M, Chantarangkul V, et al. An imbalance of pro- vs anti-coagulation factors in plasma from patients with cirrhosis. Gastroenterology. 2009;137(6):2105–11.

  32. Lisman T, Bongers TN, Adelmeijer J, et al. Elevated levels of von Willebrand Factor in cirrhosis support platelet adhesion despite reduced functional capacity. Hepatology. 2006;44(1):53–61.

  33. Hollestelle MJ, Geertzen HG, Straatsburg IH, van Gulik TM, van Mourik JA. Factor VIII expression in liver disease. Thromb Haemost. 2004;91(2):267–75.

  34. Lenting PJ, van Mourik JA, Mertens K. The life cycle of coagulation factor VIII in view of its structure and function. Blood. 1998;92(11):3983–96.

  35. Saenko EL, Yakhyaev AV, Mikhailenko I, Strickland DK, Sarafanov AG. Role of the low density lipoprotein-related protein receptor in mediation of factor VIII catabolism. J Biol Chem. 1999;274(53):37685–92.

  36. Falanga A, Russo L, Milesi V, Vignoli A. Mechanisms and risk factors of thrombosis in cancer. Crit Rev Oncol Hematol. 2017;118:79–83.

  37. Grover SP, Hisada YM, Kasthuri RS, Reeves BN, Mackman N. Cancer therapy-associated thrombosis. Arterioscler Thromb Vasc Biol. 2021;41(4):1291–305.

  38. Falanga A, Marchetti M. Anticancer treatment and thrombosis. Thromb Res. 2012;129(3):353–9.

  39. Branchford BR, Carpenter SL. The role of inflammation in venous thromboembolism. Front Pediatr. 2018;6:142.

  40. Foley JH, Conway EM. Cross talk pathways between coagulation and inflammation. Circ Res. 2016;118(9):1392–408.

  41. Anderson FA Jr, Spencer FA. Risk factors for venous thromboembolism. Circulation. 2003;107(23 Suppl 1):I9–16.

  42. Geerts WH, Pineo GF, Heit JA, et al. Prevention of venous thromboembolism: the seventh ACCP conference on antithrombotic and thrombolytic therapy. Chest. 2004;126(3):338S–400S.

  43. Citla Sridhar D, Abou-Ismail MY, Ahuja SP. Central venous catheter-related thrombosis in children and adults. Thromb Res. 2020;187:103–12.

  44. Choi G, Wolthuis EK, Bresser P, et al. Mechanical ventilation with lower tidal volumes and positive end-expiratory pressure prevents alveolar coagulation in patients without lung injury. Anesthesiology. 2006;105(4):689–95.

  45. Havlicek EE, Goldman ZA, Faustino EVS, Ignjatovic V, Goldenberg NA, Sochet AA. Hospital-acquired venous thromboembolism during invasive mechanical ventilation in children: a single-center, retrospective cohort study. J Thromb Haemost. 2023.


Not applicable.


Funding

This work was supported by the Science and Technology Innovation Startup Fund of Fujian Maternal and Child Health Hospital (YCXY 23-02).

Author information

Contributions

JZ initiated the study. CG performed the data extraction and analyses and drafted the first version of the manuscript. JZ, FM, and SC critically reviewed and revised the manuscript. All authors gave final approval and agree to be accountable for all aspects of the work, ensuring its integrity and accuracy.

Corresponding author

Correspondence to Jinhua Zhang.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1. Supplementary Appendix.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Guan, C., Ma, F., Chang, S. et al. Interpretable machine learning models for predicting venous thromboembolism in the intensive care unit: an analysis based on data from 207 centers. Crit Care 27, 406 (2023).
