Machine learning to predict hemorrhage and thrombosis during extracorporeal membrane oxygenation

© The Author(s) 2020. Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Hemorrhage and thrombosis are major causes of morbidity and mortality during extracorporeal membrane oxygenation (ECMO). Even in a controlled setting, bleeding occurs frequently: almost half (46%) of the patients randomized to ECMO in the EOLIA trial had hemorrhage requiring transfusion [1]. The pathophysiology of these complications during ECMO is complex, dynamic and not fully understood [2]. This may explain why standard approaches to monitor coagulation are imperfect and studies that employ traditional biostatistical methods do not consistently identify common risk factors. We applied machine learning to an ECMO dataset to predict hemorrhage and thrombosis. Our hypothesis was that machine learning would accurately predict these events and identify novel factors not anticipated clinically or identified by traditional biostatistical methods.
We used a preexisting, manually extracted, adult ECMO dataset established to study anticoagulation practices and ECMO complications [3]. The dataset was first cleaned. Data were condensed to one row per patient. The mean and range were used to create new variables from continuous variables. Categorical variables were encoded as binary variables using one-hot encoding. Missingness was handled by first dropping variables that were missing values for all patients. Some missing data were recovered by reviewing the electronic health record. Seven variables were dropped to limit the potential of reverse causation artificially enhancing outcome prediction. Remaining variables that were still missing values (thromboelastography, anti-factor Xa levels) were dropped. Hemorrhage was defined as bleeding during ECMO requiring a transfusion and/or intervention; thrombosis was defined as deep vein thrombosis, pulmonary embolism, ischemic stroke during or following ECMO, or ECMO circuitry change.
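The preprocessing steps described above (one row per patient, mean and range for continuous variables, one-hot encoding, dropping variables with no recorded values) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the function names, the column names (`assay`, `teg`, `mode`) and the toy records are hypothetical and chosen only for illustration.

```python
from statistics import mean

def condense(rows, id_key, continuous, categorical):
    """Collapse repeated measurements into one feature dict per patient."""
    by_patient = {}
    for row in rows:
        by_patient.setdefault(row[id_key], []).append(row)

    # Levels observed for each categorical variable define the one-hot columns.
    levels = {c: sorted({r[c] for r in rows if r.get(c) is not None})
              for c in categorical}

    table = {}
    for pid, recs in by_patient.items():
        feats = {}
        for var in continuous:
            vals = [r[var] for r in recs if r.get(var) is not None]
            # Mean and range summarize the repeated measurements.
            feats[f"{var}_mean"] = mean(vals) if vals else None
            feats[f"{var}_range"] = max(vals) - min(vals) if vals else None
        for var in categorical:
            first = next((r[var] for r in recs if r.get(var) is not None), None)
            for level in levels[var]:
                feats[f"{var}={level}"] = int(first == level)
        table[pid] = feats
    return table

def drop_all_missing(table):
    """Remove features that are missing (None) for every patient."""
    names = {n for feats in table.values() for n in feats}
    keep = {n for n in names
            if any(feats.get(n) is not None for feats in table.values())}
    return {pid: {n: v for n, v in feats.items() if n in keep}
            for pid, feats in table.items()}

# Hypothetical toy records: two rows for patient 1, one row for patient 2.
rows = [
    {"patient": 1, "assay": 60.0, "teg": None, "mode": "VV"},
    {"patient": 1, "assay": 80.0, "teg": None, "mode": "VV"},
    {"patient": 2, "assay": 55.0, "teg": None, "mode": "VA"},
]
table = drop_all_missing(condense(rows, "patient", ["assay", "teg"], ["mode"]))
```

In this toy example the `teg` summaries are removed because no patient has a recorded value, while `assay` yields a mean and a range per patient and `mode` becomes two binary columns.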
The study cohort included 44 consecutive patients supported with ECMO. The average age was 42 years; 66% were men. The most common indication for ECMO was acute respiratory distress syndrome (59%), and 66% were supported with veno-venous ECMO. There were a total of 19 hemorrhage events, most commonly cannulation site bleeding (42%), and 16 thrombotic events, most commonly deep vein thrombosis (81%).
We compared chi-square to five supervised classification and regression machine learning models: random forest, recursive feature elimination, decision trees, k-nearest neighbors and logistic regression. Leave-one-out cross-validation maximized the training cohort size, which allowed each patient to be used to train and test the models to minimize sample bias [4]. The models to predict hemorrhage performed better (accuracy of 58–80%) than the models for thrombosis (40–64%) (Fig. 1).
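As an illustration of leave-one-out cross-validation, the sketch below holds out each patient once, trains on the remaining patients and reports the fraction of held-out patients classified correctly. It assumes a simple k-nearest-neighbors classifier written from scratch; the function names and the synthetic data are hypothetical and do not reproduce the study's models or results.

```python
from collections import Counter
from math import dist

def knn_predict(train_X, train_y, x, k=3):
    """Majority vote among the k training points closest to x."""
    nearest = sorted(range(len(train_X)), key=lambda i: dist(train_X[i], x))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]

def loocv_accuracy(X, y, k=3):
    """Hold out each sample once, train on the rest, return the hit rate."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += knn_predict(train_X, train_y, X[i], k) == y[i]
    return hits / len(X)

# Synthetic, well-separated toy data (not patient data).
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [1.0, 1.1], [0.9, 1.0], [1.2, 0.8]]
y = [0, 0, 0, 1, 1, 1]
accuracy = loocv_accuracy(X, y, k=1)
```

With n patients, every model is fitted n times on n - 1 patients, which is why the approach suits very small cohorts but becomes costly for large ones.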
An ablation analysis ranked variables by importance to the model's performance [5]. The rank lists for the random forest model differed from those of the chi-square model (Table 1). As expected, anticoagulation monitoring assays were most important in the chi-square model and the rank lists were identical for both outcomes. For the random forest model, the variables were more varied and included ECMO indications, cannulation strategies and duration. Rank lists for the random forest model differed between the two outcomes and could not be anticipated based on clinical intuition alone (e.g., race, body mass index, indication). These observations demonstrate an advantage of machine learning in its capacity to measure the correlations between combinations of variables and the outcome rather than the correlation between a single variable and the outcome alone.
This is the first time machine learning has been applied to predict ECMO complications. The decision tree model predicted hemorrhage with promising accuracy despite the small sample size. A larger dataset would allow the use of deep learning models to potentially improve performance and validate our current models. Similar analyses using traditional biostatistical methods are infeasible. Machine learning provides an unbiased, robust and automated approach to handle and process the volume and variety of data generated by the provision of ECMO in order to elucidate factors that contribute to ECMO complications.
*Correspondence: corey_ventetuolo@brown.edu; Department of Health Services, Policy and Practice, Brown School of Public Health, Providence, RI, USA. Full list of author information is available at the end of the article.
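The text does not spell out how the ablation analysis was carried out, so the following is only one plausible reading: remove each variable in turn, re-run leave-one-out cross-validation and rank variables by the accuracy lost. The sketch assumes a 1-nearest-neighbor classifier for brevity; the function names, feature names and synthetic data are hypothetical.

```python
from math import dist

def loocv_accuracy(X, y):
    """Leave-one-out accuracy of a 1-nearest-neighbor classifier."""
    hits = 0
    for i, x in enumerate(X):
        others = [j for j in range(len(X)) if j != i]
        nearest = min(others, key=lambda j: dist(X[j], x))
        hits += y[nearest] == y[i]
    return hits / len(X)

def ablation_ranking(X, y, names):
    """Rank features by the accuracy lost when each one is removed in turn."""
    baseline = loocv_accuracy(X, y)
    drops = {}
    for f, name in enumerate(names):
        reduced = [row[:f] + row[f + 1:] for row in X]
        drops[name] = baseline - loocv_accuracy(reduced, y)
    # Largest drop first: removing that feature hurts the model the most.
    return sorted(drops.items(), key=lambda item: item[1], reverse=True)

# Synthetic toy data: the first feature separates the classes, the second does not.
X = [[0.0, 0.5], [0.1, 0.3], [0.2, 0.4], [1.0, 0.4], [1.1, 0.5], [1.2, 0.3]]
y = [0, 0, 0, 1, 1, 1]
ranking = ablation_ranking(X, y, ["signal", "noise"])
```

Because the model is refitted without each variable, a ranking of this kind reflects how a variable contributes in combination with the others, whereas a univariate test such as chi-square scores each variable against the outcome in isolation.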