Prediction of the development of acute kidney injury following cardiac surgery by machine learning

Background
Cardiac surgery-associated acute kidney injury (CSA-AKI) is a major complication that increases morbidity and mortality after cardiac surgery. Most established prediction models are limited in their ability to capture nonlinear relationships and do not fully consider intraoperative variables, which represent the acute response to surgery. Therefore, this study used an artificial intelligence-based machine learning approach that learns from perioperative data to predict CSA-AKI.
Methods
A total of 671 patients undergoing cardiac surgery from August 2016 to August 2018 were enrolled. AKI following cardiac surgery was defined according to the Kidney Disease: Improving Global Outcomes (KDIGO) criteria. The variables analyzed included demographic characteristics, clinical condition, preoperative biochemistry data, preoperative medication, and intraoperative variables such as time-series hemodynamic changes. The machine learning methods used included logistic regression, support vector machine (SVM), random forest (RF), extreme gradient boosting (XGboost), and an ensemble (RF + XGboost). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC). We also used SHapley Additive exPlanation (SHAP) values to explain the prediction model.
Results
CSA-AKI developed in 163 patients (24.3%) during the first postoperative week. Among the single models, RF exhibited the greatest AUC (0.839, 95% confidence interval [CI] 0.772–0.898), and the ensemble model (RF + XGboost) achieved a slightly greater AUC (0.843, 95% CI 0.778–0.899). The 3 most influential features in the RF importance matrix plot were intraoperative urine output, units of packed red blood cells (pRBCs) transfused during surgery, and preoperative hemoglobin level. The SHAP summary plot was used to illustrate the positive or negative effects of the top 20 features in the RF model, and the SHAP dependence plot was used to explain how a single feature affects the output of the RF prediction model.
Conclusions
In this study, machine learning models were successfully developed to predict CSA-AKI. By identifying patients at risk after cardiac surgery, such models may enable optimized postoperative treatment strategies that minimize postoperative complications.


Introduction
Cardiac surgery-associated acute kidney injury (CSA-AKI) is a complication following cardiac surgery and is associated with increased morbidity and mortality as well as prolonged hospital stay and higher medical costs [1,2]. One meta-analysis of the global incidence and outcomes of CSA-AKI during 2004-2014 indicated that the incidence was approximately 22% for all stages of AKI. The pooled short- and long-term mortality rates were 10.7% and 30%, respectively, and increased with the severity of AKI [1]. Even a slight increase in serum creatinine after cardiac surgery is associated with a significant increase in 30-day mortality [3].
The pathophysiology of CSA-AKI is multifactorial and incompletely understood. Hypoperfusion, ischemia-reperfusion injury, neurohormonal activation, inflammation, nephrotoxin exposure, and nonpulsatile perfusion related to cardiopulmonary bypass (CPB) are known contributing factors [4,5], and all of these events may occur perioperatively [6]. To manage CSA-AKI appropriately, a precise prediction model that identifies high-risk patients is required to optimize the postoperative treatment strategy. Several published prediction models have shown reasonable ability to discriminate patients at risk of severe AKI or AKI requiring dialysis [7][8][9][10][11][12][13][14]. However, the definitions of AKI in these models have been inconsistent, and only a handful of models have used intraoperative variables, which may critically affect the prediction of AKI. No unified definition of AKI existed in the literature until the Risk, Injury, Failure, Loss, End-Stage Kidney Disease (RIFLE) and Acute Kidney Injury Network (AKIN) criteria were developed [15]. The Kidney Disease: Improving Global Outcomes (KDIGO) criteria, which modified the AKIN criteria, detect AKI more sensitively [16]. The first model for predicting all AKI stages, including less severe stage 1 cases, was developed using the consensus KDIGO criteria in a prospective study [17]. All of these risk models were developed using logistic regression, which assumes a linear relationship between the predictors and the log-odds of the outcome. Moreover, logistic regression assumes little correlation among the predictors, and model building typically retains only a small subset of input variables selected for their statistical significance in multivariable regression. However, some variables that have causal effects on the outcome may not be statistically significant [18].
Excluding variables solely because of statistical assumptions may reduce the available information and miss unexpected relationships that could improve predictive power.
Analyzing the many variables with nonlinear and complex relationships that may be associated with CSA-AKI development requires an alternative, effective approach to building precise prediction models. Machine learning has been applied in medicine for outcome prediction, diagnosis, medical image interpretation, and treatment [19,20]. Machine learning techniques do not require prespecified assumptions about the input variables and their relationships with the outcome. Because learning is entirely data-driven rather than dependent on rules-based programming, machine learning is a reasonable approach to this problem. Therefore, this study applied machine learning methods to develop a model for the accurate prediction of CSA-AKI, using preoperative variables and intraoperative time-series physiological data to optimize the prediction model.

Study population
We retrospectively reviewed the medical records of 671 patients who underwent coronary artery bypass grafting (CABG), valve replacement surgery, or both at Far Eastern Memorial Hospital (FEMH), New Taipei City, from August 2016 to August 2018. Approval from the FEMH Institutional Review Board (106159-E) was obtained before the study began, and informed consent was waived because the research involved no more than minimal risk to patients and the waiver does not adversely affect the rights and welfare of the participants.

Data collection and preprocessing of data
We collected data on demographic characteristics, clinical condition, preoperative biochemistry data, preoperative medication, and intraoperative time-series hemodynamic features (systolic blood pressure [SBP], diastolic blood pressure [DBP], mean arterial blood pressure [MAP], and heart rate [HR]) from electronic medical records and intraoperative records at FEMH. All data except the time-series features were collected through manual review of the medical records; the time-series data were obtained from electronic records stored in the hospital database. The estimated glomerular filtration rate (eGFR) was calculated for all patients using the Chronic Kidney Disease Epidemiology Collaboration (CKD-EPI) creatinine equation [21], and patients with eGFR < 60 mL/min/1.73 m² for more than 3 months were defined as having chronic kidney disease (CKD). For the time-series features (SBP, DBP, HR), the 240-min period after the beginning of the operation was used for analysis. MAP was calculated as follows [22]:
$$\mathrm{MAP} = \mathrm{DBP} + 0.01\,\exp\!\left(4.14 - \frac{40.74}{\mathrm{HR}}\right)(\mathrm{SBP} - \mathrm{DBP})$$
We also used the average real variability (ARV) index to represent short-term, reading-to-reading, within-subject variability in blood pressure [23]; ARV has been reported to estimate variability more accurately than other measures of dispersion, including the standard deviation (SD), coefficient of variation (CV), and weighted SD [24]. Before calculating ARV, we excluded the first 10 min after the operation began because of excessive signal noise, and the interval from 50 to 100 min after the operation began because many patients were on extracorporeal circulation during that period (Fig. 1). The remaining time-series data comprised 180 readings. We calculated the ARVs of SBP, DBP, and HR using the following formula:
$$\mathrm{ARV} = \frac{1}{\sum_{k=2}^{n} W_k}\sum_{k=2}^{n} W_k\left|\mathrm{BP}_k - \mathrm{BP}_{k-1}\right|$$
where $n$ is the number of readings, $\mathrm{BP}_k$ is the $k$th reading, and $W_k$ is the time interval between $\mathrm{BP}_k$ and $\mathrm{BP}_{k-1}$.
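The MAP and ARV formulas can be sketched in Python, the language the study reports using. This is a minimal illustration, not the study's code: the function names are hypothetical, and the analysis-window helper assumes one reading per minute, an assumption inferred from the 180 readings that remain after excluding 60 of the 240 minutes.

```python
import math


def mean_arterial_pressure(sbp: float, dbp: float, hr: float) -> float:
    """MAP = DBP + 0.01 * exp(4.14 - 40.74 / HR) * (SBP - DBP)."""
    form_factor = 0.01 * math.exp(4.14 - 40.74 / hr)
    return dbp + form_factor * (sbp - dbp)


def average_real_variability(readings, times):
    """Time-weighted ARV: sum of W_k * |x_k - x_(k-1)| divided by sum of W_k,
    where W_k = times[k] - times[k-1]."""
    if len(readings) != len(times) or len(readings) < 2:
        raise ValueError("need at least two paired readings and timestamps")
    weighted_sum = 0.0
    total_weight = 0.0
    for k in range(1, len(readings)):
        w_k = times[k] - times[k - 1]
        weighted_sum += w_k * abs(readings[k] - readings[k - 1])
        total_weight += w_k
    return weighted_sum / total_weight


def analysis_minutes(total=240, noise_end=10, ecc_start=50, ecc_end=100):
    """Hypothetical helper: minutes kept for ARV (drops minutes 0-9 and 50-99),
    assuming one reading per minute."""
    return [m for m in range(total)
            if m >= noise_end and not (ecc_start <= m < ecc_end)]


print(mean_arterial_pressure(120, 80, 60))  # about 92.7 mmHg
# With equal 1-min intervals, ARV is the mean absolute successive difference (13/3 here).
print(average_real_variability([120, 124, 118, 121], [0, 1, 2, 3]))
print(len(analysis_minutes()))  # 180
```

The text does not say how the pair of readings straddling the excluded interval was handled; applied to these minutes, this sketch would simply weight that pair by its 51-min gap.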
Each 180-reading series therefore yielded 179 real variabilities (RVs) per patient, one for each pair of consecutive readings. We used principal component analysis (PCA) to reduce the dimensionality of the RV data for SBP, DBP, and HR from 179 to 10.
For MAP, we applied PCA directly to the absolute MAP values instead of to its RVs. We also used the maximal RV of each time-series feature as a predictive variable. The overall rate of missing data across all variables was 0.16%; missing values were imputed with the mean or mode of each variable.
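A minimal sketch of the RV, PCA, and imputation steps follows. It assumes each RV is the absolute difference between consecutive readings (the text does not say whether signed or absolute differences were used) and performs PCA with NumPy's SVD rather than any particular PCA library; the patient data are synthetic and the helper names are hypothetical.

```python
from collections import Counter

import numpy as np


def real_variabilities(series: np.ndarray) -> np.ndarray:
    """(patients, 180) readings -> (patients, 179) RVs.
    Assumption: RV_k = |x_k - x_(k-1)|."""
    return np.abs(np.diff(series, axis=1))


def pca_scores(matrix: np.ndarray, n_components: int = 10) -> np.ndarray:
    """Project rows onto the top principal components (SVD of centered data)."""
    centered = matrix - matrix.mean(axis=0)
    # Rows of vt are principal axes, ordered by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T


def impute_column(values):
    """Replace None with the mean (numeric column) or mode (otherwise)."""
    observed = [v for v in values if v is not None]
    if all(isinstance(v, (int, float)) for v in observed):
        fill = sum(observed) / len(observed)
    else:
        fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


# Synthetic data: 50 hypothetical patients x 180 one-minute SBP readings.
rng = np.random.default_rng(0)
sbp = rng.normal(120.0, 10.0, size=(50, 180))
rv = real_variabilities(sbp)   # (50, 179)
rv_pcs = pca_scores(rv, 10)    # (50, 10) PCA features
max_rv = rv.max(axis=1)        # (50,) maximal RV per patient
```

In a deployed pipeline, the PCA axes and the imputation means and modes would typically be fitted on the training set only and then applied unchanged to new patients.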

Definition of cardiac surgery-associated acute kidney injury
Development of postoperative AKI during the first 7 days after surgery was defined according to the KDIGO criteria [16]: an increase in serum creatinine of at least 50% within 7 days, or of at least 0.3 mg/dL within 48 h, compared with the reference serum creatinine level. The serum creatinine level measured before surgery was used as the reference value.
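The creatinine-based definition can be expressed as a small function. This sketch covers only the two serum creatinine criteria stated in the text, measured against the preoperative value; the KDIGO urine-output criterion is not mentioned in the text and is omitted. The function name and input format are hypothetical.

```python
# Small tolerance so that, e.g., 1.2 - 0.9 (0.29999999999999993 in floating
# point) still counts as a 0.3 mg/dL rise.
EPS = 1e-9


def meets_kdigo_creatinine_aki(baseline_mg_dl, postop_measurements):
    """Return True if any postoperative creatinine meets the study's rule.

    postop_measurements: iterable of (hours_after_surgery, creatinine_mg_dl).
    Criteria, with the preoperative creatinine as reference:
      - rise of >= 0.3 mg/dL within 48 h, or
      - rise of >= 50% (>= 1.5 x reference) within 7 days.
    """
    for hours, creatinine in postop_measurements:
        if 0 <= hours <= 48 and creatinine - baseline_mg_dl >= 0.3 - EPS:
            return True
        if 0 <= hours <= 7 * 24 and creatinine >= 1.5 * baseline_mg_dl - EPS:
            return True
    return False


print(meets_kdigo_creatinine_aki(1.0, [(24, 1.35)]))  # True: +0.35 mg/dL within 48 h
print(meets_kdigo_creatinine_aki(1.0, [(72, 1.35)]))  # False: after 48 h and below 1.5x
```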

Machine learning
The data were randomly divided, with 70% used for training and 30% for validation (the test set). To address class imbalance in the training set, we replicated the positive (AKI) cases 5 times. All analyses were performed in Python (version 3.5). We evaluated the following supervised machine learning methods, which are widely used for classification problems: logistic regression, simple decision tree, random forest (RF), support vector machine (SVM), eXtreme Gradient Boosting (XGboost), and an ensemble (RF + XGboost). Logistic regression estimates the probability of a binary outcome, with the regression coefficients determined by maximum likelihood estimation. The tree-based learning algorithms were the simple decision tree, RF, and XGboost. A decision tree is a tree-like model of decisions that mathematically predicts the best choice. We used an optimized version of the Classification and Regression Trees (CART) algorithm to develop the simple decision tree [25], with the Gini index as the metric for selecting split points. The Gini index is the probability that a randomly chosen sample in a node would be misclassified if it were labeled at random according to the node's class distribution. Simple decision trees are unstable and prone to overfitting; RF and XGboost address these weaknesses to improve prediction. RF is an ensemble classifier that combines multiple decision trees through majority voting [26,27]. XGboost is an optimized distributed gradient boosting library that achieves strong predictive performance by combining a sequence of weak learners into a strong learner. Its performance derives from innovations such as approximate greedy split finding, parallel learning, and tunable hyperparameters [28]. We also evaluated SVM, an algorithm that identifies the boundary (a hyperplane, possibly in a high-dimensional feature space) that separates the classes of data points with the maximal margin.

Fig. 1 Time period during the operation used for ARV calculation. Data were obtained for the 240 min after the operation began, except for the initial 10 min (because of signal noise) and the period between 50 and 100 min after the operation began (because of extracorporeal circulation).
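The split and oversampling steps can be sketched with the Python standard library alone. The text does not state whether "replicated 5 times" means five copies in total or five additional copies; this sketch assumes each positive training case appears 5 times in total, and the function name, seed, and rounding of the split size are illustrative choices. Replication happens after the split, so duplicated positives cannot leak into the validation set.

```python
import random


def split_and_oversample(records, labels, train_frac=0.7, copies=5, seed=42):
    """Random train/test split, then replicate positive cases in training only.

    Assumption: each positive training case appears `copies` times in total.
    Returns (train, test) as lists of (record, label) pairs.
    """
    rng = random.Random(seed)
    idx = list(range(len(records)))
    rng.shuffle(idx)
    n_train = round(train_frac * len(idx))
    train_idx, test_idx = idx[:n_train], idx[n_train:]

    train = []
    for i in train_idx:
        reps = copies if labels[i] == 1 else 1
        train.extend([(records[i], labels[i])] * reps)
    rng.shuffle(train)
    test = [(records[i], labels[i]) for i in test_idx]  # left untouched
    return train, test


# Hypothetical labels mirroring the cohort size: 671 patients, 163 with AKI.
cohort_labels = [1] * 163 + [0] * 508
train_set, test_set = split_and_oversample(list(range(671)), cohort_labels)
print(len(test_set))  # 201
```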
To evaluate the discriminative performance of the machine learning models, we calculated and compared the areas under the receiver operating characteristic curve (AUCs). Interpreting machine learning prediction models correctly is challenging. We used SHapley Additive exPlanation (SHAP) values, a unified approach for explaining the output of any machine learning model, to provide consistent and locally accurate attribution values for each feature within each prediction model [29]. The SHAP value of a feature A is its weighted average marginal contribution to the model output, obtained by comparing the output with and without A across all combinations of the other features.
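The AUC equals the probability that a randomly chosen patient with AKI receives a higher predicted score than a randomly chosen patient without AKI (ties count one half), which gives a direct way to compute it. The text does not state how the 95% CIs were obtained; the percentile bootstrap below is one common choice, shown as an assumption rather than the study's method. Function names are hypothetical.

```python
import random


def auc(labels, scores):
    """Rank-based AUC (Mann-Whitney statistic): fraction of positive-negative
    pairs in which the positive case scores higher; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (an assumed CI method)."""
    n = len(labels)
    if not 0 < sum(labels) < n:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # a resample must contain both classes
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int(alpha / 2 * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The pairwise loop is quadratic in the number of patients, which is adequate for a test set of about 200 patients; rank-based formulas scale better for larger data.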

Results
We reviewed the medical records of 671 patients undergoing cardiac surgery from August 2016 to August 2018. The demographics and perioperative variables are listed in Table 1. We randomly allocated 70% of the patients to the training set and the remaining 30% to the test set. Among the patients, 250 (37.3%) underwent CABG, 347 (51.7%) underwent valve replacement surgery, and 74 (11.0%) underwent combined CABG and valve surgery. We calculated the ARVs of the time-series features (SBP, DBP, MAP, HR); these are also listed in Table 1. None of the variables differed significantly between the training and test sets. The clinical characteristics and perioperative variables of patients with and without CSA-AKI are listed in Additional file 1.
We applied the following machine learning methods, using all variables as inputs, to predict postoperative AKI: logistic regression, simple decision tree, RF, SVM, XGboost, and RF + XGboost; their AUCs are presented in Fig. 2. Among the single models, RF exhibited the largest AUC (0.839, 95% confidence interval [CI] 0.772-0.898). The ensemble model achieved a slightly larger AUC (0.843, 95% CI 0.778-0.899) than RF alone. The simple decision tree exhibited the smallest AUC (0.78, 95% CI 0.71-0.85). Figure 3 presents the simple decision tree model for classifying patients as with or without AKI. The Gini index exceeded 0.45 in 3 of the 7 terminal (leaf) nodes; because the maximum Gini index for two classes is 0.5, these nodes were highly impure, indicating inaccurate classification.
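For a node with given class counts, the Gini index is 1 minus the sum of the squared class proportions. For two classes it equals 2p(1 − p), where p is the proportion of one class, so it peaks at 0.5 for an even split. The sketch below uses hypothetical leaf counts, not those in Fig. 3, and also solves 2p(1 − p) = 0.45, showing that a leaf above the 0.45 threshold has a minority class making up more than about 34% of its samples.

```python
import math


def gini_impurity(class_counts):
    """Gini index of a node: 1 - sum of squared class proportions."""
    total = sum(class_counts)
    if total == 0:
        raise ValueError("node has no samples")
    return 1.0 - sum((c / total) ** 2 for c in class_counts)


def minority_share_at(gini_threshold):
    """Two-class minority proportion p solving 2p(1 - p) = gini_threshold."""
    return (1.0 - math.sqrt(1.0 - 2.0 * gini_threshold)) / 2.0


print(gini_impurity([50, 0]))   # 0.0: pure leaf
print(gini_impurity([25, 25]))  # 0.5: maximally impure two-class leaf
print(gini_impurity([30, 20]))  # about 0.48, above the 0.45 threshold
print(round(minority_share_at(0.45), 3))  # 0.342
```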
The importance matrix plot for the RF model (Fig. 4) shows that the 5 most important variables contributing to the model were intraoperative urine output, pRBC transfusion during surgery, preoperative hemoglobin (HGB), preoperative serum creatinine, and preoperative eGFR. Of the 20 most important features, 14 were intraoperative variables, 8 of which were time-series variables.
To identify the features that influenced the prediction model the most, we generated the SHAP summary plot for the RF model (Fig. 5), showing the 20 most important features. This plot shows how high and low values of each feature relate to its SHAP values in the training dataset; the higher a feature's SHAP value, the more that feature pushes the model toward predicting AKI. The SHAP dependence plot (Fig. 6) shows how a single feature affects the output of the RF prediction model: the x-axis shows the feature's values and the y-axis shows the corresponding SHAP values, so the plot visualizes how the feature's attributed importance changes as its value varies. SHAP values greater than zero indicate an increased predicted risk of AKI.
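To illustrate how SHAP values are read, the sketch below computes exact Shapley values for a toy three-feature model by enumerating all coalitions. It is neither the SHAP library nor the study's RF model: the model and its coefficients are hypothetical, and features outside a coalition are simply set to a single baseline value, whereas SHAP implementations average over background data or exploit tree structure. It does show the two properties used when reading SHAP plots: positive values push the prediction up and negative values push it down, and the values sum to the difference between the prediction and the baseline prediction.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model(x) relative to model(baseline).

    Features outside a coalition S take their baseline values (a
    single-reference simplification of what SHAP estimates)."""
    n = len(x)

    def value(coalition):
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return model(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight |S|! (n - |S| - 1)! / n! for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_risk_score(z):
    """Hypothetical model: feature 0 lowers risk; features 1 and 2 interact."""
    return -0.5 * z[0] + 2.0 * z[1] * z[2]


phi = shapley_values(toy_risk_score, x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(v, 6) for v in phi])  # [-0.5, 1.0, 1.0]
```

Feature 0 receives a negative value (it lowers the predicted risk), while the interaction between features 1 and 2 is split equally between them.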

Discussion
In this retrospective cohort study, we developed and validated machine learning algorithms using 94 preoperative and intraoperative features to predict CSA-AKI. The RF model exhibited the best single-model performance, whereas the RF + XGboost model exhibited the greatest AUC among all the models tested. RF, which uses bootstrap aggregation, and XGboost, which uses gradient boosting, are ensemble methods that can improve predictive power when available datasets are small. Over half of the top 20 features on the importance matrix plot and the SHAP summary plot of RF were intraoperative features, suggesting that intraoperative conditions have major effects on early kidney function decline following cardiac surgery. This study demonstrated the value for CSA-AKI prediction of intraoperative data, which reflect acute physiological responses during surgery, whereas previous prediction models emphasized preoperative conditions.
Established risk scores for AKI prediction following cardiac surgery were reviewed by Huen et al. and classified, according to a broad definition, into scores predicting AKI requiring dialysis and scores predicting severe AKI. Four clinical risk scores for predicting AKI requiring dialysis, namely the Continuous Improvement in Cardiac Surgery Study score [8], the Cleveland Clinic Score [9], the Mehta score [10], and the Simplified Renal Index score [11], were developed using only preoperative variables. Another 3 clinical scores, the Multicenter Study of Perioperative Ischemia Score [12], the Acute Kidney Injury After Cardiac Surgery Score [13], and the Northern New England Cardiovascular Disease Study Group Score [14], were developed to predict severe AKI; 2 of these incorporated some intraoperative variables. Most of these studies used multivariable logistic regression, with AUCs ranging from 0.76 to 0.84, and none used the currently accepted definitions of AKI.
One prospective study of more than 30,000 patients in the UK used KDIGO criteria for CSA-AKI prediction [17]. This model demonstrated improved discrimination compared with the Cleveland Clinic Score and similar discrimination to the Mehta score.
The first study in the literature to use a machine learning approach to predict CSA-AKI of all stages reported that the optimal AUC was achieved with XGboost (0.78, 95% CI 0.75-0.80) [30]. That study demonstrated that machine learning models performed significantly better than traditional logistic regression models for predicting AKI following cardiac surgery. In addition, it found that the AUCs of previously published risk score models were often only 0.55 in its dataset, potentially because of the small numbers of predictors used and the lack of intraoperative variables. Our SHAP summary plot for RF included some predictors known from traditional risk score models to be associated with CSA-AKI. However, it also revealed additional novel predictors, some of which were consistent with the importance matrix plot of gradient boosting in the previous machine learning study. Moreover, 5 of the top 20 features in the SHAP summary plot were time-series variables, which were not analyzed in the aforementioned study and may have improved the AUC in our research. One single-center cohort study proposed a machine learning algorithm that reclassified as high risk approximately 40% of patients undergoing any surgery who had been considered at low risk of AKI by a preoperative model, once intraoperative features were included [31]. This finding also supports a major role for intraoperative features in AKI risk stratification. In that study, intraoperative time-series features were summarized as the minimum, maximum, mean, short-term variability, and long-term variability [32], which may lose useful information compared with our approach of applying PCA to the time-series features to preserve as much of the variability in the original data as possible.

Fig. 3 Simple decision tree model illustrating the classification of patients with (class = yes) and without (class = no) acute kidney injury. Each box shows the variable selected for classification, the Gini index, the number of samples classified into the box according to the previous variable, the average number of patients in each class with 5-fold cross-validation, and the majority class at the split node. Blue and orange represent the yes class and the no class, respectively, and color density increases as the Gini index decreases. Abbreviations: pRBC, packed red blood cell; BMI, body mass index; CCS, Canadian Cardiovascular Society; LV, left ventricular; HGB, hemoglobin
We used ARV and RV to represent the variability of the time-series features for the following reasons. Three indexes are commonly used in the literature to evaluate 24-h blood pressure (BP) variability (BPV). The first is the standard deviation (SD) of 24-h ambulatory BP monitoring recordings, which reflects only the dispersion of values around the mean and does not consider the order of the BP readings [33]. Because the SD is correlated with mean BP, it can be inadequate in a multivariable prognostic model when several different BP measurements are included. The second is the coefficient of variation (CV), the SD divided by the mean, which overcomes this problem [34]. Another major weakness of the 24-h SD is that its value is substantially affected by the nocturnal BP decrease. The third index, the "weighted" 24-h SD (wSD), is the average of the diurnal and nocturnal SDs weighted by their respective durations; it minimizes the effect of nocturnal dipping without losing information on BPV [35]. However, no standardized method exists for the accurate estimation of BPV. According to a published systematic review and meta-analysis, ARV is a more accurate estimator of 24-h BPV [24]. In addition to ARV, we applied PCA to the RVs to preserve as much information on the variability of the time-series features as possible.
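The key difference between SD and ARV, sensitivity to the order of readings, can be demonstrated with two hypothetical SBP series that contain the same values in a different order. This sketch uses the unweighted ARV, which matches the time-weighted form when readings are equally spaced; the series are invented for illustration.

```python
import statistics


def arv(values):
    """Unweighted ARV: mean absolute difference between consecutive readings
    (equivalent to the time-weighted form when intervals are equal)."""
    return sum(abs(b - a) for a, b in zip(values, values[1:])) / (len(values) - 1)


# Hypothetical SBP readings (mmHg): identical values, different order.
gradual = [110, 115, 120, 125, 130, 125, 120, 115]
erratic = [110, 130, 115, 125, 120, 125, 115, 120]

for name, series in (("gradual", gradual), ("erratic", erratic)):
    sd = statistics.stdev(series)
    cv = sd / statistics.mean(series)
    print(name, round(sd, 2), round(cv, 3), arv(series))
# SD and CV are identical for both series; ARV is 5.0 versus 10.0.
```

Because SD and CV depend only on the set of values, they cannot distinguish a slow drift from reading-to-reading swings, whereas ARV doubles for the erratic series.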
An advantage of our study is the use of SHAP values to open the black box of machine learning. Previous risk score models have identified several risk factors, such as preoperative HGB, preoperative renal function, age, operation time, left ventricular ejection fraction, body mass index, and hypertension [7][8][9][10][11][12][13][14]; our models additionally identified intraoperative urine output, IV fluid infusion, blood product transfusion, and dynamic changes in hemodynamic features as important risk factors that traditional risk score models have neglected. Notably, some well-known risk factors, such as diabetes mellitus, CPB time, and surgery type, were not ranked among the top 20 features in our study. The pathophysiology of CSA-AKI may explain why intraoperative features are so crucial to AKI prediction. Although the exact mechanism has not been fully elucidated, renal hypoperfusion is known to result from low-flow, low-pressure, nonpulsatile perfusion with hemodilution; moreover, rapid temperature reduction during CPB, bleeding complications, and the inflammatory response play vital roles in CSA-AKI development [4]. Hemodynamic changes, blood product transfusion, IV fluid supplementation, and intraoperative urine output all reflect the acute response to renal hypoperfusion and the management it requires. The risk of AKI following cardiac surgery is therefore determined both by preoperative, health-related susceptibility to acute stress and by large, dynamic intraoperative physiological responses that reflect the ongoing response to surgery. Accordingly, software could be developed to identify patients at high risk of AKI and optimize treatment strategies after cardiac surgery. Moreover, very few values were missing from the dataset because most of the data were collected manually; missing values are therefore unlikely to have affected the results.
This study had some limitations. First, our analysis used data from a single center and included relatively few patients. The performance of the machine learning algorithms might differ in larger datasets with different distributions of patient characteristics and at different institutions; external validation is therefore required to assess overfitting and generalizability. Second, the algorithms learned only from the input features, and some hidden relationships may have been missed because of unknown or neglected features that physicians did not collect. Third, most of the input features were collected manually. We are developing a real-time automated electronic health record algorithm that can aggregate patients' perioperative information from various data sources; with such techniques, a machine learning-based predictive model may have potential for use in clinical practice. Fourth, because of the small number of participants in this cohort, we used PCA to reduce the dimensionality of the time-series features instead of analyzing the original data, which may have led to the loss of chronological and implicit information. Deep learning methods could be applied to the full time-series features if more patients were enrolled. Fifth, we did not compare performance with previous risk scores because not all of the variables required by those models were available. Lastly, predictive ability may have been limited by the relatively small number of positive events (class imbalance). Future prospective studies are required to evaluate the application of machine learning-based predictive models in clinical practice to reduce AKI risk.

Conclusions
In conclusion, we successfully applied machine learning methods to predict AKI after cardiac surgery, which can be used to determine postoperative risk. We demonstrated that intraoperative time-series and other features are crucial for AKI prediction. Further software development is under way for real-time updating of AKI risk following cardiac surgery, which in turn may optimize treatment and improve prognosis.