Predictive models for kidney disease: improving global outcomes (KDIGO) defined acute kidney injury in UK cardiac surgery

Introduction: Acute kidney injury (AKI) risk prediction scores are an objective and transparent means to enable cohort enrichment in clinical trials or to risk stratify patients preoperatively. Existing scores are limited in that they have been designed to predict only severe AKI, or non-consensus AKI definitions, and not less severe stages of AKI, which also have prognostic significance. The aim of this study was to develop and validate novel risk scores that could identify all patients at risk of AKI.

Methods: Prospective routinely collected clinical data (n = 30,854) were obtained from 3 UK cardiac surgical centres (Bristol, Birmingham and Wolverhampton). AKI was defined as per the Kidney Disease: Improving Global Outcomes (KDIGO) Guidelines. The model was developed using the Bristol and Birmingham datasets, and externally validated using the Wolverhampton data. Model discrimination was estimated using the area under the ROC curve (AUC). Model calibration was assessed using the Hosmer-Lemeshow test and calibration plots. Diagnostic utility was also compared to existing scores.

Results: The risk prediction score for any stage AKI (AUC = 0.74 (95% confidence interval (CI) 0.72, 0.76)) demonstrated better discrimination compared to the Euroscore and the Cleveland Clinic Score, and equivalent discrimination to the Mehta and Ng scores. The any stage AKI score demonstrated better calibration than the four comparison scores. A stage 3 AKI risk prediction score also demonstrated good discrimination (AUC = 0.78 (95% CI 0.75, 0.80)) as did the four comparison risk scores, but stage 3 AKI scores were less well calibrated.

Conclusions: This is the first risk score that accurately identifies patients at risk of any stage AKI. This score will be useful in the perioperative management of high risk patients as well as in clinical trial design.


Introduction
Acute kidney injury (AKI) is a common and severe complication of cardiac surgery affecting up to 30% of all patients and increasing mortality up to fourfold [1,2]. No effective treatment has been identified despite over 70 randomised controlled trials (RCTs) of proposed renoprotective interventions [3,4]. Systematic reviews have documented important limitations in these trials such as the enrolment of low or mixed AKI-risk patient cohorts, small sample sizes, and low statistical power [3,4].
These are important sources of bias that will have increased the likelihood of negative trial results. Recent recommendations on the design of trials in AKI suggest that these limitations may be countered by cohort enrichment [5], whereby the enrolment of patients with higher event rates will permit targeting of interventions to those patient populations most likely to benefit, smaller sample sizes, and increased study power. AKI risk prediction scores are an objective, transparent means of cohort enrichment but are not widely used. This is because existing scores have been developed principally to identify patients at risk of renal replacement therapy (RRT), which is rare, and not less severe AKI, which is also associated with poor prognosis but occurs with greater frequency [6]. There is also uncertainty as to the generalizability of these scores. Only two published scores, the Cleveland Clinic score [7] and the Mehta score [8], have been independently validated, and neither has demonstrated adequate discrimination and calibration in non-North American patient populations [9,10].
The aim of this study was to develop and externally validate two novel risk scores that could be used for cohort enrichment in clinical trials of renoprotective interventions in cardiac surgery. We used the Kidney Disease: Improving Global Outcomes (KDIGO) guidelines AKI definition [11] as this reconciles important differences between the two earlier consensus definitions: Acute Kidney Injury Network (AKIN) [12] and risk, injury, failure, loss, end stage (RIFLE) [13]. The prognostic utility of this definition has only been demonstrated thus far in a single centre [14]. To support the validity and utility of the risk scores, our first objective was therefore to demonstrate the utility of the KDIGO definition in a large multicentre cohort. Next, we developed two new scores based on preoperative variables: one to identify all patients at risk of AKI (KDIGO stages 1 to 3), and the second to identify only those patients at risk of severe AKI (KDIGO stage 3). Finally, we compared the diagnostic utility of these new scores with four published risk scores.

Methods
This study was approved by the South West research ethics committee under reference 11/SW/0075. The requirement for written informed consent was waived. Our research objectives and methods were specified prior to execution.

Study population
This retrospective cohort study used routinely collected data from the patient analysis and tracking system (PATS), from three UK hospitals. We obtained prospectively collected data on all adult cardiac patients for the following periods: 1996 to 2010 at University Hospitals of Bristol, 2002 to 2010 at University Hospitals of Birmingham National Health Service (NHS) Foundation Trust, and 2004 to 2010 at Wolverhampton Heart and Lung Centre. Our analyses included all patients aged ≥16 years who underwent cardiac surgery, with or without cardiopulmonary bypass (CPB), including those who underwent surgery to the thoracic aorta; we excluded patients already on renal replacement therapy (RRT) or who had received a kidney transplant, and patients who died in theatre.
Definitions of perioperative variables used were consistent across the three sites, and are as specified by the UK National Adult Cardiac Surgery Audit (NACSA) [15]. NACSA is a component of the UK cardiac surgery national quality assurance programme, whereby a defined set of perioperative data are collected prospectively by the anaesthetist, surgeon, and ICU, high-dependency unit (HDU), and ward nurses on the PATS databases. These data are submitted to the National Institute for Cardiovascular Outcomes Research (NICOR) for analysis and the generation of risk-adjusted outcome data [16]. The data undergo routine internal quality assurance prior to submission as well as external data quality assessment by NICOR. The NACSA Annual Report for 2010 indicated that these three centres are among the best units nationally for data quality [17]. PATS data were linked using name, hospital number, and date of birth, to institutional biochemistry databases that record serial serum creatinine results, to identify those patients who developed AKI. During the study serum creatinine was measured using the Jaffe method at all three centres. Wolverhampton and Birmingham used the Roche Modular system (Roche Diagnostics, Ltd, Lewes, UK). Bristol used the Olympus Diagnostics System AU640 or AU2700 (Olympus Diagnostic Systems, Southall, UK). The assays were calibrated using the manufacturers' controls with inter-institutional diagnostic accuracy monitored by a national (UK) quality assessment scheme.

Measurement of acute kidney injury
AKI was classified according to the KDIGO guidelines [11]. Stage-1 AKI was defined as an increase from baseline of ≥26 μmol/L of postoperative creatinine or an increase of 1.5 to 1.9 times the preoperative creatinine within 7 days; stage 2 was an increase of 2.0 to 2.9 times the preoperative creatinine; stage-3 AKI was an increase ≥3 times the preoperative creatinine or an increase to ≥354 μmol/L or when the patient commenced RRT. The RRT was administered for uraemia, volume overload, or biochemical abnormalities, according to institutional protocols. Urine output data were not available and therefore not used in our AKI definition. The baseline creatinine value was defined as the preoperative value obtained closest to the date of the operation. Cases where the baseline data were missing, most commonly emergency and salvage patients, were not included in the complete case analyses.
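The creatinine-based staging rules above can be illustrated with a short sketch. This is not the authors' code: the function name, its arguments, and the use of a single peak postoperative creatinine value (assumed here to be the highest value within 7 days of surgery) are assumptions made for illustration. The thresholds are those quoted in the text.

```python
def kdigo_stage(baseline_umol_l, peak_umol_l, started_rrt=False):
    """Return the creatinine-based AKI stage (0 = no AKI, 1 to 3 = KDIGO stage).

    baseline_umol_l: preoperative creatinine closest to surgery (umol/L).
    peak_umol_l: highest postoperative creatinine (umol/L); assumed to be
        taken within 7 days of surgery (an assumption of this sketch).
    started_rrt: True if renal replacement therapy was commenced.
    Urine output is not used, as in the study.
    """
    if baseline_umol_l <= 0:
        raise ValueError("baseline creatinine must be positive")

    ratio = peak_umol_l / baseline_umol_l
    rise = peak_umol_l - baseline_umol_l

    # Stage 3: >=3 times baseline, an increase to >=354 umol/L, or RRT.
    if started_rrt or ratio >= 3.0 or peak_umol_l >= 354:
        return 3
    # Stage 2: 2.0 to 2.9 times baseline.
    if ratio >= 2.0:
        return 2
    # Stage 1: rise of >=26 umol/L, or 1.5 to 1.9 times baseline.
    if ratio >= 1.5 or rise >= 26:
        return 1
    return 0
```

For example, a patient whose creatinine rises from 80 to 110 μmol/L has a ratio below 1.5 but an absolute rise of 30 μmol/L, and is therefore classified as stage 1.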

Preoperative factors
Information on prespecified factors was obtained from PATS for the three centres. The presence of angina was grouped according to the Canadian Cardiovascular Society (CCS) categories: no angina; CCS 1 (ordinary activity does not cause angina); CCS 2 (slight limitation); CCS 3 (moderate limitation); or CCS 4 (inability to carry out any physical activity without discomfort). Dyspnoea was grouped according to the New York Heart Association (NYHA) functional classification: NYHA 1 (no symptoms and no limitation in ordinary physical activity); NYHA 2 (mild symptoms and slight limitation during ordinary activity); NYHA 3 (marked limitation in activity due to symptoms, comfortable only at rest); or NYHA 4 (severe limitations, experiences symptoms even while at rest). Previous myocardial infarction (MI) was categorised as: none; 1; or ≥2. Information was obtained on previous cardiac, vascular or thoracic surgery (0 and ≥1), the presence of diabetes, peripheral vascular disease, pulmonary disease, neurological disease, hypertension (treated or blood pressure (BP) >140/90) and preoperative haemoglobin (<10.0; 10.0 to 11.9; or ≥12.0 g/dL). Glomerular filtration rate (GFR) was estimated from the Cockcroft-Gault equation using the preoperative creatinine value obtained closest to the day of surgery, and grouped as: <30.0; 30.0 to 59.9; 60.0 to 89.9; or ≥90 mL/min. Heparin or nitrates usage was grouped according to: none; within a week; or at operation. Any critical preoperative event was considered to be cardiogenic shock; preoperative intravenous (IV) inotropes; or preoperative ventilation or intra-aortic balloon pump (IABP). The time between catheterisation and surgery was grouped as: within 24 hours; >24 hours for this admission; or >24 hours for a previous admission. Information was obtained on triple vessel disease, left main stem disease, ejection fraction (grouped as: good ≥50%; fair 30 to 49%; or poor <30%), and operative priority (elective, urgent and emergency/salvage).
Finally, the cardiac procedures being undertaken were classified as coronary artery bypass graft surgery (CABG) only; valve only; CABG and valve; or other/multiple. Procedures classed as other included major aortic surgery; left ventricular aneurysmectomy; atrial myxoma surgery; pulmonary embolectomy; epicardial pacemaker placement; pericardectomy; atrial septal defect closure; procedure for congenital conditions; acquired ventricular septal defect closure; pulmonary endarterectomy; atrial fibrillation ablation; myomectomy; cardiac surgery plus carotid endarterectomy or peripheral vascular procedures; or any other cardiothoracic procedure not listed above.
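The estimation and grouping of GFR described above can be sketched as follows. This is an illustrative sketch, not the study code: it uses the standard Cockcroft-Gault formula with creatinine converted from μmol/L to mg/dL (dividing by 88.4), and it assumes body weight in kilograms as supplied by the caller, since the text does not state which weight measure was used. The function names are hypothetical.

```python
def cockcroft_gault(age_years, weight_kg, creatinine_umol_l, female):
    """Estimate creatinine clearance (mL/min) with the Cockcroft-Gault equation.

    The choice of body weight (actual versus ideal) is an assumption left to
    the caller; the source text does not specify it.
    """
    if creatinine_umol_l <= 0:
        raise ValueError("creatinine must be positive")
    creatinine_mg_dl = creatinine_umol_l / 88.4  # unit conversion
    clearance = (140 - age_years) * weight_kg / (72 * creatinine_mg_dl)
    # The equation applies a factor of 0.85 for women.
    return clearance * 0.85 if female else clearance


def gfr_group(gfr_ml_min):
    """Assign the estimated GFR to the categories used in the text."""
    if gfr_ml_min < 30:
        return "<30.0"
    if gfr_ml_min < 60:
        return "30.0 to 59.9"
    if gfr_ml_min < 90:
        return "60.0 to 89.9"
    return ">=90"
```

A 60-year-old man weighing 72 kg with a creatinine of 88.4 μmol/L (1 mg/dL) has an estimated clearance of 80 mL/min under this formula and falls in the 60.0 to 89.9 group.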

Outcomes following surgery
Information on pulmonary complications (reintubation and ventilation; full tracheostomy; need for CPAP or BIPAP; or prolonged ventilation >48 hours), infectious complications (sternal wound infection; leg wound infection; chest infection; or septicaemia or other infection), length of stay in hospital (in days) and death in hospital was obtained from PATS.

Statistical analyses
Associations between AKI and outcomes following surgery were modelled using logistic regression. The data from Bristol and Birmingham were used to develop the prognostic models (development sample) and the models were validated on the data from Wolverhampton (validation sample). Any-stage AKI and stage-3 AKI were modelled using separate logistic regression models. Univariable associations were examined for all demographic and preoperative factors. Multivariable associations were examined by entering all demographic and preoperative factors into a single full model, which controlled for all these main effects. Factors selected for the initial prognostic model were those from the multivariable full model with P <0.001. Factors selected for the more inclusive prognostic model were those from the full model with P <0.05. The area under the receiver operating characteristic (ROC) curve (AUC) was calculated for each model in the development and validation samples to quantify diagnostic utility. Model calibration was assessed using the Hosmer-Lemeshow test and calibration plots. Calibration plots of observed versus predicted values for AKI were analysed using linear regression to provide the slope and intercept; the closer the intercept to 0, and the closer the slope to 1, the better the calibration. The discrimination and calibration of the models were compared to two published AKI risk scores from US data [7,8], a recently published risk score from Australia [10], and a mortality risk score, the logistic Euroscore [18]. We were able to match all the variables for the Cleveland Clinic score [7] except congestive heart failure and left ventricular ejection fraction <35% (data cut at <30%). For the Ng score [10] we were able to match all the variables with the exception of infective endocarditis.
For the Mehta score [8] we were unable to match ethnicity in the development sample, and were unable to precisely match the field MI within 3 weeks, or the type of valve procedure (mitral versus aortic).
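The discrimination and calibration measures described above can be illustrated with a minimal sketch in plain Python. It is not the Stata analysis used in the study. The AUC is computed as the proportion of case and non-case pairs in which the case has the higher predicted risk (ties count as half). The calibration line is fitted by ordinary least squares to observed versus mean predicted risk within equal-sized groups of patients ordered by predicted risk; grouping into tenths is an assumption of this sketch, as are all function names. The Hosmer-Lemeshow test is not reproduced here.

```python
def auc(y_true, y_score):
    """Area under the ROC curve from pairwise comparisons (ties count 0.5)."""
    cases = [s for y, s in zip(y_true, y_score) if y == 1]
    controls = [s for y, s in zip(y_true, y_score) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))


def calibration_line(y_true, y_pred, n_groups=10):
    """Slope and intercept of observed versus predicted risk across groups.

    A slope near 1 and an intercept near 0 indicate good calibration.
    """
    pairs = sorted(zip(y_pred, y_true))
    n = len(pairs)
    xs, ys = [], []
    for g in range(n_groups):
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if chunk:
            xs.append(sum(p for p, _ in chunk) / len(chunk))  # mean predicted
            ys.append(sum(y for _, y in chunk) / len(chunk))  # observed rate
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("predicted risks do not vary between groups")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x
```

The pairwise AUC is quadratic in the number of patients, which is adequate for illustration; a rank-based calculation would be preferable for a cohort of the size reported here.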
We conducted sensitivity analyses, first by considering the inclusion of two-way interactions in our score, and second, by using multiple imputation by chained equations [19,20] to account for missing data. The analysis assumes any systematic difference between the missing values and the observed values can be explained by differences in observed data. We used the ice command [21] in Stata to impute missing confounder and outcome data. Variables that may help explain the missing data (for example, demographic information, preoperative factors, AKI status and outcomes following surgery) were included in the imputation model. The missing values were sampled from their predictive distribution, based on the observed data. Standard regression analyses were used to fit the model of interest to each of the imputed datasets. Ten cycles of regression were carried out and 20 datasets imputed. All 20 estimated associations were combined to give one overall estimated association of interest. Standard errors were calculated using Rubin's rules [22,23]. These rules take into account the variability in results between the imputed datasets, reflecting the uncertainty of the missing values. All analyses were carried out in Stata version 13.
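The pooling step described above can be sketched as follows. This is an illustration of Rubin's rules for a single coefficient, not the Stata routine used in the study: the pooled estimate is the mean of the per-dataset estimates, and the total variance is the mean within-imputation variance plus the between-imputation variance inflated by (1 + 1/m), where m is the number of imputed datasets. The function name and inputs are hypothetical.

```python
import math
import statistics


def pool_rubin(estimates, standard_errors):
    """Combine one coefficient across m imputed datasets using Rubin's rules.

    estimates: the coefficient estimated in each imputed dataset.
    standard_errors: the matching standard errors.
    Returns (pooled_estimate, pooled_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(standard_errors):
        raise ValueError("need matching estimates and standard errors, m >= 2")

    pooled = statistics.mean(estimates)
    # Within-imputation variance: average of the squared standard errors.
    within = statistics.mean(se ** 2 for se in standard_errors)
    # Between-imputation variance: sample variance of the estimates.
    between = statistics.variance(estimates)
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

In the study this would be applied to each model coefficient across the 20 imputed datasets.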

Online calculator
We constructed a web-based calculator for the any-stage AKI risk score [24].

Results
Stage-3 AKI was slightly more prevalent in the validation dataset (6.0%) compared to the development sample (3.6% in Bristol and 5.0% in Birmingham). The criteria by which patients were defined as having AKI are described in Table 3.

KDIGO-defined AKI severity and outcome
AKI was associated with increased hospital stay (Table 4). Patients without AKI had a median postoperative hospital stay of 6 days in Bristol (IQR 5, 8) and Wolverhampton (IQR 5, 7), and 9 days (IQR 6, 13) in Birmingham. For patients with stage-1 AKI, median postoperative stay increased to 9 days (IQR 7, 13) in Bristol, 13 (IQR 9, 19) in Birmingham and 8 (IQR 6, 13) in Wolverhampton. Patients with the most severe AKI (stages 2 and 3) had the longest postoperative stay. In the complete case data (n = 20,995) stage-1 AKI was associated with higher odds of infective and pulmonary complications and an almost five-fold increase in the odds of death in hospital.

Model building for the clinical prediction score
A total of 23 factors were examined for their associations with AKI. Odds ratios for unadjusted and fully adjusted models are presented in Table 6 (for models of any-stage AKI) and Table 7 (for models of stage-3 AKI). In fully adjusted models, 15 factors were strongly associated with any-stage AKI (P <0.001) and were included in the initial prognostic model. A further five factors were associated with any-stage AKI at the level of P <0.05, and were added to the more inclusive model. For stage-3 AKI, eight factors were included in the initial prognostic model (P <0.001), and a further three factors were included in the more inclusive prognostic model (P <0.05).

Diagnostic utility
Any-stage AKI score
The ROC AUC for the initial model for any-stage AKI in the development sample was 0.73 (95% CI 0.72, 0.74; Figure 1 and Table 8) and was very similar in the validation sample (AUC 0.74; 95% CI 0.72, 0.76). The Hosmer-Lemeshow test (P = 0.490, Table 9) and plot of observed versus predicted any-stage AKI (Figure 2A and Table 9) demonstrated good calibration in the development sample. The initial model was less well-calibrated in the validation sample (Hosmer-Lemeshow P = 0.192, and Figure 2B). The more inclusive model did not demonstrate better discrimination when compared to the initial model (Table 8); however, it did demonstrate better calibration as evidenced by the Hosmer-Lemeshow test (P = 0.406 in the validation sample) and plots of observed versus predicted any-stage AKI (Figure 2B and Table 9). For the validation sample discrimination by the any-stage AKI score was better than the Euroscore (AUC 0.68; 95% CI 0.67, 0.70) and Cleveland Clinic score (AUC 0.70; 95% CI 0.69,     Table 8).
Hosmer-Lemeshow tests and calibration plots (Table 9 and Figure 2B) showed that the more inclusive model was better calibrated than each of the four comparison scores.

Final model coefficients
The model coefficients for the initial model for any-stage AKI and the initial model for stage-3 AKI are shown in Table 10. Factors associated with any-stage AKI in the final model were older age, male sex, BMI >35 kg/m², current smoking, higher dyspnoea categories, diabetes, peripheral vascular disease, hypertension, lower haemoglobin, lower estimated GFR, catheter to surgery within 24 hours, triple vessel disease, poor ejection fraction, emergency/salvage operations and more complex surgery. The only factor included in the stage-3 model that was not in the any-stage model was a critical preoperative event. Sensitivity analyses showed that adding two-way interactions at P <0.05 did not improve the diagnostic utility in the validation sample (data not shown). Further sensitivity analysis using only more recent data (after 2002) did not improve discrimination. Coefficients from imputed data for the final model were very similar to the complete case analysis (data not shown).

Discussion
This study has developed two new risk scores for the preoperative identification of cardiac surgery patients who are at increased risk of developing AKI. These scores were developed using a large cohort of patients from two UK cardiac centres and externally validated using data from a third UK centre. We compared the diagnostic utility of these scores to four previously published scores: the Euroscore, the Mehta score, the Cleveland Clinic score and the Ng score. Our risk prediction score for any-stage AKI demonstrated better discrimination compared to the Euroscore and the Cleveland Clinic score, and equivalent discrimination to the Mehta and Ng scores. The any-stage AKI score demonstrated better calibration than the four comparison scores. The stage-3 AKI risk prediction score demonstrated good discrimination, as did the four comparison risk scores, but these scores were less well-calibrated.
The study has important strengths. It has demonstrated the prognostic utility of the KDIGO AKI definition in a large multicentre cohort, and has confirmed the prognostic importance of milder forms of AKI, as has been identified by earlier consensus definitions [12,13]. The two risk scores we have developed are the first to our knowledge that have been designed to predict a consensus definition of AKI. The use of consensus AKI definitions as endpoints is a key element of study design that is important for standardisation of reporting and comparative analyses of trials. Furthermore, the any-stage AKI risk score is the first externally validated AKI risk prediction score that has been designed to include patients at risk of KDIGO stage-1 AKI, which we have shown to have prognostic utility in the current study. The any-stage AKI score is freely available to any researcher or clinician as a web-based calculator [24]. It can be accessed from any smartphone, tablet or desktop computer, and can be completed in less than 1 minute.
Using a cutoff for the any-stage AKI score of 30% will select patients for interventional studies with a positive predictive value of 44% and a negative predictive value of 85% for AKI. We suggest that this score may be used to identify an enriched patient cohort for inclusion in clinical trials. We did not detect any advantage for our stage-3 AKI score in relation to existing scores. The stage-3 AKI score may have greater utility as a risk adjustment tool for quality assurance, or clinically to assist with informed consent. However, it did not demonstrate clear advantages beyond existing scores and we have not developed a web-based calculator for this score.
The study has several limitations. First, retrospective analyses of routinely collected data have limitations with respect to data quality, specifically missing data, misclassification and inconsistent data definitions between individuals and sites. To minimise these we used prospectively collected data from three clinical databases that use common, standardised definitions for clinical risk factors. The data had undergone both internal and external quality checking, had low levels of missing data, and the three sites contributing to the study are listed as among the top for data quality within the UK NACSA programme. Importantly, baseline creatinine values, defined in this study as the preoperative value obtained closest to the date of operation, were present in over 98% of patients, as would be expected in a cardiac surgery cohort where preoperative bloods are routinely taken in all but the very sickest patients. This is important; alternate definitions of baseline change the reported frequency of stage-1 AKI [25], a key consideration in this study. The only variable for which there was a significant proportion of missing data was baseline haemoglobin, and this was restricted to a single centre.
Figure 2. The closer the slope is to 1, and the closer the intercept is to 0, the better the calibration.
To address the limitations posed by missing data we performed our primary analysis in the complete case data. We confirmed the robustness of the models developed in the complete case analysis in imputed data. Model coefficients did not differ substantially in the imputed data, which suggests that the missing data are unlikely to have introduced bias into our model. Second, unmeasured confounders are an important consideration in any retrospective analysis. For example, the Bristol and Birmingham databases do not routinely record patient race, and this has been found to be an important variable in some risk scores but not others [6]. This may also have affected the estimated GFR (eGFR) calculation, a key component of the score. Equations to calculate eGFR that include race, such as the Modification of Diet in Renal Disease (MDRD) equation [26], have greater accuracy.
Third, intra- and postoperative events that affect the incidence of AKI also represent confounders [10,27]. However, it was our intention to design a score that will identify patients at risk of AKI preoperatively on the basis that the most effective prevention strategies are likely to be those applied before or at the commencement of surgery rather than after injury (surgery) has occurred. Fourth, the AKI definition used did not incorporate urine output data, as defined in the KDIGO definition, as these data were not recorded. Urine output data are known to significantly alter the estimates in patients with AKI, although whether this improves the prognostic utility of the scores is unclear, particularly in cardiac surgery, where perioperative urine output is closely monitored, and oliguria aggressively treated [28]. The use of creatinine-based definitions of AKI has other limitations in cardiac surgery. In the validation sample in this study 48% of patients had undergone coronary angiography within the same hospital admission and 39% were undergoing urgent inpatient surgery. Many of these patients would have sustained significant renal insults prior to surgery and this may have increased the baseline serum creatinine value. A final limitation is the applicability of our findings to non-UK populations. We used well-defined variables that are routinely collected in UK databases; however, differences in variable definitions that occur between databases and countries may limit the wider utility of the model. This was a limitation of the two North American comparison scores. The Mehta score, in particular, was not composed of variables that were routinely collected in the UK data, including MI <3 weeks and ethnicity. The Mehta score was also developed in a selected population and both American scores excluded patients undergoing surgery without CPB, but were subsequently tested in a UK population that included a significant proportion of off-pump procedures.
Conversely, variable definitions were largely comparable between the UK and Australian cohorts, and the Ng score demonstrated good discrimination but poor calibration. This may reflect the non-consensus definition of AKI used in the Ng score or differences in patient populations and clinical practice between the two countries. These findings highlight the problems of risk scores developed in distinct geographic populations. The UK AKI risk scores described here may suffer from similar limitations, and we conclude that their wider utility requires independent external validation.

Conclusion
This study has used a large multicentre cohort to develop and validate a risk prediction score for AKI stages 1 to 3. This is the only published score that predicts less severe AKI, as currently defined by the consensus KDIGO definition. We suggest that this new score will have clinical utility for risk stratification and facilitate cohort enrichment for clinical trials of novel renoprotective interventions.

Key messages
AKI is a common and severe complication of cardiac surgery that contributes to morbidity, mortality and increased healthcare costs.
Previous trials of renoprotective interventions have been limited by the enrolment of low- or mixed-risk AKI cohorts, and no effective treatment has been identified thus far.
AKI risk scores are an objective and transparent way of identifying cohorts of patients at increased risk of AKI for clinical trials; however, existing scores identify only those patients who develop severe AKI requiring renal replacement therapy.
This study used data from two large cardiac surgery centres to develop a risk score that identifies all patients at risk of AKI (stages 1 to 3) with high discrimination and good calibration in an external validation dataset from a third centre.
The utility of this score is currently being prospectively validated as a cohort enrichment tool in several ongoing clinical trials.