External validation of a modified model of Acute Physiology and Chronic Health Evaluation (APACHE) II for orthotopic liver transplant patients

Introduction The purpose of the study was to validate the newly derived postoperative orthotopic liver transplantation (OLTX)-specific diagnostic weight for the Acute Physiology and Chronic Health Evaluation (APACHE) II mortality prediction system in independent databases. Methods Medical records of 174 liver transplantation patients admitted postoperatively to the adult intensive care units at King Fahad National Guard Hospital and the University of Wisconsin were reviewed, and data on age, sex, the underlying liver disease, APACHE II scores and the hospital outcome were collected. Predicted mortality was calculated using: 1) the original APACHE II diagnostic weight of postoperative other gastrointestinal surgery and 2) the newly derived OLTX-specific diagnostic category weight. Standardized mortality ratio and 95% confidence intervals were calculated. Calibration was evaluated with the Hosmer–Lemeshow goodness-of-fit C-statistic. Discrimination was tested by 2 × 2 classification matrices and by computing the areas under the receiver operating characteristic curves. Patient characteristics and outcome data were compared between the two hospitals. Results APACHE II significantly overestimated mortality when the original diagnostic weight was used, but provided a closer estimate of mortality with the OLTX-specific diagnostic weight. The C-statistic analysis showed better calibration for the new approach; discrimination was also improved. The performances of the prediction systems were similar in the two hospitals. The new model provided more accurate estimates of hospital mortality in each hospital. Discussion APACHE II provided an accurate estimate of mortality in liver transplant patients when the OLTX-specific diagnostic weight was used. With the new model, APACHE II can be used as a valid mortality prediction system in this group of patients.


Introduction
With the increasing worldwide availability of liver transplantation, a standardized assessment of severity of illness is needed to evaluate patient outcome objectively over time and between different institutions. Cirrhosis-specific scoring systems, such as the Child-Pugh classification and Shaw's risk score, have been shown to be good predictors of outcome in cirrhotic patients [1]. However, when used as predictors of outcome for liver transplantation patients the results are inconsistent [2][3][4]. This is partly explained by the fact that the preoperative condition is only one factor in a series of complex interactions that include intra-operative and postoperative factors. Systems for predicting the severity of illness and mortality, such as the Acute Physiology and Chronic Health Evaluation (APACHE) II system, are attractive options for this group because they rely on data collected soon after admission to the intensive care unit (ICU), which is likely to reflect preoperative, intra-operative and postoperative contributions. The APACHE II system was described by Knaus et al. in 1985 to predict hospital mortality in ICU patients [5]. The multiple logistic regression equations were based on data collected on 5050 medical and surgical patients admitted to the ICU in 13 tertiary medical centers in the USA. This outcome prediction system has been used to evaluate and compare the performance of ICUs in different hospitals and countries. In addition to general ICU patients, APACHE II has also been studied in specific groups of patients such as those with trauma [6], sepsis [7], and cirrhosis [8].
The APACHE II prediction equation incorporates three variables: an APACHE II score, the diagnostic category of the patient, and whether the surgery was emergency or elective. The APACHE II score consists of the Acute Physiology Score, which is calculated from 12 physiologic variables that are scored from 0 to 4 according to the degree of deviation from normal. Points for age and for chronic illness are also assigned. There are 50 different diagnostic categories, each with a different weight used in calculating the predicted mortality. There is no specific diagnostic category weight for liver transplantation, because there were no liver transplantation patients in the developmental database for this system. Thus, when this system is used for postoperative liver transplantation patients, the diagnostic category weight 'postoperative other gastrointestinal surgery' is used. This approach has been shown to overestimate mortality significantly [9]. Angus et al. recently derived a new diagnostic category weight based on their population of liver transplantation patients [9]. The purpose of the study was to validate the newly derived postoperative orthotopic liver transplantation (OLTX)-specific diagnostic weight for APACHE II in independent databases.
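As an illustration of how the three variables combine, the following is a minimal sketch of the APACHE II logistic equation. The coefficients (-3.517, 0.146 per score point, 0.603 for emergency surgery) are those of the published APACHE II equation and are not stated in this text; the function name is hypothetical, and the diagnostic category weight is left as a parameter because neither the original nor the OLTX-specific weight is given here.

```python
import math


def apache2_predicted_mortality(score, diagnostic_weight, emergency_surgery=False):
    """Predicted hospital mortality (0 to 1) from the APACHE II logistic equation.

    score             -- total APACHE II score
    diagnostic_weight -- diagnostic category weight (must be supplied by the
                         caller; no weight value is given in this text)
    emergency_surgery -- True if the patient was admitted after emergency surgery
    """
    logit = -3.517 + 0.146 * score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603
    # Convert the log-odds to a probability.
    return 1.0 / (1.0 + math.exp(-logit))


# Placeholder weights for illustration only (not values from the study):
# a more negative weight lowers the predicted risk for the same score.
risk_a = apache2_predicted_mortality(14, diagnostic_weight=0.0)
risk_b = apache2_predicted_mortality(14, diagnostic_weight=-1.0)
```

Because the diagnostic weight enters the equation additively on the log-odds scale, replacing one category weight with another shifts every patient's predicted risk in the same direction, which is why a single OLTX-specific weight can correct a systematic overestimate.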

Methods
King Fahad National Guard Hospital (KFNGH) is a 550-bed tertiary care center. The 12-bed medical-surgical ICU has 600 admissions per year. The liver transplantation program is the main program in the Kingdom of Saudi Arabia. The University of Wisconsin (UW) liver transplantation program is a major program in the USA. Liver transplantation patients are admitted to the Trauma and Life Support Center, which is a multidisciplinary ICU that admits 2000 patients per year. Medical records of liver transplantation patients admitted postoperatively to the adult ICU in the period April 1996 to January 2000 at KFNGH and April 1997 to January 2000 at UW were reviewed. Re-transplantations, kidney-liver and living-related transplantations were excluded. The following data were collected: age, sex, and underlying liver disease. APACHE II scores were calculated according to the original methodology by using the worst physiologic values in the first ICU day. The only exception was the Glasgow Coma Score (GCS). Most of these patients were still under the influence of postoperative sedation during the first 24 hours in the ICU, and the worst GCS would reflect the effect of sedation more than the true underlying mental status. We therefore used the best GCS, which we felt would better reflect the patient's mental status. All patients were given chronic health points. Vital status at discharge from the hospital was recorded.
The standardized mortality ratio (SMR) was calculated by dividing the observed mortality by the predicted mortality. The 95% confidence intervals (CIs) for SMRs were calculated by regarding the observed mortality as a Poisson variable, then dividing its 95% CI by the predicted mortality [10]. The two approaches were compared with regard to calibration (the ability to provide a risk estimate corresponding to the observed mortality) and discrimination (the ability of the predictive system to differentiate survivors from non-survivors). The calibration of both systems was evaluated with the Hosmer-Lemeshow goodness-of-fit C-statistic [11]. We calculated the C-statistic by dividing the study population into six equal groups with increasing predicted mortality to ensure an adequate number of patients in each group. Discrimination was tested by 2 × 2 classification matrices at decision criteria of 10%, 30%, and 50%. Receiver operating characteristic (ROC) curves were constructed to assess discrimination, with 10% stepwise increments in predicted mortality. The two curves were compared by computing the areas under the ROC curves [12,13].
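The ROC construction described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, the area is computed with the trapezoidal rule (the text does not state which method of references [12,13] was applied), and the input data are invented.

```python
def roc_points(predicted, died, steps=10):
    """(false positive rate, true positive rate) pairs at stepwise cut-offs.

    predicted -- predicted mortality per patient, as a fraction (0 to 1)
    died      -- 1 if the patient died in hospital, 0 otherwise
    steps     -- number of increments; 10 gives the 10% steps used in the text
    """
    positives = sum(died)
    negatives = len(died) - positives
    # Cut-offs 0.0, 0.1, ..., 1.0, plus infinity so the curve reaches (0, 0).
    thresholds = [i / steps for i in range(steps + 1)] + [float("inf")]
    points = []
    for t in thresholds:
        tp = sum(1 for p, d in zip(predicted, died) if d and p >= t)
        fp = sum(1 for p, d in zip(predicted, died) if not d and p >= t)
        points.append((fp / negatives, tp / positives))
    return sorted(points)


def auc_trapezoid(points):
    """Area under the ROC curve by the trapezoidal rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented example: two survivors with low predicted risk, two deaths with high risk.
example_auc = auc_trapezoid(roc_points([0.05, 0.15, 0.85, 0.95], [0, 0, 1, 1]))
```

With only eleven cut-offs the curve is coarse, so two models whose predictions differ mainly within a single 10% band can produce nearly the same area.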
The patient characteristics and outcome data from the two participating institutions were compared, to evaluate the overall performance of the system between the two hospitals. Continuous variables were expressed as means ± SD. Categorical values were expressed in absolute and relative frequencies. All categorical variables were analyzed by the χ² test. Non-parametric variables were compared by the Kruskal-Wallis test. P values of 0.05 or less were considered significant. Minitab for Windows (Release 12.1, Minitab Inc.) was used for statistical analysis.

Results

Patient characteristics
During the study period, 174 postoperative liver transplantation patients were admitted to the ICU. Patient characteristics, underlying liver disease, APACHE II scores, and predicted and observed outcomes are shown in Table 1.

Actual and predicted hospital mortality rates
The mean APACHE II score was 13.96, with an SD of 5.76. Observed mortality was 5.75%. When the original diagnostic weight was used, APACHE II significantly overestimated mortality (predicted mortality 12.96%, SMR 0.44, 95% CI 0.22-0.80). When the new diagnostic weight was used, the system provided a closer estimate of mortality (predicted mortality 8.89%, SMR 0.65, 95% CI 0.31-1.16). Fig. 1 shows actual and predicted mortality with the use of both approaches in the whole cohort classified according to APACHE II score.
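The SMR arithmetic can be retraced from the reported figures: an observed mortality of 5.75% in 174 patients corresponds to 10 deaths. The sketch below uses an exact Poisson interval found by bisection, which is an assumption, since the text specifies only that the observed deaths were treated as a Poisson variable [10]; the computed bounds may therefore differ slightly from the published ones. All function names are hypothetical.

```python
import math


def poisson_cdf(k, mu):
    """P(X <= k) for a Poisson variable with mean mu."""
    if k < 0:
        return 0.0
    term = math.exp(-mu)
    total = term
    for i in range(1, k + 1):
        term *= mu / i
        total += term
    return total


def _bisect(f, lo, hi, iterations=200):
    """Root of an increasing function f on [lo, hi]."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def exact_poisson_ci(k, alpha=0.05):
    """Exact two-sided confidence interval for a Poisson count k (assumed method)."""
    bracket = k + 10.0 * math.sqrt(k + 1) + 10.0
    lower = 0.0
    if k > 0:
        # Smallest mean for which observing k or more events is not too unlikely.
        lower = _bisect(lambda mu: (1.0 - poisson_cdf(k - 1, mu)) - alpha / 2, 0.0, bracket)
    # Largest mean for which observing k or fewer events is not too unlikely.
    upper = _bisect(lambda mu: alpha / 2 - poisson_cdf(k, mu), 0.0, bracket)
    return lower, upper


def smr_with_ci(observed_deaths, predicted_rate, n_patients):
    """SMR and its 95% CI: observed deaths (and their CI) over expected deaths."""
    expected = predicted_rate * n_patients
    lower, upper = exact_poisson_ci(observed_deaths)
    return observed_deaths / expected, lower / expected, upper / expected


# Figures reported in the text: 10 deaths among 174 patients.
smr_original = smr_with_ci(10, 0.1296, 174)  # original diagnostic weight
smr_new = smr_with_ci(10, 0.0889, 174)       # OLTX-specific diagnostic weight
```

An SMR whose confidence interval excludes 1 indicates that observed and predicted mortality differ significantly; with the original weight the upper bound lies below 1, whereas with the OLTX-specific weight the interval includes 1.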

Calibration
The goodness-of-fit analysis, with the Hosmer-Lemeshow C-statistic, is shown in Table 2.
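For readers unfamiliar with the statistic, the following is a minimal sketch of a Hosmer-Lemeshow C-statistic computed over equal-sized groups ordered by predicted mortality (six groups in this study). The function name and the example data are hypothetical, and the sketch returns only the statistic: converting it to a P value requires a χ² reference distribution whose degrees of freedom are not stated in the text.

```python
def hosmer_lemeshow_c(predicted, died, groups=6):
    """Hosmer-Lemeshow C-statistic over equal-sized groups of increasing risk.

    predicted -- predicted mortality per patient, as a fraction (0 to 1)
    died      -- 1 if the patient died in hospital, 0 otherwise
    """
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        size = len(chunk)
        observed = sum(d for _, d in chunk)
        mean_risk = sum(p for p, _ in chunk) / size
        expected = size * mean_risk
        variance = size * mean_risk * (1.0 - mean_risk)
        if variance == 0:
            # A group with mean risk of exactly 0 or 1 contributes no finite term.
            continue
        statistic += (observed - expected) ** 2 / variance
    return statistic


# Invented example with two groups of four patients each.
example_c = hosmer_lemeshow_c([0.1] * 4 + [0.5] * 4, [0, 0, 0, 0, 1, 1, 0, 0], groups=2)
```

A smaller value indicates closer agreement between observed and expected deaths across the risk groups, that is, better calibration.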

Discrimination
Discrimination examined by 2 × 2 classification matrices showed an improvement with the new diagnostic category weight. This was reflected by the higher overall correct classification rate at the 10% and 30% decision criteria; the rates were identical at 50% (see Table 3). Discrimination was also tested by ROC curves (Fig. 2): the areas under the receiver operating characteristic curves for the two approaches were almost identical (0.740 and 0.744, respectively).
Table 4 shows the characteristics of patients on the basis of their institutions. Patients from KFNGH were slightly (but significantly) younger than patients at UW. Hepatitis C virus was more common, and alcohol-related liver disease was less common, as an underlying disease in patients in KFNGH than in those at UW. APACHE II scores, and correspondingly predicted mortalities, were higher in KFNGH patients. Despite these differences, the performances of the prediction systems (the old and the new models) were quite similar in the two hospitals as reflected by SMRs. The new approach provided more accurate estimates of hospital mortality in each hospital than the old model.
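The measures reported for the 2 × 2 classification matrices can be recomputed from the four cell counts. The sketch below assumes that the first four counts in each row of Table 3 are, in order, non-survivors predicted to die, non-survivors predicted to survive, survivors predicted to die, and survivors predicted to survive; that reading reproduces the published percentages. The function name is hypothetical.

```python
def classification_metrics(died_pd, died_ps, survived_pd, survived_ps):
    """Percentages derived from one 2 x 2 classification matrix.

    died_pd / died_ps         -- non-survivors predicted to die / to survive
    survived_pd / survived_ps -- survivors predicted to die / to survive
    """
    def pct(numerator, denominator):
        # Undefined ratios (for example, nobody predicted to die) become NaN.
        return 100.0 * numerator / denominator if denominator else float("nan")

    total = died_pd + died_ps + survived_pd + survived_ps
    return {
        "sensitivity": pct(died_pd, died_pd + died_ps),
        "specificity": pct(survived_ps, survived_pd + survived_ps),
        "ppv": pct(died_pd, died_pd + survived_pd),
        "npv": pct(survived_ps, died_ps + survived_ps),
        "occr": pct(died_pd + survived_ps, total),
        "omcr": pct(died_ps + survived_pd, total),
    }


# Original model at the 10% decision criterion (counts from Table 3).
original_10 = classification_metrics(8, 2, 76, 88)
# New model at the 10% decision criterion (counts from Table 3).
new_10 = classification_metrics(7, 3, 42, 122)
```

At the 10% criterion the new weight reclassifies many survivors from predicted to die to predicted to survive, which raises specificity and the overall correct classification rate at the cost of one missed death.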

Discussion
The findings of our study can be summarized as follows: (1) APACHE II with its original diagnostic category weight overestimated hospital mortality in postoperative liver transplantation patients; (2) when the newly derived OLTX-specific diagnostic category weight was applied, mortality prediction, discrimination, and calibration of APACHE II improved; (3) despite differences in the patient populations, the performance of the old and new models, as reflected by SMRs, was similar in the two institutions.
Available online http://ccforum.com/content/6/3/245
The literature evaluating APACHE II in postoperative liver transplantation patients is limited. Bein et al. [14] reviewed the use of scoring systems in 123 liver transplantation patients. In their study, APACHE II scores were reported; however, no calculation of the predicted mortality was performed. The study showed that APACHE II scores had good discrimination, as reflected by the areas under the ROC curves. A second study by Sawyer et al. [15] found that mortality correlated with the APACHE II score. However, the predicted mortality was again not calculated.

Figure 1
Actual mortality (triangles), mortality predicted with the original model (diamonds) and mortality predicted with the orthotopic liver transplantation-specific diagnostic category weight (circles) in the whole cohort stratified by APACHE II scores. The bars represent the numbers of patients in each subgroup.

Table 2
Hosmer-Lemeshow goodness-of-fit C-statistic for APACHE II in its original and new models

Table 3
Classification matrix and sensitivity analysis for APACHE II in its original and new models

| Model | Decision criterion (%) | Died, PD | Died, PS | Survived, PD | Survived, PS | Sensitivity (%) | Specificity (%) | PPV (%) | NPV (%) | OMCR (%) | OCCR (%) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Original | 10 | 8 | 2 | 76 | 88 | 80 | 54 | 10 | 98 | 45 | 55 |
| Original | 30 | 3 | 7 | 7 | 157 | 30 | 96 | 30 | 96 | 8 | 92 |
| Original | 50 | 0 | 10 | 2 | 162 | 0 | 99 | 0 | 94 | 7 | 93 |
| New | 10 | 7 | 3 | 42 | 122 | 70 | 74 | 14 | 98 | 26 | 74 |
| New | 30 | 2 | 8 | 3 | 161 | 20 | 98 | 40 | 95 | 6 | 94 |
| New | 50 | 0 | 10 | 2 | 162 | 0 | 99 | 0 | 94 | 7 | 93 |

OCCR, overall correct classification rate; OMCR, overall misclassification rate; NPV, negative predictive value; PPV, positive predictive value; PD, predicted to die; PS, predicted to survive.

Angus et al. [9] recently calculated the predicted mortality for postoperative liver transplantation patients and found that the APACHE II system overestimated mortality when the original equation was used (SMR 0.73, 95% CI 0.58-0.99). This is consistent with our findings. The inaccuracy of APACHE II with its original equation probably arises from several factors. The developmental database of APACHE II did not have liver transplantation patients; the use of the system with the original equation for liver transplantation patients therefore essentially assumes that the weighted diagnostic category for liver transplantation would be the same as for postoperative gastrointestinal surgery. In this study we show, as shown previously by Angus et al. [9], that this assumption is not accurate because it leads to a significant overestimation of mortality.
We believe that the reason is related to the unique pathophysiology of the period after liver transplantation. Marked changes occur during the procedure, especially at the time of reperfusion [16,17]. These include a significant decrease in blood pressure, a decrease in systemic vascular resistance, an increase in cardiac output, a decrease in pH, an increase in lactate, an increase in potassium, and a prolongation of prothrombin time [16,17]. Although some of these abnormalities start to normalize during the final stages of surgery, some will persist into the immediate postoperative period [16] and will be reflected in any severity of illness score such as APACHE II. These changes start to normalize rapidly as the graft starts to function. The multitude of the abnormalities and the speed with which they are corrected make this group of patients unique and explain the inaccuracy of APACHE II when using the diagnostic category weight of 'postoperative gastrointestinal surgery'.
On the basis of the above, it is not surprising that a model developed on a population of liver transplant patients would provide more accurate and reproducible estimates. Similar disease-specific customizations of mortality prediction systems have been performed, such as for sepsis [18].

Figure 2
The receiver operating characteristic curves for the original model (dashed line) and the new model (continuous line).

There are several obvious advantages to the use of APACHE II as a model of severity of illness for liver transplant patients. These include the familiarity with the system and its widespread use in ICUs. ICUs that use APACHE II as their database severity of illness scoring system will find it easy to apply the system to this subgroup of patients rather than implementing a special disease-specific system exclusively for OLTX patients. In general, using a system for scoring the severity of illness is essential for monitoring transplant program performance over time and between different institutions. Such a system can also be useful for grouping patients in clinical studies.
In conclusion, APACHE II provided an accurate estimate of mortality in liver transplant patients when the OLTX-specific diagnostic category weight was used.