External validation of a modified model of Acute Physiology and Chronic Health Evaluation (APACHE) II for orthotopic liver transplant patients
© BioMed Central Ltd 2002
Received: 22 October 2001
Accepted: 12 March 2002
Published: 8 April 2002
The purpose of the study was to validate the newly derived postoperative orthotopic liver transplantation (OLTX)-specific diagnostic weight for the Acute Physiology and Chronic Health Evaluation (APACHE) II mortality prediction system in independent databases.
Medical records of 174 liver transplantation patients admitted postoperatively to the adult intensive care units at King Fahad National Guard Hospital and the University of Wisconsin were reviewed, and data on age, sex, the underlying liver disease, APACHE II scores and the hospital outcome were collected. Predicted mortality was calculated using: 1) the original APACHE II diagnostic weight of postoperative other gastrointestinal surgery and 2) the newly derived OLTX-specific diagnostic category weight. Standardized mortality ratio and 95% confidence intervals were calculated. Calibration was evaluated with the Hosmer–Lemeshow goodness-of-fit C-statistic. Discrimination was tested by 2 × 2 classification matrices and by computing the areas under the receiver operating characteristic curves. Patient characteristics and outcome data were compared between the two hospitals.
APACHE II significantly overestimated mortality when the original diagnostic weight was used, but provided a closer estimate of mortality with the OLTX-specific diagnostic weight. The C-statistic analysis showed better calibration for the new approach; discrimination was also improved. The performances of the prediction systems were similar in the two hospitals. The new model provided more accurate estimates of hospital mortality in each hospital.
APACHE II provided an accurate estimate of mortality in liver transplant patients when the OLTX-specific diagnostic weight was used. With the new model, APACHE II can be used as a valid mortality prediction system in this group of patients.
Keywords: APACHE II, liver transplantation, mortality, scoring systems
With the increasing worldwide availability of liver transplantation, a standardized assessment of severity of illness is needed to evaluate patient outcome objectively over time and between different institutions. Cirrhosis-specific scoring systems, such as the Child–Pugh classification and Shaw's risk score, have been shown to be good predictors of outcome in cirrhotic patients. However, when they are used as predictors of outcome for liver transplantation patients, the results are inconsistent [2,3,4]. This is partly explained by the fact that the preoperative condition is only one factor in a series of complex interactions that include intra-operative and postoperative factors. Systems for predicting the severity of illness and mortality, such as the Acute Physiology and Chronic Health Evaluation (APACHE) II system, are attractive options for this group because they rely on data collected soon after admission to the intensive care unit (ICU), which are likely to reflect preoperative, intra-operative and postoperative contributions.
The APACHE II system was described by Knaus et al. in 1985 to predict hospital mortality in ICU patients [5]. The multiple logistic regression equations were based on data collected on 5050 medical and surgical patients admitted to the ICU in 13 tertiary medical centers in the USA. This outcome prediction system has been used to evaluate and compare the performance of ICUs in different hospitals and countries. In addition to general ICU patients, APACHE II has also been studied in specific groups of patients such as those with trauma [6], sepsis [7], and cirrhosis [8].
The APACHE II prediction equation incorporates three variables: an APACHE II score, the diagnostic category of the patient, and whether the surgery was emergency or elective. The APACHE II score includes the Acute Physiology Score, which is calculated from 12 physiologic variables that are scored from 0 to 4 depending on the degree of deviation from normal. Points for age and for chronic illness are also assigned. There are 50 different diagnostic categories, each with a different weight used in calculating the predicted mortality. There is no specific diagnostic category weight for liver transplantation, because there were no liver transplantation patients in the developmental database for this system. Thus, when this system is used for postoperative liver transplantation patients, the diagnostic category weight 'postoperative other gastrointestinal surgery' is used. This approach has been shown to overestimate mortality significantly [9]. Angus et al. recently derived a new diagnostic category weight based on their population of liver transplantation patients [9]. The purpose of the study was to validate the newly derived postoperative orthotopic liver transplantation (OLTX)-specific diagnostic weight for APACHE II in independent databases.
King Fahad National Guard Hospital (KFNGH) is a 550-bed tertiary care center. The 12-bed medical–surgical ICU has 600 admissions per year. The liver transplantation program is the main program in the Kingdom of Saudi Arabia. The University of Wisconsin (UW) liver transplantation program is a major program in the USA. Liver transplantation patients are admitted to the Trauma and Life Support Center, which is a multidisciplinary ICU that admits 2000 patients per year. Medical records of liver transplantation patients admitted postoperatively to the adult ICU in the period April 1996 to January 2000 at KFNGH and April 1997 to January 2000 at UW were reviewed. Re-transplantations, kidney–liver and living-related transplantations were excluded. The following data were collected: age, sex, and underlying liver disease. APACHE II scores were calculated according to the original methodology by using the worst physiologic values during the first ICU day. The only exception was the Glasgow Coma Score (GCS). Most of these patients were still under the influence of postoperative sedation during the first 24 hours in the ICU, and the worst GCS would reflect the effect of sedation more than the true underlying mental status. We therefore used the best GCS, which we felt would be a better reflection of the patient's mental status. All patients were given chronic health points. Vital status at discharge from the hospital was recorded.
Predicted mortality was calculated with the logistic regression formula described in the original article [5]. We used two approaches: the original APACHE II diagnostic category weight of postoperative gastrointestinal surgery (-0.613), and the OLTX-specific diagnostic category weight calculated by Angus et al. (-1.076) [9]. The formulae for calculating predicted mortality (risk of death [ROD]) are as follows:
for the original approach, ln [ROD/(1 - ROD)] = -3.517 + (APACHE II score × 0.146) - 0.613;
for the new approach, ln [ROD/(1 - ROD)] = -3.517 + (APACHE II score × 0.146) - 1.076.
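The two formulae above can be turned into a probability by inverting the logit. The following is a minimal Python sketch of that arithmetic, not code from the study; the function and constant names are hypothetical, the coefficients are the ones quoted in the text, and, like the formulae as written, it includes no term for emergency surgery.

```python
import math

# Coefficients as quoted in the text above.
INTERCEPT = -3.517
SCORE_COEFFICIENT = 0.146
WEIGHT_ORIGINAL = -0.613  # 'postoperative gastrointestinal surgery'
WEIGHT_OLTX = -1.076      # OLTX-specific weight derived by Angus et al.


def risk_of_death(apache_ii_score, diagnostic_weight):
    """Return the predicted risk of death (0 to 1) for one patient.

    Solves ln[ROD / (1 - ROD)] = logit for ROD.
    """
    logit = INTERCEPT + SCORE_COEFFICIENT * apache_ii_score + diagnostic_weight
    return 1.0 / (1.0 + math.exp(-logit))


if __name__ == "__main__":
    # Illustrative scores only, not study data.
    for score in (10, 15, 20, 25):
        original = risk_of_death(score, WEIGHT_ORIGINAL)
        new = risk_of_death(score, WEIGHT_OLTX)
        print(f"APACHE II {score}: original {original:.1%}, new {new:.1%}")
```

Because the two weights differ only by a constant on the logit scale, the new approach gives a lower predicted risk than the original approach at every APACHE II score.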
Standardized mortality ratio (SMR) was calculated by dividing observed mortality by the predicted mortality. The 95% confidence intervals (CIs) for SMRs were calculated by regarding the observed mortality as a Poisson variable, then dividing its 95% CI by the predicted mortality [10]. The two approaches were compared with regard to calibration (the ability to provide a risk estimate corresponding to the observed mortality) and discrimination (the ability of the predictive system to differentiate survivors from non-survivors). The calibration of both systems was evaluated with the Hosmer–Lemeshow goodness-of-fit C-statistic [11]. We calculated the C-statistic by dividing the study population into six equal groups with increasing predicted mortality to ensure an adequate number of patients in each group. Discrimination was tested by 2 × 2 classification matrices at decision criteria of 10%, 30%, and 50%. Receiver operating characteristic (ROC) curves were constructed as a measure of discrimination, with 10% stepwise increments in predicted mortality. The two curves were compared by computing the areas under the ROC curves [12,13].
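The SMR calculation described above can be sketched as follows. This is an illustration under stated assumptions rather than the study's own procedure: the text says only that the observed deaths were treated as a Poisson variable, so the use of an exact Poisson interval found by bisection is an assumption, and all function names are hypothetical.

```python
import math


def poisson_cdf(k, lam):
    """P(X <= k) for X ~ Poisson(lam)."""
    if k < 0:
        return 0.0
    term = math.exp(-lam)
    total = term
    for i in range(1, k + 1):
        term *= lam / i
        total += term
    return min(total, 1.0)


def _bisect(f, lo, hi, iterations=200):
    """Find a root of f on [lo, hi], assuming f changes sign exactly once."""
    f_lo = f(lo)
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return (lo + hi) / 2.0


def poisson_exact_ci(observed, alpha=0.05):
    """Exact two-sided CI for a Poisson mean, given an observed count."""
    bracket = observed + 10.0 * math.sqrt(observed + 1.0) + 10.0
    if observed == 0:
        lower = 0.0
    else:
        # Smallest mean for which P(X >= observed) reaches alpha / 2.
        lower = _bisect(
            lambda lam: 1.0 - poisson_cdf(observed - 1, lam) - alpha / 2.0,
            0.0, bracket)
    # Largest mean for which P(X <= observed) is still alpha / 2.
    upper = _bisect(
        lambda lam: poisson_cdf(observed, lam) - alpha / 2.0, 0.0, bracket)
    return lower, upper


def smr_with_ci(observed_deaths, predicted_risks, alpha=0.05):
    """Return (SMR, lower, upper); expected deaths = sum of individual risks."""
    expected_deaths = sum(predicted_risks)
    lower, upper = poisson_exact_ci(observed_deaths, alpha)
    return (observed_deaths / expected_deaths,
            lower / expected_deaths,
            upper / expected_deaths)
```

An SMR whose confidence interval lies entirely below 1 indicates that the model overestimated mortality; an interval that includes 1 indicates no significant difference between observed and predicted deaths.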
The patient characteristics and outcome data from the two participating institutions were compared, to evaluate the overall performance of the system between the two hospitals. Continuous variables were expressed as means ± SD. Categorical values were expressed in absolute and relative frequencies. All categorical variables were analyzed by the χ² test. Non-parametric variables were compared by the Kruskal–Wallis test. P values of 0.05 or less were considered significant. Minitab for Windows (Release 12.1, Minitab Inc.) was used for statistical analysis.
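The discrimination analysis described in the methods (2 × 2 classification matrices at fixed decision criteria, and ROC curves built in 10% steps of predicted mortality) can be sketched as below. The function names are hypothetical, the rule "predict death when the risk is at or above the cutoff" is an assumption because the text does not state how ties were handled, and the area is a plain trapezoidal estimate rather than the comparison method of references [12,13]. The sketch also assumes that the data contain at least one survivor and one non-survivor.

```python
def classification_matrix(predicted_risks, died, cutoff):
    """Return (tp, fp, tn, fn), predicting death when risk >= cutoff."""
    tp = fp = tn = fn = 0
    for risk, outcome in zip(predicted_risks, died):
        predicted_death = risk >= cutoff
        if predicted_death and outcome:
            tp += 1
        elif predicted_death and not outcome:
            fp += 1
        elif outcome:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def sensitivity_specificity(tp, fp, tn, fn):
    """Sensitivity = tp / all deaths; specificity = tn / all survivors."""
    return tp / (tp + fn), tn / (tn + fp)


def roc_area(predicted_risks, died, steps=10):
    """Trapezoidal area under the ROC curve sampled at equal cutoff steps."""
    points = {(0.0, 0.0), (1.0, 1.0)}  # curve endpoints
    for i in range(steps + 1):
        counts = classification_matrix(predicted_risks, died, i / steps)
        sensitivity, specificity = sensitivity_specificity(*counts)
        points.add((1.0 - specificity, sensitivity))
    ordered = sorted(points)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(ordered, ordered[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area
```

An area of 0.5 corresponds to no discrimination and an area of 1.0 to perfect separation of survivors from non-survivors. Sampling the curve in coarse 10% steps can underestimate the area compared with using every distinct predicted risk as a cutoff.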
During the study period 174 postoperative liver transplantation patients were admitted to ICU. Patients' characteristics, underlying liver disease, APACHE II scores, and predicted and observed outcomes are shown in Table 1.
Actual and predicted hospital mortality rates
The goodness-of-fit analysis, with the Hosmer–Lemeshow C-statistic, is shown in Table 2; the new system had better calibration (original model, χ² = 11.06, P = 0.03; new model, χ² = 5.92, P = 0.20).
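A minimal sketch of how a Hosmer–Lemeshow C-statistic of this kind can be computed is given below. It is not the authors' Minitab procedure: the function names are hypothetical, the grouping into equal-sized groups ranked by predicted risk follows the description in the methods, and the degrees of freedom are taken as the number of groups minus two, which matches the df = 4 reported for six groups. The P value helper uses a closed form that holds only for an even number of degrees of freedom.

```python
import math


def hosmer_lemeshow_c(predicted_risks, died, groups=6):
    """Return (chi-square, df) for equal-sized groups ranked by predicted risk.

    Assumes every group is non-empty and has an expected death count
    strictly between 0 and the group size.
    """
    ranked = sorted(zip(predicted_risks, died), key=lambda pair: pair[0])
    n = len(ranked)
    chi_square = 0.0
    for g in range(groups):
        chunk = ranked[g * n // groups:(g + 1) * n // groups]
        size = len(chunk)
        expected = sum(risk for risk, _ in chunk)
        observed = sum(1 for _, outcome in chunk if outcome)
        # Combines the death and survivor cells of the group into one term.
        chi_square += (observed - expected) ** 2 / (
            expected * (1.0 - expected / size))
    return chi_square, groups - 2


def chi_square_p_value_even_df(chi_square, df):
    """Upper-tail P value of a chi-square statistic; even df only."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive, even df")
    half = chi_square / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

Applied to the reported statistics with 4 degrees of freedom, `chi_square_p_value_even_df(11.06, 4)` and `chi_square_p_value_even_df(5.92, 4)` give P values close to those quoted above (about 0.03 and about 0.2). A small P value indicates a significant departure of observed from predicted deaths, that is, poor calibration.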
Comparison between the two institutions
Table 4 shows the characteristics of patients on the basis of their institutions. Patients from KFNGH were slightly (but significantly) younger than patients at UW. Hepatitis C virus was more common, and alcohol-related liver disease was less common, as an underlying disease in patients in KFNGH than in those at UW. APACHE II scores, and correspondingly predicted mortalities, were higher in KFNGH patients. Despite these differences, the performances of the prediction systems (the old and the new models) were quite similar in the two hospitals as reflected by SMRs. The new approach provided more accurate estimates of hospital mortality in each hospital than the old model.
The findings of our study can be summarized as follows: (1) APACHE II with its original diagnostic category weight overestimated hospital mortality in postoperative liver transplantation patients; (2) when the newly derived OLTX-specific diagnostic category weight was applied, mortality prediction, discrimination, and calibration of APACHE II improved; (3) despite differences in the patient populations, the performance of the old and new models, as reflected by SMRs, was similar in the two institutions.
The literature evaluating APACHE II in postoperative liver transplantation patients is limited. Bein et al. [14] reviewed the use of scoring systems in 123 liver transplantation patients. In their study, APACHE II scores were reported; however, no calculation of the predicted mortality was performed. The study showed that APACHE II scores had good discrimination, as reflected by the areas under the ROC curves. A second study by Sawyer et al. [15] found that mortality correlated with the APACHE II score. However, the predicted mortality was again not calculated.
Angus et al. [9] recently calculated the predicted mortality for postoperative liver transplantation patients and found that the APACHE II system overestimated mortality when the original equation was used (SMR 0.73, 95% CI 0.58–0.99). This is consistent with our findings. The inaccuracy of APACHE II with its original equation probably arises from several factors. The developmental database of APACHE II did not have liver transplantation patients; the use of the system with the original equation for liver transplantation patients therefore essentially assumes that the weighted diagnostic category for liver transplantation would be the same as for postoperative gastrointestinal surgery. In this study we show, as shown previously by Angus et al. [9], that this assumption is not accurate because it leads to a significant overestimation of mortality.
We believe that the reason is related to the unique pathophysiology of the period after liver transplantation. Marked changes occur during the procedure, especially at the time of reperfusion [16,17]. These include a significant decrease in blood pressure, a decrease in systemic vascular resistance, an increase in cardiac output, a decrease in pH, an increase in lactate, an increase in potassium, and a prolongation of prothrombin time [16,17]. Although some of these abnormalities start to normalize during the final stages of surgery, some will persist into the immediate postoperative period and will be reflected in any severity of illness score such as APACHE II. These changes start to normalize rapidly as the graft starts to function. The multitude of the abnormalities and the speed with which they are corrected make this group of patients unique and explain the inaccuracy of APACHE II when using the diagnostic category weight of 'postoperative gastrointestinal surgery'.
On the basis of the above, it is not surprising that a model developed on a population of liver transplant patients would provide more accurate and reproducible estimates. Similar disease-specific customizations of mortality prediction systems have been performed, such as for sepsis [18].
There are several obvious advantages to the use of APACHE II as a model of severity of illness for liver transplant patients. These include the familiarity with the system and its widespread use in ICUs. ICUs that use APACHE II as their database severity of illness scoring system will find it easy to apply the system to this subgroup of patients rather than implementing a special disease-specific system exclusively for OLTX patients. In general, using a system for scoring the severity of illness is essential for monitoring transplant program performance over time and between different institutions. Such a system also can be useful for grouping patients in clinical studies.
Table 1. Characteristics of patients
Number of patients
Age (years): 50 ± 12; 49.82 ± 2.09; 49.60 ± 9.25
APACHE II score: 13.81 ± 5.26; 13.46 ± 5.06; 19.50 ± 5.56
ROD, original model (%): 12.96 ± 10.25; 12.27 ± 9.60; 24.28 ± 4.12
ROD, new model (%): 8.89 ± 8.08; 8.37 ± 7.60; 17.37 ± 11.14
SMR, original model; 95% CI
SMR, new model; 95% CI
Table 2. Lemeshow–Hosmer goodness-of-fit C-statistic for APACHE II in its original and new models
APACHE II original model: χ² = 11.06 (df = 4)
APACHE II new model: χ² = 5.92 (df = 4)
Classification matrix and sensitivity analysis for APACHE II in its original and new models
Table 4. Comparison between the two participating transplant centers
Number of patients
Age (years): KFNGH 46.25 ± 13.97; UW 51.82 ± 10.12
APACHE II score: KFNGH 21.09 ± 5.01; UW 14.37 ± 3.59
ROD, original model (%): KFNGH 20.55 ± 12.94; UW 8.65 ± 4.38
ROD, new model (%): KFNGH 14.54 ± 10.73; UW 5.68 ± 3.01
SMR, original model; 95% CI
SMR, new model; 95% CI
· APACHE II with its original diagnostic category weight overestimated hospital mortality in postoperative liver transplantation patients.
· When the newly derived OLTX-specific diagnostic category weight was applied, mortality prediction, discrimination and calibration of APACHE II improved.
· Despite differences in the patient population, the performance of the old and new models was similar in the two institutions as reflected by SMRs.
APACHE = Acute Physiology and Chronic Health Evaluation
GCS = Glasgow Coma Score
ICU = intensive care unit
OLTX = orthotopic liver transplantation
ROC = receiver operating characteristic
SMR = standardized mortality ratio.
1. Infante-Rivard C, Esnaola S, Villeneuve JP: Clinical and statistical validity of conventional prognostic factors in predicting short-term survival among cirrhotics. Hepatology 1987, 17: 660-664.
2. Deschenes M, Villeneuve JP, Dagenais M, Fenyves D, Lapointe R, Pomier-Layrargues G, Roy A, Willems B, Marleau D: Lack of relationship between preoperative measures of severity of cirrhosis and short-term survival after liver transplantation. Liver Transpl Surg 1997, 3: 532-537.
3. Maggi U, Rossi G, Colledan M, Fassati LR, Gridelli B, Reggiani P, Basadonna G, Colombo A, Doglia M, Ferla G: Child–Pugh score and liver transplantation. Transplant Proc 1993, 25: 1769-1770.
4. Shaw BW, Wood P, Stratta RJ, Pillen TJ, Langnas AN: Stratifying the causes of death in liver transplant recipients. Arch Surg 1989, 124: 895-900.
5. Knaus WA, Draper EA, Wagner DP, Zimmerman JE: APACHE II: a severity of disease classification system. Crit Care Med 1985, 13: 818-829.
6. Wong DT, Barrow PM, Gomez M, McGuire GP: A comparison of the Acute Physiology and Chronic Health Evaluation (APACHE) II score and the Trauma-Injury Severity Score (TRISS) for outcome assessment in intensive care unit trauma patients. Crit Care Med 1996, 24: 1642-1648. 10.1097/00003246-199610000-00007
7. Bohnen JM, Mustard RA, Oxholm SE, Schouten BD: APACHE II score and abdominal sepsis. Arch Surg 1988, 123: 225-229.
8. Zauner CA, Apsner RC, Kranz A, Kramer L, Madl C, Schneider B, Schneeweiss B, Ratheiser K, Stockenhuber F, Lenz K: Outcome prediction for patients with cirrhosis of the liver in a medical ICU: a comparison of APACHE scores and liver-specific scoring systems. Intens Care Med 1996, 22: 559-563. 10.1007/s001340050130
9. Angus DC, Clermont G, Kramer DJ, Linde-Zwirble WT, Pinsky MR: Short-term and long-term outcome prediction with the Acute Physiology and Chronic Health Evaluation II system after orthotopic liver transplantation. Crit Care Med 2000, 28: 150-156. 10.1097/00003246-200001000-00025
10. Goldhill DR, Sumner A: Outcome of intensive care patients in a group of British intensive care units. Crit Care Med 1998, 26: 1337-1345. 10.1097/00003246-199808000-00017
11. Lemeshow S, Hosmer DW: A review of goodness of fit statistics for use in the development of logistic regression models. Am J Epidemiol 1982, 115: 92-106.
12. Metz CE: Basic principles of ROC analysis. Semin Nucl Med 1978, 8: 283-298.
13. Hanley JA, McNeil BJ: The meaning and use of the area under a receiver operating characteristic (ROC) curve. Radiology 1982, 143: 29-36.
14. Bein T, Frohlich D, Pomsl J, Forst H, Pratschke E: The predictive value of four scoring systems in liver transplant recipients. Intens Care Med 1995, 21: 32-37.
15. Sawyer RG, Durbin CG, Rosenlof LK, Pruett TL: Comparison of APACHE II scoring in liver and kidney transplant recipients versus trauma and general surgical patients in a single intensive care unit. Clin Transplant 1995, 9: 401-405.
16. Kalpokas M, Bookallil M, Sheil AG, Rickard KA: Physiological changes during liver transplantation. Anaesth Intens Care 1989, 17: 24-30.
17. Rettke SR, Janossy TA, Chantigian RC, Burritt MF, Van Dyke RA, Harper JV, Ilstrup DM, Taswell HF, Wiesner RH, Krom RA: Hemodynamic and metabolic changes in hepatic transplantation. Mayo Clin Proc 1989, 64: 232-240.
18. LeGall JR, Lemeshow S, Leleu G, Klar J, Huillard J, Rui M, Teres D, Artigas A: Customized probability models for early severe sepsis in adult intensive care patients. JAMA 1995, 273: 644-650.