The customization of APACHE II for patients receiving orthotopic liver transplants

General outcome prediction models developed for use with large, multicenter databases of critically ill patients may not correctly estimate mortality if applied to a particular group of patients that was under-represented in the original database. The development of new diagnostic weights has been proposed as a method of adapting the general model – the Acute Physiology and Chronic Health Evaluation (APACHE) II in this case – to a new group of patients. Such customization must be empirically tested, because the original model may not contain an appropriate set of predictive variables for the particular group. In this issue of Critical Care, Arabi and co-workers present the results of the validation of a modified model of the APACHE II system for patients receiving orthotopic liver transplants. Their conclusions are uncertain, however, because the database was highly heterogeneous, not all important variables were taken into account, and the sample was too small for the Hosmer–Lemeshow goodness-of-fit test to be used appropriately.


The APACHE prognostic systems
Described in 1985 [3], the APACHE II prognostic system is one of the most widely used general outcome models. Developed for use with unselected groups of critically ill adults, the system uses three types of data to provide the user with a probability of death at hospital discharge: the Acute Physiology Score (APS), based on the most deranged physiological and laboratory values during the first 24 hours in the intensive care unit (ICU); the premorbid status, based on a list of chronic diseases and conditions apparent at admission to hospital; and the diagnostic category, based on a list of 29 medical and 24 surgical diagnoses.
Because the system was developed in the early 1980s, several diseases and conditions were not well represented in the original database. This fact, together with substantial changes in the outcome of major diseases and the need to incorporate other variables, led the authors to undertake a major update, the APACHE III prognostic system, published in 1991 [4]. Being commercial, this updated system has not had the impact of its free predecessor. Its calibration was better, probably reflecting the updated database more than major changes in the statistical construct of the model, and it was found to be quite well calibrated for the USA [5], except in diagnostic groups for which the therapeutic approach has changed substantially, such as acute myocardial infarction. In other settings, such as Spain, calibration problems remained, prompting a major recalibration or customization of the APACHE III system [6].
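As a rough illustration of how these components combine into a probability, the sketch below applies the logistic equation commonly quoted from the 1985 APACHE II publication [3]. The function name and the example inputs are hypothetical, the score is assumed to be the total APACHE II score (APS plus the age and chronic health points), and the coefficients should be verified against the original tables before any real use.

```python
import math


def apache2_risk(score, diagnostic_weight, emergency_surgery=False):
    """Predicted hospital mortality from an APACHE II style logistic equation.

    score             -- total APACHE II score (assumed: APS + age + chronic health points)
    diagnostic_weight -- coefficient of the patient's diagnostic category
    emergency_surgery -- True if admitted after emergency surgery
    """
    # Coefficients as commonly quoted from the 1985 publication (assumption:
    # check them against the original tables before relying on them).
    logit = -3.517 + 0.146 * score + diagnostic_weight
    if emergency_surgery:
        logit += 0.603
    # Logistic transformation from log-odds to probability.
    return 1.0 / (1.0 + math.exp(-logit))


# Hypothetical patient: score of 20 and a made-up diagnostic weight of -0.5.
example_risk = apache2_risk(20, -0.5)
```

Because the diagnostic weight enters the log-odds additively, changing it shifts the predicted risk of every patient in that category without altering how the score itself is weighted.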

The customization of an outcome prediction model
Customization, that is, modification of the equations that transform a score (or the directly measured variables) into a probability of mortality, has been suggested as a possible approach when there is evidence that a given model is not fully appropriate and an unbiased estimate of mortality is needed. Preliminary work [7,8] showed that slight modifications of the logistic regression equations would suffice. Later, Zhu et al., working with computer simulations [9], and groups using independent databases [10,11] showed that customization was feasible and would improve the calibration of the model but that some problems would remain, so that independent validation of the customized model would still be needed. This need for validation applies to the work by Angus and colleagues [2] on the development of new coefficients for the APACHE II system to adapt it to patients after liver transplantation. Those authors' approach, which was to develop a new diagnostic weight for this category of patients, is attractive because it is simple. However, it assumes that the APACHE II model incorporates the most important prognostic variables in the setting of liver transplantation, and this assumption needs to be justified.
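A new diagnostic weight of this kind could, for example, be estimated by holding every other term of the original equation fixed and fitting a single additive constant by maximum likelihood. The sketch below does this with Newton's method. It is an illustration under that assumption, not necessarily the procedure used by Angus and colleagues [2], and the function and variable names are hypothetical.

```python
import math


def sigmoid(z):
    """Logistic function mapping log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_diagnostic_weight(offsets, died, max_iterations=50, tolerance=1e-10):
    """Estimate one additive weight w so that P(death_i) = sigmoid(offsets[i] + w).

    offsets -- log-odds of each patient from the fixed part of the model
               (intercept and score term), without any diagnostic weight
    died    -- 1 if the patient died in hospital, 0 otherwise

    The sample must contain at least one death and one survivor;
    otherwise the maximum-likelihood estimate is not finite.
    """
    w = 0.0
    for _ in range(max_iterations):
        probs = [sigmoid(o + w) for o in offsets]
        # First derivative of the log-likelihood with respect to w.
        gradient = sum(y - p for y, p in zip(died, probs))
        # Negative second derivative (always non-negative).
        curvature = sum(p * (1.0 - p) for p in probs)
        if curvature == 0.0:
            break
        step = gradient / curvature
        w += step
        if abs(step) < tolerance:
            break
    return w
```

At the fitted value the sum of predicted probabilities equals the observed number of deaths, so the approach corrects the overall level of predicted mortality in the subgroup. It cannot compensate for prognostic variables that are missing from the model, which is why the assumption discussed above matters.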

Does the paper by Arabi and colleagues answer our questions?
It does not. The work done by Arabi and his co-workers was based on a highly heterogeneous database, and patients were treated in two very different institutions. Differences in the prevalence of chronic conditions and the degree of physiologic disorder as well as differences in the procedures followed during the liver transplantation (liver nutrition solutions, cold ischemia time, etc.) could have influenced the outcome for these patients. Moreover, the small number of patients in the sample analyzed leaves the Hosmer-Lemeshow goodness-of-fit test underpowered to detect potential differences between the predicted and the actual mortality. The better calibration of the customized model is promising, but it should be empirically tested in a larger database, constructed to reflect the case mix of liver transplantation patients.
For the moment, therefore, it remains to be shown whether the approach used (deriving a new coefficient for the APACHE II system to be applied to a specific group of patients) is potentially useful and will perform better than its predecessor.
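To make the sample-size concern concrete, the following is a minimal sketch of the Hosmer-Lemeshow statistic: patients are sorted by predicted risk, split into groups of roughly equal size, and the observed deaths in each group are compared with the expected deaths. The function names are hypothetical, and the number of groups and the degrees of freedom are assumptions left to the caller (ten groups is conventional; the degrees of freedom are usually taken as the number of groups minus 2 in a development sample and as the number of groups in an independent validation sample).

```python
import math


def hosmer_lemeshow(predicted, died, groups=10):
    """Hosmer-Lemeshow statistic over groups of patients ordered by predicted risk."""
    pairs = sorted(zip(predicted, died), key=lambda pair: pair[0])
    n = len(pairs)
    statistic = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        if not chunk:
            continue
        expected = sum(p for p, _ in chunk)
        observed = sum(y for _, y in chunk)
        mean_risk = expected / len(chunk)
        variance = len(chunk) * mean_risk * (1.0 - mean_risk)
        if variance > 0.0:
            statistic += (observed - expected) ** 2 / variance
    return statistic


def chi2_upper_tail_even_df(x, df):
    """Upper-tail chi-square probability; this closed form is valid only for even df."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = 1.0
    total = 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total
```

With few patients, each group holds only a handful of expected deaths, so even sizable discrepancies between observed and predicted mortality may produce a non-significant p-value.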