Performance of six severity-of-illness scores in cancer patients requiring admission to the intensive care unit: a prospective observational study

Introduction The aim of this study was to evaluate the performance of five general severity-of-illness scores (Acute Physiology and Chronic Health Evaluation II and III-J, the Simplified Acute Physiology Score II, and the Mortality Probability Models at admission and at 24 hours of intensive care unit [ICU] stay) and to validate a specific score, the ICU Cancer Mortality Model (CMM), in cancer patients requiring admission to the ICU. Methods A prospective observational cohort study was performed in an oncological medical/surgical ICU in a Brazilian cancer centre. Data were collected over the first 24 hours of ICU stay. Discrimination was assessed using the area under the receiver operating characteristic curve, and calibration using the Hosmer–Lemeshow goodness-of-fit H-test. Results A total of 1257 consecutive patients were included over a 39-month period, of whom 715 (56.9%) were scheduled surgical patients. The observed hospital mortality was 28.6%. Two performance analyses were carried out: in the first, all patients were studied; in the second, scheduled surgical patients were excluded to allow better comparison of the CMM with the general prognostic scores. The results of the two analyses were similar. Discrimination was good for all six models and was best for the Simplified Acute Physiology Score II and Acute Physiology and Chronic Health Evaluation III-J. However, calibration was uniformly insufficient (P < 0.001). The general scores significantly underestimated the observed mortality, whereas the CMM tended to overestimate it. Conclusion None of the scores accurately predicted outcome in the present group of critically ill cancer patients. In addition, the CMM showed no advantage over the general models.


Introduction
Advances in oncological and supportive care have improved survival rates in cancer patients to the point that many of them can now be cured or have their disease controlled. However, such advances have often been achieved through aggressive therapies and support, at high expense [1]. Some of these patients require admission to the intensive care unit (ICU) for acute concurrent illness, postoperative care, or complications of their cancer or its therapy [2]. In general hospitals, intensivists frequently consider these patients as having a poor prognosis and tend to oppose their admission to the ICU [3]. Recent studies [4,5] have indicated that this reluctance to offer ICU care to cancer patients with severe illness is unjustified, and is usually based on inequitable parameters in comparison with other severe and chronic diseases that share a similarly poor prognosis [6,7]. Hence, efforts have been made to identify parameters that are associated with worse prognosis and to improve allocation of ICU resources [4,5,8-11].
Prognostic scores have been used to predict outcome in patients admitted to ICUs. Although none of these models should be used to predict individual outcomes, they can assist physicians in discussions of prognosis and in clinical decision making to improve allocation of resources in intensive care [12]. Stratification of patients for clinical research and assessment of quality of intensive care are other potential applications [12-14].
However, the performance of a prognostic score must be validated before it is used in an ICU with a specific mix of patients, such as cancer patients. Few studies have adequately addressed the performance (calibration and discrimination properties) of prognostic scores in cancer patients [4,9,15-17]. Some years ago, a specific model with which to predict outcome among critically ill cancer patients, the Cancer Mortality Model (CMM), was developed by Groeger and coworkers [9]. To the best of our knowledge, only one external validation of the CMM has been conducted [17], and few studies have compared different scores in cancer patients [4,16,17]. The present study evaluated the performance of five general prognostic models and validated the CMM in predicting outcome in a large prospective cohort of cancer patients requiring intensive care.

Patients and setting
The study was conducted from May 2000 to July 2003 at the Instituto Nacional de Câncer, a public tertiary hospital for referral of cancer patients in Rio de Janeiro, Brazil. Its ICU is a 10-bed medical/surgical unit exclusively for oncological patients, with full-time medical and nurse directors, and medical, physiotherapy and nursing staff who are qualified in intensive care; facilities such as haemodynamic monitoring, microprocessor-controlled mechanical ventilation and dialysis are available at every bed. At least two senior intensivists and one junior intensivist are on duty 24 hours a day. In each shift (two per day), at least two postgraduate and four undergraduate nurses work regularly in the ICU. Most of the postgraduate nurses have a special diploma in oncology and/or intensive care, and take part in regular training of oncology and intensive care nurses. The nurse/patient ratio ranges from 1.2 to 1.7. Routine clinical rounds, including medical, nurse and physiotherapy staff, as well as meetings with oncologists, take place in the ICU every day. Approximately 600 patients are admitted to the ICU each year. The ICU uses a patient data management system, which allows automatic capture and registration of physiological data.
The decision regarding whether a patient should be admitted to the ICU is made jointly by the senior intensivist and the oncologist who is responsible for the patient's care. To be admitted to the ICU, patients should be considered to have a chance of being cured or having their cancer controlled. The ICU medical staff makes decisions regarding discharge during daily clinical rounds. Patients are always discharged to wards. End-of-life care is offered in the ICU when a patient does not recover from their acute illness despite ICU care. Occasionally, patients (with a diagnosis of cancer) may be admitted because of life-threatening illness identified during assessment of the extent of their cancer and/or consideration of the therapeutic options. This assessment is conducted as soon as is possible, and end-of-life care is started if specific treatments aimed at cancer cure or control can no longer be justified.
All consecutive patients with a definite diagnosis of cancer (pathologically proven) admitted to the ICU because of severe illness were included in the present study. In those patients with multiple hospital admissions, the most recent was considered. For patients readmitted to the ICU during the same hospital stay, only the first ICU admission was considered. Patients younger than 18 years (n = 284), those with burn injuries (n = 0), those with an ICU stay of less than 6 hours (n = 29) and those with definite diagnosis of acute coronary syndrome or in whom such a disorder could not be ruled out were excluded (n = 15). Patients who had been considered cured of their cancer for more than 5 years (n = 20) and those with noncancer disease (n = 36) were also excluded. Bone marrow transplant (BMT) patients are treated at a separate unit, even in case of life-threatening complications; therefore, BMT patients were not studied. The study was approved by the institutional review board, which waived the need for informed consent because the study did not interfere with clinical decisions regarding patient care.

Measurements
At admission and over the first 24 hours of ICU stay, various demographic, clinical and laboratory variables were assessed. In the calculation of scores, the most abnormal values were used for vital signs and laboratory data. For sedated patients, the Glasgow Coma Scale score before sedation was used [18]. Zero points or normal values were inserted where data were missing [19]. There were no missing variables for physiological data. Among laboratory variables, normal values were inserted for albumin in 623 (49.6%), prothrombin time in 274 (21.8%) and bilirubin in 676 (53.8%) patients. No patient with jaundice on physical examination lacked serum bilirubin measurements. Severe chronic comorbidities were considered as defined in the assessment of each scoring system. Patients were classified, based on reason for ICU admission, into medical, scheduled surgical and emergency surgical groups. We also recorded the underlying malignancy (solid tumour versus haematological malignancy), disease status (newly diagnosed/controlled versus recurrence/progression; locoregional versus metastatic), treatments over the 6 months before ICU admission (chemotherapy, radiation therapy and surgery, excluding biopsies and catheter insertions) and Eastern Cooperative Oncology Group performance status [20] during the week before hospital admission. Neutropenia was defined as a neutrophil count below 1000/mm³. The Sequential Organ Failure Assessment score was used to assess acute organ dysfunctions/failures [21].
The following general prognostic scores were measured: the Simplified Acute Physiology Score (SAPS) II [22], the Mortality Probability Models at admission (MPM II0) and at 24 hours (MPM II24) [23], and the Acute Physiology and Chronic Health Evaluation (APACHE®; a registered trademark of Cerner Corporation, Kansas City, MO, USA) versions II and III-J [19,24]. Each model was applied as described in its original report. The mortality equations of APACHE III-J have recently become available for use worldwide. The CMM [9] is a cancer-specific, multivariable logistic regression model that was specifically designed to predict the probability of hospital death in patients at admission to the ICU. Briefly, it comprises 16 easily evaluable clinical variables: cardiac arrest before admission, endotracheal intubation, intracranial mass effect, allogeneic BMT, cancer recurrence/progression, performance status, respiratory rate, systolic blood pressure, arterial oxygen tension/fractional inspired oxygen ratio, Glasgow Coma Scale score, platelet count, prothrombin time, serum albumin, bilirubin, blood urea nitrogen, and number of hospital days before ICU admission (lead time). Hospital mortality was the main endpoint of interest.
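Like the general scores, the CMM converts its variables into a probability of hospital death through a logistic regression equation. As a minimal sketch of that general form (the intercept and coefficients below are invented placeholders purely for illustration, not the published CMM weights):

```python
import math


def logistic_mortality_probability(linear_predictor: float) -> float:
    """Convert a logistic-regression linear predictor (intercept plus the
    sum of coefficient * variable terms) into a probability of death."""
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical example with two illustrative binary variables.
# These coefficients are placeholders, NOT the published CMM weights.
intercept = -2.0
coef_recurrence = 0.9   # cancer recurrence/progression (0 or 1)
coef_intubation = 1.4   # endotracheal intubation (0 or 1)

lp = intercept + coef_recurrence * 1 + coef_intubation * 1
risk = logistic_mortality_probability(lp)  # a probability in (0, 1)
```

In the actual models, each of the 16 CMM variables contributes one such term to the linear predictor before the logistic transformation is applied.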

Data management and statistical analysis
Data were entered into a computer database by a single author (MS). In order to ensure data consistency, another single author (JRR) cross-checked every variable entered, and a final recheck procedure was conducted for a 10% random sample of patients. All documented data were also evaluated for implausible and outlying values. Statistical analyses were carried out using SPSS software for Windows, version 10.0 (SPSS Inc., Chicago, IL, USA). Continuous variables are presented as mean ± standard deviation or median (25th-75th percentile interquartile range) and compared, respectively, using Student's t-test or the Mann-Whitney U-test. Categorical variables were reported as absolute numbers (frequency percentages) and analyzed using the χ² test (with Yates correction where applicable).
Validation of the prognostic scores was performed using standard tests to measure discrimination and calibration for each of the predictive models. The area under the receiver operating characteristic curve (AUROC) was used to evaluate the ability of each model to discriminate between patients who lived and those who died (discrimination) [25]. The Hosmer-Lemeshow goodness-of-fit H statistic was used to evaluate the agreement between the observed and expected numbers of patients who did or did not die in the hospital across all of the strata of probabilities of death (calibration) [26]. A high P value (>0.05) indicates a good fit for the model. Calibration curves were constructed by plotting predicted mortality rates stratified by 10% intervals of mortality risk (x axis) against observed mortality rates (y axis). Standardized mortality ratios (SMRs) with 95% confidence intervals were calculated for each model by dividing observed by predicted mortality rates. A two-tailed P value < 0.05 was considered statistically significant.
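The three validation statistics described above can be computed directly. The sketch below (plain Python, a simplified illustration rather than the exact SPSS procedures used in the study) implements the AUROC as a Mann-Whitney-type statistic, the Hosmer-Lemeshow H statistic over risk-ordered groups, and the SMR with an approximate 95% confidence interval that treats the observed number of deaths as Poisson-distributed:

```python
import math


def auroc(y_true, y_prob):
    """Probability that a randomly chosen non-survivor receives a higher
    predicted risk than a randomly chosen survivor (ties count 0.5)."""
    pos = [p for p, y in zip(y_prob, y_true) if y == 1]
    neg = [p for p, y in zip(y_prob, y_true) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def hosmer_lemeshow_h(y_true, y_prob, groups=10):
    """H statistic: patients are sorted by predicted risk and split into
    groups; observed and expected deaths are compared per group. The
    result is referred to a chi-square distribution with (groups - 2)
    degrees of freedom; assumes no group has mean risk of 0 or 1."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    h = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        ng = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        p_bar = expected / ng
        h += (observed - expected) ** 2 / (ng * p_bar * (1 - p_bar))
    return h


def smr(observed_deaths, expected_deaths):
    """Standardized mortality ratio with an approximate 95% CI,
    treating the observed death count as Poisson-distributed."""
    ratio = observed_deaths / expected_deaths
    se = math.sqrt(observed_deaths) / expected_deaths
    return ratio, ratio - 1.96 * se, ratio + 1.96 * se
```

An SMR above 1 indicates that a model underestimated mortality and an SMR below 1 that it overestimated mortality, matching the interpretation applied to the general scores and the CMM in the Results.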
Results
The performance of each individual mortality prediction system among all patients is presented in Table 3. All models exhibited excellent discriminatory power, but calibration was poor. The general prognostic scores underestimated the observed mortality (SMR > 1). By contrast, the CMM tended to overestimate mortality (SMR = 0.51, 95% confidence interval 0.46-0.57).
To better compare the performance of the CMM with that of the general prognostic scores, all scheduled surgical patients were excluded, leaving 542 (43.1%) patients in this analysis. A total of 411 (75.8%) patients had solid tumours and 131 (24.2%) had haematological malignancies. Their mean age was 58.7 ± 16.7 years and their mean Sequential Organ Failure Assessment score was 7.6 ± 4.2 points. Of these patients, 380 (70.1%) had acute respiratory failure. Hospital mortality was 58.7% (318/542); the performance of each mortality prediction system in this subgroup is presented in Table 4. As was observed for all patients combined, among medical and emergency surgical patients SAPS II exhibited the best discriminative ability (AUROC = 0.815) and MPM II0 the poorest (AUROC = 0.729), and all of the scores were poorly calibrated. Statistically significant differences between observed and predicted mortality rates, using goodness-of-fit H statistics, were observed for all scores. Significant underestimation of actual mortality by the general scores and overestimation by the CMM were again observed. The impact of the differences between actual and predicted mortality rates is demonstrated in the calibration curves (Figs 1 and 2).

Discussion
Many severity-of-illness scores have been developed and used to predict outcome in critically ill patients. Over the past few years, a series of studies applying outcome prediction models to general critically ill patients has demonstrated a similar pattern: good discrimination with poor calibration. This pattern has been observed in different settings and with different instruments [27]. Information regarding the usefulness of these general scores in cancer patients requiring ICU care is still limited, and most reports are constrained by relatively small sample sizes and/or by the statistical analyses used to assess model performance [28-32].
In order to better address these issues, we conducted the present study to evaluate simultaneously the performance of five general prognostic scores and to validate the CMM in a large prospective cohort of cancer patients requiring ICU admission. The hospital mortality (28.6%) for the group of ICU cancer patients evaluated here seems low at first glance. However, more than half of our patients were admitted for routine postoperative care following elective surgery. When these patients were excluded, the hospital mortality (58.7%) was similar to that in previous studies dealing with large cohorts of critically ill cancer patients (33-58.7%) [8-11,16,17]. Staudinger and coworkers [5] reported an ICU mortality of 47% and a 1-year mortality of 77%. Mortality may vary with the mix of patients (e.g. type of tumour, number of BMT patients, disease status and extent, and level of ICU support). In particular, the prognosis for cancer patients receiving mechanical ventilation (MV) is very poor. In a large prospective study of 782 patients requiring MV, 76% died in the hospital [33]. In the present cohort, approximately 37% of patients received MV.
Whether studying the entire population or the subgroup of nonscheduled surgical patients, all of the general models tested in the present study had comparatively similar levels of performance. As expected, they significantly underestimated the mortality rate. In general, discrimination was satisfactory (especially for the SAPS II and APACHE III-J scores), but calibration was inadequate. When all patients were studied, AUROC values were remarkably high (>0.850). The higher proportion of scheduled surgical patients (with very low mortality), in contrast to patients with severe illness (whether medical or emergency surgical), could account for this finding. When those patients were excluded, AUROC values were similar to those reported in the literature [4,9,15-17]. To our knowledge, there is no conventional method for comparing goodness-of-fit χ² tests, but this statistic appears to have been considerably lower for the SAPS II score than for the other models. This can be better appreciated in the calibration curves, which indicate significant underestimation in practically all strata of predicted mortality. Nevertheless, the line of observed mortality for the SAPS II score was closer to the line of equality than those of the other general scores. Assessments of both the calibration and the discriminatory abilities of general prognostic scores in cancer patients have been reported in recent years, with conflicting results [4,9,15-17]. These scores usually tend to underestimate the observed mortality [9,15,16,34]. Groeger and coworkers [9] tested the MPM II0 model in the first 805 patients included in the sample from which the CMM was developed. The MPM II0 model exhibited both poor calibration and poor discrimination, and underestimated the mortality. Sculier and coworkers [16] reported similar findings from their evaluation of the APACHE II and SAPS II scores in a cohort of 261 patients.
Guiguet and coworkers [15], studying 98 neutropenic cancer patients, found a reasonable discrimination (AUROC = 0.78) and good calibration for SAPS II. In a retrospective study conducted in 124 patients with haematological cancer, Benoit and colleagues [4] recently reported similar results for the SAPS II (AUROC = 0.765) and the APACHE II (AUROC = 0.712) scores. However, the results of calibration analyses in the latter two studies should be interpreted with caution because of the relatively small numbers of patients included, so that differences between predicted and observed mortalities may not reach statistical significance. In an elegant study, Zhu and coworkers [14] analyzed the impact of sample size on the accuracy of MPM II models by performing computer simulations. They showed that the smaller the sample size, the better the model calibration, as demonstrated by lower values of the goodness-of-fit χ 2 statistics. In contrast, discrimination was not affected by sample size.
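The dependence of the Hosmer-Lemeshow statistic on sample size can be demonstrated deterministically: for a fixed degree of miscalibration, the H statistic scales roughly linearly with the number of patients, so the same model can appear "well calibrated" in a small cohort and "poorly calibrated" in a large one. A minimal sketch with illustrative toy data (not the simulations from Zhu and coworkers' paper):

```python
def hosmer_lemeshow_h(y_true, y_prob, groups=2):
    """Hosmer-Lemeshow H statistic over risk-ordered groups."""
    pairs = sorted(zip(y_prob, y_true))
    n = len(pairs)
    h = 0.0
    for g in range(groups):
        chunk = pairs[g * n // groups:(g + 1) * n // groups]
        ng = len(chunk)
        observed = sum(y for _, y in chunk)
        expected = sum(p for p, _ in chunk)
        p_bar = expected / ng
        h += (observed - expected) ** 2 / (ng * p_bar * (1 - p_bar))
    return h


# A small, mildly miscalibrated toy cohort: predicted risks vs outcomes.
probs = [0.1, 0.2, 0.8, 0.9]
deaths = [0, 0, 1, 1]
h_small = hosmer_lemeshow_h(deaths, probs)

# The same case mix replicated tenfold: identical miscalibration, ten
# times the sample size. The H statistic grows tenfold, so the P value
# against the chi-square distribution becomes far smaller.
h_large = hosmer_lemeshow_h(deaths * 10, probs * 10)
```

This is why calibration results from small cohorts, as the authors note for the studies of Guiguet and of Benoit, must be interpreted with caution: a nonsignificant goodness-of-fit test may reflect low power rather than genuinely good calibration.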
The limitations of general prognostic models in predicting outcome in cancer patients motivated investigators to develop a specific model. Reported in 1998, the CMM was developed in a multicentre study from a cohort of 1483 critically ill cancer patients to predict hospital mortality at admission to the ICU, and it was further validated in another 230 patients [9]. Because it contains oncology-specific variables (disease progression/recurrence, performance status and allogeneic BMT group), this model was expected to be a more accurate scoring system in cancer patients [5,16]. The SAPS II, APACHE II, APACHE III-J and MPM II24 models also take into account the presence of some cancer diagnostic categories, but they were not derived exclusively from cancer patients. The performance of the CMM was studied in medical and emergency surgical patients separately (i.e. excluding elective surgical patients) in order to minimize selection bias; there was no mention in the initial CMM report that elective surgical patients had been included in its development. At our ICU, the CMM exhibited good discrimination, and the AUROC value (0.795) was similar to the values observed in both the development (0.812) and validation (0.802) groups of patients. However, the CMM was poorly calibrated and, in contrast to the general scores, exhibited a tendency to overestimate the observed mortality. Recently, Schellongowski and coworkers [17] compared the performance of the CMM, SAPS II and APACHE II in 242 ICU cancer patients. In that study, the ability of SAPS II to discriminate between survivors and nonsurvivors (AUROC = 0.825) was superior to those of APACHE II (AUROC = 0.776) and the CMM (AUROC = 0.698). All scores had acceptable calibration, although the statistical significance of the Hosmer-Lemeshow goodness-of-fit tests was borderline. The authors emphasized the limitations imposed by the relatively small sample size on the results of the calibration analyses.
The present study also has potential limitations. Ideally, a prognostic score should be employed in populations with similar characteristics to the sample of patients in which it was developed. Because we did not study BMT patients, it can be argued that our patients were less severely ill than those studied by Groeger and coworkers [9]. In that study, 11.3% and 5.8% of the sample were allogeneic and autologous BMT patients, respectively. These patients are considered to have the worst prognosis among cancer patients requiring intensive care, and prognosis is particularly poor when such patients need MV [16,35,36]. Our patients (excluding elective surgical patients) actually had a higher hospital mortality rate (58.7% versus 42%), but it was not feasible to make reliable comparisons of acute physiological disturbances (e.g. organ failures) between groups. In addition, whenever case mix adjustments are attempted, possible selection bias -resulting from different approaches to care (e.g. do-not-resuscitate orders) and from ICU admission/discharge policies -cannot be ruled out, especially in a single centre. Decisions to forgo life-sustaining therapy were demonstrated to independently predict hospital death in ICU patients [37]. Our ICU policies, including decisions to offer end-of-life care, appear similar to those reported in the literature [16,17].
Another issue that deserves mention is the impact of missing data on model performance; in the present study, prothrombin time, serum albumin and bilirubin were not obtained in all patients. The differences between the mortality predicted by each score and the observed mortality were considerable, and missing data may have contributed to the unsatisfactory performance of the models. As stated above, the study did not interfere with clinical decisions, including requests for laboratory tests. In particular, the poor performance of the CMM cannot be attributed to missing data: missing values were replaced by normal ones, which would bias predictions downward, yet the CMM significantly overestimated the mortality rate.
Finally, we should be cautious when using SMR findings to evaluate the quality of intensive care. The prognostic scores that are already available do not take into consideration multidimensional parameters (ICU organizational and economic aspects in addition to clinical variables) in evaluating ICU performance [38].
In conclusion, none of the severity-of-illness scores evaluated in the present study accurately predicted outcome for critically ill cancer patients. Moreover, in line with a recent report [17], we found no advantage of the CMM over the general prognostic models. It must be re-emphasized that no prognostic model should be the only parameter taken into account when predicting outcome, nor should such models be used for triage or cost containment in individual patients; after all, prognostic scores were constructed from patients who had already been admitted to the ICU. Nevertheless, an accurate score could be helpful for enrolling patients in clinical trials and for enriching discussions about prognosis in intensive care.

Key messages
None of the severity-of-illness scores evaluated in the present study were accurate in predicting outcome for critically ill cancer patients. There was no advantage of CMM over the general prognostic models. Prognostic scores should not be the only parameters taken into account when predicting outcome, and neither should they be used for triage and cost containment in individual patients.